Quality-Controlled Active Learning via Gaussian Processes for Robust Structure-Property Learning in Autonomous Microscopy
Abstract: Autonomous experimental systems are increasingly used in materials research to accelerate scientific discovery, but their performance is often limited by low-quality, noisy data. This issue is especially problematic in data-intensive structure-property learning tasks such as Image-to-Spectrum (Im2Spec) and Spectrum-to-Image (Spec2Im) translations, where standard active learning strategies can mistakenly prioritize poor-quality measurements. We introduce a gated active learning framework that combines curiosity-driven sampling with a physics-informed quality-control filter based on Simple Harmonic Oscillator (SHO) model fits, allowing the system to automatically exclude low-fidelity data during acquisition. Evaluations on a pre-acquired dataset of band-excitation piezoresponse spectroscopy (BEPS) data from PbTiO3 thin films with spatially localized noise show that the proposed method outperforms random sampling, standard active learning, and multitask learning strategies. The gated approach enhances both Im2Spec and Spec2Im by handling noise during training and acquisition, leading to more reliable forward and inverse predictions. In contrast, standard active learners often misinterpret noise as uncertainty and end up acquiring low-quality samples that degrade performance. Given its promising applicability, we further deployed the framework in real-time experiments on BiFeO3 thin films, demonstrating its effectiveness in real autonomous microscopy experiments. Overall, this work supports a shift toward hybrid autonomy in self-driving labs, where physics-informed quality assessment and active decision-making work hand in hand for more reliable discovery.
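The gating idea described in the abstract can be sketched in a few lines: rank candidate measurement points by model uncertainty (the "curiosity" signal), measure the most uncertain point, and admit the spectrum to the training pool only if an SHO fit explains it well. The sketch below is not the authors' implementation; the function names, the coarse grid search used in place of a full nonlinear SHO fit, and the R^2 threshold of 0.9 are all illustrative assumptions.

```python
import numpy as np

def sho_amplitude(w, A, w0, Q):
    """Amplitude of a driven Simple Harmonic Oscillator at frequency w."""
    return A * w0**2 / np.sqrt((w0**2 - w**2)**2 + (w0 * w / Q)**2)

def sho_r2(w, spectrum, w0_grid, Q_grid):
    """Best R^2 of an SHO fit over a coarse (w0, Q) grid.

    The amplitude A has a closed-form least-squares solution for each
    fixed (w0, Q), so only the two shape parameters are searched.
    """
    best = -np.inf
    ss_tot = np.sum((spectrum - spectrum.mean())**2)
    for w0 in w0_grid:
        for Q in Q_grid:
            shape = sho_amplitude(w, 1.0, w0, Q)
            A = np.dot(shape, spectrum) / np.dot(shape, shape)
            ss_res = np.sum((spectrum - A * shape)**2)
            best = max(best, 1.0 - ss_res / ss_tot)
    return best

def gated_acquire(candidates, gp_std, measure, quality, threshold=0.9):
    """One gated acquisition step.

    Visit candidates in order of decreasing GP predictive uncertainty;
    accept the first measured spectrum whose quality score passes the
    gate, so noisy locations cannot enter the training set.
    """
    for idx in np.argsort(-gp_std):
        spec = measure(candidates[idx])
        if quality(spec) >= threshold:
            return candidates[idx], spec   # accept: feed to Im2Spec/Spec2Im
    return None, None                      # every candidate failed QC
```

A plain (ungated) active learner would stop at the first line of the loop and always accept the most uncertain point; the quality check is the only difference, which is what lets the gated variant skip spatially localized noise instead of repeatedly sampling it.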
Comments: 22 pages, 12 figures, 2 tables; submitted to npj Computational Materials
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2603.29135 [cs.LG]
(or arXiv:2603.29135v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2603.29135
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Md Hasan Jawad Chowdhury [view email] [v1] Tue, 31 Mar 2026 01:35:12 UTC (15,823 KB)