AISAC: An Integrated multi-agent System for Transparent, Retrieval-Grounded Scientific Assistance
arXiv:2511.14043v3 Announce Type: replace Abstract: AI Scientific Assistant Core (AISAC) is a transparent, modular multi-agent runtime developed at Argonne National Laboratory to support long-horizon, evidence-grounded scientific reasoning. Rather than proposing new agent algorithms or claiming autonomous scientific discovery, AISAC contributes a governed execution substrate that operationalizes key requirements for deploying agentic AI in scientific practice, including explicit role semantics, budgeted context management, traceable execution, and reproducible interaction with tools and knowle — Chandrachur Bhattacharya, Sibendu Som
View PDF HTML (experimental)
Abstract:AI Scientific Assistant Core (AISAC) is a transparent, modular multi-agent runtime developed at Argonne National Laboratory to support long-horizon, evidence-grounded scientific reasoning. Rather than proposing new agent algorithms or claiming autonomous scientific discovery, AISAC contributes a governed execution substrate that operationalizes key requirements for deploying agentic AI in scientific practice, including explicit role semantics, budgeted context management, traceable execution, and reproducible interaction with tools and knowledge. AISAC enforces four structural guarantees for scientific reasoning: (1) declarative agent registration with runtime-enforced role semantics and automatic system prompt generation; (2) budgeted orchestration via explicit per-turn context and delegation depth limits; (3) role-aligned memory access across episodic, dialogue, and evidence layers; and (4) trace-driven transparency through persistent execution records and a live event-stream interface. These guarantees are implemented through hybrid persistent memory (SQLite and dual FAISS indices), governed retrieval with agent-scoped RAG, structured tool execution with schema validation, and a configuration-driven bootstrap mechanism that enables project specific extension without modifying the shared core. AISAC is currently deployed across multiple scientific workflows at Argonne, including combustion science, materials research, and energy process safety, demonstrating its use as a reusable substrate for domain-specialized AI scientific assistants.
Subjects:
Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multiagent Systems (cs.MA)
Cite as: arXiv:2511.14043 [cs.AI]
(or arXiv:2511.14043v3 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2511.14043
arXiv-issued DOI via DataCite
Submission history
From: Chandrachur Bhattacharya [view email] [v1] Tue, 18 Nov 2025 01:51:05 UTC (120 KB) [v2] Fri, 6 Feb 2026 19:54:16 UTC (342 KB) [v3] Fri, 27 Mar 2026 21:12:29 UTC (348 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxiv
Adaptive Parallel Monte Carlo Tree Search for Efficient Test-time Compute Scaling
arXiv:2604.00510v1 Announce Type: new Abstract: Monte Carlo Tree Search (MCTS) is an effective test-time compute scaling (TTCS) method for improving the reasoning performance of large language models, but its highly variable execution time leads to severe long-tail latency in practice. Existing optimizations such as positive early exit, reduce latency in favorable cases but are less effective when search continues without meaningful progress. We introduce {\it negative early exit}, which prunes unproductive MCTS trajectories, and an {\it adaptive boosting mechanism} that reallocates reclaimed computation to reduce resource contention among concurrent searches. Integrated into vLLM, these techniques substantially reduce p99 end-to-end latency while improving throughput and maintaining reaso

Towards Reliable Truth-Aligned Uncertainty Estimation in Large Language Models
arXiv:2604.00445v1 Announce Type: new Abstract: Uncertainty estimation (UE) aims to detect hallucinated outputs of large language models (LLMs) to improve their reliability. However, UE metrics often exhibit unstable performance across configurations, which significantly limits their applicability. In this work, we formalise this phenomenon as proxy failure, since most UE metrics originate from model behaviour, rather than being explicitly grounded in the factual correctness of LLM outputs. With this, we show that UE metrics become non-discriminative precisely in low-information regimes. To alleviate this, we propose Truth AnChoring (TAC), a post-hoc calibration method to remedy UE metrics, by mapping the raw scores to truth-aligned scores. Even with noisy and few-shot supervision, our TAC

Robust Geospatial Coordination of Multi-Agent Communications Networks Under Attrition
arXiv:2512.02079v2 Announce Type: replace-cross Abstract: Coordinating emergency responses in extreme environments, such as wildfires, requires resilient and high-bandwidth communication backbones. While autonomous aerial swarms can establish ad-hoc networks to provide this connectivity, the high risk of individual node attrition in these settings often leads to network fragmentation and mission-critical downtime. To overcome this challenge, we introduce and formalize the problem of Robust Task Networking Under Attrition (RTNUA), which extends connectivity maintenance in multi-robot systems to explicitly address proactive redundancy and attrition recovery. We then introduce Physics-Informed Robust Employment of Multi-Agent Networks ($\Phi$IREMAN), a topological algorithm leveraging physics
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers

Benchmarking Filtered Approximate Nearest Neighbor Search Algorithms on Transformer-based Embedding Vectors
arXiv:2507.21989v3 Announce Type: replace-cross Abstract: Advances in embedding models for text, image, audio, and video drive progress across multiple domains, including retrieval-augmented generation, recommendation systems, and others. Many of these applications require an efficient method to retrieve items that are close to a given query in the embedding space while satisfying a filter condition based on the item's attributes, a problem known as filtered approximate nearest neighbor search (FANNS). By performing an in-depth literature analysis on FANNS, we identify a key gap in the research landscape: publicly available datasets with embedding vectors from state-of-the-art transformer-based text embedding models that contain abundant real-world attributes covering a broad spectrum of a

Criterion Validity of LLM-as-Judge for Business Outcomes in Conversational Commerce
arXiv:2604.00022v1 Announce Type: new Abstract: Multi-dimensional rubric-based dialogue evaluation is widely used to assess conversational AI, yet its criterion validity -- whether quality scores are associated with the downstream outcomes they are meant to serve -- remains largely untested. We address this gap through a two-phase study on a major Chinese matchmaking platform, testing a 7-dimension evaluation rubric (implemented via LLM-as-Judge) against verified business conversion. Our findings concern rubric design and weighting, not LLM scoring accuracy: any judge using the same rubric would face the same structural issue. The core finding is dimension-level heterogeneity: in Phase 2 (n=60 human conversations, stratified sample, verified labels), Need Elicitation (D1: rho=0.368, p=0.00
Lead Zirconate Titanate Reservoir Computing for Classification of Written and Spoken Digits
arXiv:2604.00207v1 Announce Type: new Abstract: In this paper we extend our earlier work of (Rietman et al. 2022) presenting an application of physical Reservoir Computing (RC) to the classification of handwritten and spoken digits. We utilize an unpoled cube of Lead Zirconate Titanate (PZT) as a computational substrate to process these datasets. Our results demonstrate that the PZT reservoir achieves 89.0% accuracy on MNIST handwritten digits, representing a 2.4 percentage point improvement over logistic regression baselines applied to the same preprocessed data. However, for the AudioMNIST spoken digits dataset, the reservoir system (88.2% accuracy) performs equivalently to baseline methods (88.1% accuracy), suggesting that reservoir computing provides the greatest benefits for classific
Dynamic Graph Neural Network with Adaptive Features Selection for RGB-D Based Indoor Scene Recognition
arXiv:2604.00372v1 Announce Type: new Abstract: Multi-modality of color and depth, i.e., RGB-D, is of great importance in recent research of indoor scene recognition. In this kind of data representation, depth map is able to describe the 3D structure of scenes and geometric relations among objects. Previous works showed that local features of both modalities are vital for promotion of recognition accuracy. However, the problem of adaptive selection and effective exploitation on these key local features remains open in this field. In this paper, a dynamic graph model is proposed with adaptive node selection mechanism to solve the above problem. In this model, a dynamic graph is built up to model the relations among objects and scene, and a method of adaptive node selection is proposed to ta

Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!