Research Papers research paper arxiv ai artificial-intelligence

SimulCost: A Cost-Aware Benchmark and Toolkit for Automating Physics Simulations with LLMs

arXivMarch 31, 202610 min read0 views

arXiv:2603.20253v1 Announce Type: cross Abstract: Evaluating LLM agents for scientific tasks has focused on token costs while ignoring tool-use costs like simulation time and experimental resources. As a result, metrics like pass@k become impractical under realistic budget constraints. To address this gap, we introduce SimulCost, the first benchmark targeting cost-sensitive parameter tuning in physics simulations. SimulCost compares LLM tuning cost-sensitive parameters against traditional scanning approach in both accuracy and computational cost, spanning 2,916 single-round (initial guess) and — Yadi Cao, Sicheng Lai, Jiahe Huang, Yang Zhang, Zach Lawrence, Rohan Bhakta, Izzy F. Thomas, Mingyun Cao, Chung-Hao Tsai, Zihao Zhou, Yidong Zhao, Hao Liu, Alessandro Marinoni, Alexey Arefiev, Rose Yu

Authors:Yadi Cao, Sicheng Lai, Jiahe Huang, Yang Zhang, Zach Lawrence, Rohan Bhakta, Izzy F. Thomas, Mingyun Cao, Chung-Hao Tsai, Zihao Zhou, Yidong Zhao, Hao Liu, Alessandro Marinoni, Alexey Arefiev, Rose Yu

View PDF HTML (experimental)

Abstract:Evaluating LLM agents for scientific tasks has focused on token costs while ignoring tool-use costs like simulation time and experimental resources. As a result, metrics like pass@k become impractical under realistic budget constraints. To address this gap, we introduce SimulCost, the first benchmark targeting cost-sensitive parameter tuning in physics simulations. SimulCost compares LLM tuning cost-sensitive parameters against traditional scanning approach in both accuracy and computational cost, spanning 2,916 single-round (initial guess) and 1,900 multi-round (adjustment by trial-and-error) tasks across 12 simulators from fluid dynamics, solid mechanics, and plasma physics. Each simulator's cost is analytically defined and platform-independent. Frontier LLMs achieve 46--64% success rates in single-round mode, dropping to 35--54% under high accuracy requirements, rendering their initial guesses unreliable especially for high accuracy tasks. Multi-round mode improves rates to 71--80%, but LLMs are 1.5--2.5x slower than traditional scanning, making them uneconomical choices. We also investigate parameter group correlations for knowledge transfer potential, and the impact of in-context examples and reasoning effort, providing practical implications for deployment and fine-tuning. We open-source SimulCost as a static benchmark and extensible toolkit to facilitate research on improving cost-aware agentic designs for physics simulations, and for expanding new simulation environments. Code and data are available at this https URL.

Subjects:

Computational Physics (physics.comp-ph); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Cite as: arXiv:2603.20253 [physics.comp-ph]

(or arXiv:2603.20253v1 [physics.comp-ph] for this version)

https://doi.org/10.48550/arXiv.2603.20253

arXiv-issued DOI via DataCite

Submission history

From: Yadi Cao [view email] [v1] Wed, 11 Mar 2026 05:00:48 UTC (4,952 KB)

Original source

arXiv

https://arxiv.org/abs/2603.20253

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

ModelsLive

Scaling Agentic Memory to 5 Billion Vectors via Binary Quantization and Dynamic Wavelet Matrices

In a study, a new “dynamic wavelet matrix” was used as a vector database, where the memory grows only with log(σ) instead of with n. I considered building a KNN model with a huge memory, capable of holding, for example, 5 billion vectors. First, the words in the context window are converted into an embedding using deberta-v3-small. This is a fast encoder that also takes the position of the tokens into account (disentangled attention) and is responsible for the context in the model. The embedding is then converted into a bit sequence using binary quantization, where dimensions greater than 0 are converted to 1 and otherwise to 0. The advantage is that bit sequences are compressible and are entered into the dynamic wavelet matrix, where the memory grows only with log(σ). A response token is

discuss.huggingface.co

2mabout 2 hours ago

Research PapersFresh

[D] ICML reviewer making up false claim in acknowledgement, what to do?

In a rebuttal acknowledgement we received, the reviewer made up a claim that our method performs worse than baselines with some hyperparameter settings. We did do a comprehensive list of hyperparameter comparisons and the reviewer's claim is not supported by what's presented in the paper. In this case what can we do? submitted by /u/dontknowwhattoplay [link] [comments]

Reddit r/MachineLearning

1mabout 3 hours ago

ModelsLive

Anthropic Spots 'Emotion Vectors' Inside Claude That Influence AI Behavior

Researchers say internal emotion-like signals shape how large language models make decisions.

Decrypt AI

1mabout 1 hour ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 151 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

SimulCost: A Cost-Aware Benchmark and Toolkit for Automating Physics Simulations with LLMs

Submission history

Daily AI Digest

More about

Scaling Agentic Memory to 5 Billion Vectors via Binary Quantization and Dynamic Wavelet Matrices

[D] ICML reviewer making up false claim in acknowledgement, what to do?

Anthropic Spots 'Emotion Vectors' Inside Claude That Influence AI Behavior

Knowledge Map

Connected Articles — Knowledge Graph

Discussion

More in Research Papers

Milton Keynes University Hospital pioneers AI to combat clinician burnout - Oracle

[D] ICML reviewer making up false claim in acknowledgement, what to do?

College grads in ‘AI-proof’ careers like psychology and education are seeing negative returns on their degrees

Researchers 3D print robot the size of a single-cell organism — devices move and navigate even without a ‘brain,’ uses their shape and the environment to get going