
SimulCost: A Cost-Aware Benchmark and Toolkit for Automating Physics Simulations with LLMs

arXiv, March 31, 2026


Authors: Yadi Cao, Sicheng Lai, Jiahe Huang, Yang Zhang, Zach Lawrence, Rohan Bhakta, Izzy F. Thomas, Mingyun Cao, Chung-Hao Tsai, Zihao Zhou, Yidong Zhao, Hao Liu, Alessandro Marinoni, Alexey Arefiev, Rose Yu


Abstract: Evaluating LLM agents for scientific tasks has focused on token costs while ignoring tool-use costs like simulation time and experimental resources. As a result, metrics like pass@k become impractical under realistic budget constraints. To address this gap, we introduce SimulCost, the first benchmark targeting cost-sensitive parameter tuning in physics simulations. SimulCost compares LLM tuning of cost-sensitive parameters against a traditional scanning approach in both accuracy and computational cost, spanning 2,916 single-round (initial guess) and 1,900 multi-round (adjustment by trial-and-error) tasks across 12 simulators from fluid dynamics, solid mechanics, and plasma physics. Each simulator's cost is analytically defined and platform-independent. Frontier LLMs achieve 46–64% success rates in single-round mode, dropping to 35–54% under high accuracy requirements, which makes their initial guesses unreliable, especially for high-accuracy tasks. Multi-round mode improves rates to 71–80%, but LLMs are 1.5–2.5x slower than traditional scanning, making them an uneconomical choice. We also investigate parameter group correlations for knowledge transfer potential, and the impact of in-context examples and reasoning effort, providing practical implications for deployment and fine-tuning. We open-source SimulCost as a static benchmark and extensible toolkit to facilitate research on improving cost-aware agentic designs for physics simulations, and for expanding to new simulation environments. Code and data are available at this https URL.
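To make the comparison in the abstract concrete, here is a minimal sketch of cost-aware evaluation on a toy problem. It is not SimulCost's code or API. The simulator (a trapezoid-rule integral), the cost model (cost equals the number of grid points), the doubling scan, and every function name are assumptions chosen for illustration. The idea it shows is only this: each simulator call adds to a running cost, so a tuner is judged on both whether it meets the tolerance and how much it spent getting there.

```python
import math

def simulate(n: int) -> tuple[float, int]:
    """Toy stand-in for a physics simulator (hypothetical, not from SimulCost).

    Integrates sin(x) on [0, pi] with the trapezoid rule on n intervals.
    Returns (absolute error vs. the exact value 2, cost), where cost is
    defined analytically as n so it does not depend on the machine.
    """
    h = math.pi / n
    # Endpoint terms vanish because sin(0) = sin(pi) = 0.
    approx = h * sum(math.sin(i * h) for i in range(1, n))
    return abs(approx - 2.0), n

def scan(tol: float, n0: int = 2, max_rounds: int = 20):
    """Assumed scanning baseline: double n until the error meets tol.

    Returns (success, final n, total cost of all runs).
    """
    n, total_cost = n0, 0
    for _ in range(max_rounds):
        err, cost = simulate(n)
        total_cost += cost
        if err <= tol:
            return True, n, total_cost
        n *= 2
    return False, n, total_cost

def run_guesses(guesses, tol: float):
    """Evaluate a sequence of proposed n values, e.g. from an LLM agent.

    A one-element list models single-round mode; a longer list models
    multi-round trial and error. Every attempt is charged to the total.
    """
    total_cost = 0
    for n in guesses:
        err, cost = simulate(n)
        total_cost += cost
        if err <= tol:
            return True, n, total_cost
    return False, None, total_cost

if __name__ == "__main__":
    tol = 1e-3
    print("scan:        ", scan(tol))
    print("single guess:", run_guesses([50], tol))
    print("multi-round: ", run_guesses([10, 30, 200], tol))
```

In this toy setting a good single guess beats the scan on cost, while a multi-round sequence that undershoots twice and then overshoots ends up succeeding at a higher total cost than the scan. That is the kind of trade-off a cost-aware metric is meant to expose and a plain success rate would hide.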

Subjects: Computational Physics (physics.comp-ph); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Cite as: arXiv:2603.20253 [physics.comp-ph]

(or arXiv:2603.20253v1 [physics.comp-ph] for this version)

https://doi.org/10.48550/arXiv.2603.20253

arXiv-issued DOI via DataCite

Submission history

From: Yadi Cao
[v1] Wed, 11 Mar 2026 05:00:48 UTC (4,952 KB)
