Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessAnthropic releases part of AI tool source code in 'error'TechXplore AIMCMC Island Hopping: An Intuitive Guide to the Metropolis-Hastings AlgorithmDEV CommunityOracle cut thousands of jobs in recent round of layoffs – CNBCSilicon RepublicAnthropic admits partial leak of Claude Code source, says no customer data exposed - Storyboard18Google News: Claude38 Commits, Zero New Features — How I Made My Web App Production-ReadyDEV CommunityHow to Make Your WooCommerce Store Discoverable by ChatGPT (And Convert That Traffic)DEV CommunityLWiAI Podcast #238 - GPT 5.4 mini, OpenAI Pivot, Mamba 3, Attention ResidualsLast Week in AIThe Leaked 'Employee-Grade' CLAUDE.md: How to Use It TodayDEV CommunityCanal+ Names Anne‑Laure Tingry Chief Data & AI Officer - The Hollywood ReporterGoogle News: AILouisiana scraps some, but not all, AI proposals after Trump threats - Louisiana IlluminatorGoogle News: AIAnthropic accidentally leaks Claude Code source in npm slipSilicon RepublicChina’s AI Is Spreading Fast. Here’s How to Stop the Security Risks - War on the RocksGoogle News: AI SafetyBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessAnthropic releases part of AI tool source code in 'error'TechXplore AIMCMC Island Hopping: An Intuitive Guide to the Metropolis-Hastings AlgorithmDEV CommunityOracle cut thousands of jobs in recent round of layoffs – CNBCSilicon RepublicAnthropic admits partial leak of Claude Code source, says no customer data exposed - Storyboard18Google News: Claude38 Commits, Zero New Features — How I Made My Web App Production-ReadyDEV CommunityHow to Make Your WooCommerce Store Discoverable by ChatGPT (And Convert That Traffic)DEV CommunityLWiAI Podcast #238 - GPT 5.4 mini, OpenAI Pivot, Mamba 3, Attention ResidualsLast Week in AIThe Leaked 'Employee-Grade' CLAUDE.md: How to Use It TodayDEV CommunityCanal+ Names Anne‑Laure Tingry Chief Data & AI Officer - The Hollywood ReporterGoogle News: AILouisiana scraps some, but not all, AI proposals after Trump threats - Louisiana IlluminatorGoogle News: AIAnthropic accidentally leaks Claude Code source in npm slipSilicon RepublicChina’s AI Is Spreading Fast. Here’s How to Stop the Security Risks - War on the RocksGoogle News: AI Safety

Can LLMs Beat Classical Hyperparameter Optimization Algorithms? A Study on autoresearch

arXivMarch 31, 202610 min read0 views
Source Quiz

arXiv:2603.24647v2 Announce Type: replace Abstract: The autoresearch repository enables an LLM agent to search for optimal hyperparameter configurations on an unconstrained search space by editing the training code directly. Given a fixed compute budget and constraints, we use autoresearch as a testbed to compare classical hyperparameter optimization (HPO) algorithms against LLM-based methods on tuning the hyperparameters of a small language model. Within a fixed hyperparameter search space, classical HPO methods such as CMA-ES and TPE consistently outperform LLM-based agents. However, an LLM — Fabio Ferreira, Lucca Wobbe, Arjun Krishnakumar, Frank Hutter, Arber Zela

View PDF HTML (experimental)

Abstract:The autoresearch repository enables an LLM agent to search for optimal hyperparameter configurations on an unconstrained search space by editing the training code directly. Given a fixed compute budget and constraints, we use autoresearch as a testbed to compare classical hyperparameter optimization (HPO) algorithms against LLM-based methods on tuning the hyperparameters of a small language model. Within a fixed hyperparameter search space, classical HPO methods such as CMA-ES and TPE consistently outperform LLM-based agents. However, an LLM agent that directly edits training source code in an unconstrained search space narrows the gap to classical methods substantially despite using only a self-hosted open-weight 27B model. Methods that avoid out-of-memory failures outperform those with higher search diversity, suggesting that reliability matters more than exploration breadth. While small and mid-sized LLMs struggle to track optimization state across trials, classical methods lack domain knowledge. To bridge this gap, we introduce Centaur, a hybrid that shares CMA-ES's internal state, including mean vector, step-size, and covariance matrix, with an LLM. Centaur achieves the best result in our experiments, with its 0.8B variant outperforming the 27B variant, suggesting that a cheap LLM suffices when paired with a strong classical optimizer. The 0.8B model is insufficient for unconstrained code editing but sufficient for hybrid optimization, while scaling to 27B provides no advantage for fixed search space methods. Preliminary experiments with the frontier model Gemini 3.1 Pro Preview do not close the gap to classical methods. Code is available at this https URL.

Subjects:

Machine Learning (cs.LG); Machine Learning (stat.ML)

Cite as: arXiv:2603.24647 [cs.LG]

(or arXiv:2603.24647v2 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.24647

arXiv-issued DOI via DataCite

Submission history

From: Fabio Ferreira [view email] [v1] Wed, 25 Mar 2026 17:29:40 UTC (1,874 KB) [v2] Sun, 29 Mar 2026 18:46:53 UTC (2,456 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Can LLMs Be…researchpaperarxivmachine-lea…deep-learni…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 182 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers