Scaling Laws for Neural Language Models: New Evidence Challenges Chinchilla Predictions
New empirical research from Epoch AI challenges the Chinchilla scaling laws, suggesting that compute-optimal training requires significantly more tokens than previously believed, with implications for how frontier models should be trained.
Researchers at Epoch AI have published new empirical evidence that challenges the widely adopted Chinchilla scaling laws, which have guided the training of most frontier language models since 2022. The new research suggests that compute-optimal training requires substantially more training tokens than the Chinchilla predictions indicate, particularly at large compute budgets.
The Chinchilla paper, published by DeepMind in 2022, established that optimal model training requires approximately 20 training tokens per model parameter. This finding led to a shift in the field toward training smaller models on more data, with models like Llama 2 and Mistral following this prescription.
The new Epoch AI research, based on a comprehensive analysis of training runs across multiple organizations, finds that the optimal token-to-parameter ratio increases significantly at larger compute scales. At the compute budgets now used for frontier models, the optimal ratio may be closer to 50-100 tokens per parameter.
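To make the two regimes concrete, here is a minimal sketch of how a tokens-per-parameter ratio translates into a compute-optimal allocation, assuming the common approximation that training compute is roughly 6 × parameters × tokens. The function name and the exact budget are illustrative, not from either paper:

```python
import math

def optimal_allocation(compute_flops: float, tokens_per_param: float):
    """Given a compute budget C and an assumed tokens/parameter ratio r,
    return (parameters N, tokens D) solving C = 6 * N * D with D = r * N.

    The 6*N*D rule is the standard rough estimate of training FLOPs;
    r ~ 20 corresponds to the Chinchilla prescription, r ~ 50-100 to the
    higher-ratio regime the new research describes.
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Illustrative: a hypothetical 1e24-FLOP budget under each ratio.
for ratio in (20, 50, 100):
    n, d = optimal_allocation(1e24, ratio)
    print(f"r={ratio:>3}: ~{n / 1e9:.0f}B params, ~{d / 1e12:.1f}T tokens")
```

Under this approximation, raising the ratio from 20 to 100 at a fixed budget shrinks the optimal model by more than half while roughly doubling the training data, which is the practical shift the findings point toward.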
If confirmed, these findings would suggest that current frontier models are substantially undertrained relative to their optimal configuration. The implications are significant: extracting the best performance from a given compute budget may require training longer on more data rather than scaling up model size.