
Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory

arXiv, March 30, 2026

Authors: Juno Kim, Eshaan Nichani, Denny Wu, Alberto Bietti, Jason D. Lee


Abstract: Spectral optimizers such as Muon have recently shown strong empirical performance in large-scale language model training, but the source and extent of their advantage remain poorly understood. We study this question through the linear associative memory problem, a tractable model for factual recall in transformer-based models. In particular, we go beyond orthogonal embeddings and consider Gaussian inputs and outputs, which allows the number of stored associations to greatly exceed the embedding dimension. Our main result sharply characterizes the recovery rates of one step of Muon and SGD on the logistic regression loss under a power-law frequency distribution. We show that the storage capacity of Muon significantly exceeds that of SGD, and moreover Muon saturates at a larger critical batch size. We further analyze the multi-step dynamics under a thresholded gradient approximation and show that Muon achieves a substantially faster initial recovery rate than SGD, while both methods eventually converge to the information-theoretic limit at comparable speeds. Experiments on synthetic tasks validate the predicted scaling laws. Our analysis provides a quantitative understanding of the signal amplification of Muon and lays the groundwork for establishing scaling laws across more practical language modeling tasks and optimizers.
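To make the one-step comparison in the abstract concrete, here is a minimal NumPy sketch of a linear associative memory with Gaussian embeddings and power-law item frequencies. It is not the paper's construction or experimental code. Everything specific in it is an assumption made for illustration: the function name `one_step_recovery`, the dimensions, the exponent `alpha`, unit-normalized embeddings, a zero initialization, the full-population gradient in place of a minibatch, and an exact SVD polar factor standing in for the Muon update (practical Muon implementations typically use momentum and an approximate orthogonalization).

```python
import numpy as np


def one_step_recovery(d=64, n_items=256, alpha=1.5, seed=0):
    """One step of SGD vs. an idealized spectral update, starting from W = 0.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)

    # Gaussian input embeddings u_i and output embeddings v_i (rows),
    # normalized to unit length. n_items may exceed d.
    U = rng.standard_normal((n_items, d))
    V = rng.standard_normal((n_items, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    V /= np.linalg.norm(V, axis=1, keepdims=True)

    # Power-law frequencies p_i proportional to i^(-alpha).
    p = np.arange(1, n_items + 1, dtype=float) ** (-alpha)
    p /= p.sum()

    # Softmax cross-entropy with logits v_j^T W u_i. At W = 0 the softmax is
    # uniform, so the population gradient is -sum_i p_i (v_i - mean(v)) u_i^T.
    grad = -((V - V.mean(axis=0)) * p[:, None]).T @ U  # shape (d, d)

    # SGD moves along -grad. The spectral update replaces grad by its polar
    # factor, i.e. sets every singular value of the step to 1.
    sgd_update = -grad
    left, _, right_t = np.linalg.svd(grad)
    muon_update = -(left @ right_t)

    def recovered(W):
        # logits[i, j] = v_j^T W u_i; item i counts as recovered when its own
        # output embedding scores highest. The step size does not affect this.
        logits = U @ W.T @ V.T
        return np.argmax(logits, axis=1) == np.arange(n_items)

    return recovered(sgd_update), recovered(muon_update), muon_update


if __name__ == "__main__":
    sgd_ok, muon_ok, _ = one_step_recovery()
    print("items recovered by SGD step:     ", int(sgd_ok.sum()))
    print("items recovered by spectral step:", int(muon_ok.sum()))
```

Because the SGD step weights each association by its frequency, rare items contribute little signal, whereas the orthogonalized step flattens the singular values. Varying `d`, `n_items`, and `alpha` in this toy setup gives a rough feel for that effect, but it should not be read as a reproduction of the paper's scaling laws.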

Comments: 77 pages, 8 figures

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Cite as: arXiv:2603.26554 [cs.LG] (or arXiv:2603.26554v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.26554

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Juno Kim
[v1] Fri, 27 Mar 2026 16:13:18 UTC (310 KB)
