MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration
Authors: Da Chang, Qiankun Shi, Lvgang Zhang, Yu Li, Ruijie Zhang, Yao Lu, Yongxiang Liu, Ganzhao Yuan
Abstract: Orthogonalized-update optimizers such as Muon improve training of matrix-valued parameters, but existing extensions mostly act either after orthogonalization by rescaling updates or before it with heavier whitening-based preconditioners. We introduce MuonEq, a lightweight family of pre-orthogonalization equilibration schemes for Muon in three forms: two-sided row/column normalization (RC), row normalization (R), and column normalization (C). These variants rebalance the momentum matrix before finite-step Newton-Schulz using row/column squared-norm statistics and only $\mathcal{O}(m+n)$ auxiliary state. We show that finite-step orthogonalization is governed by input spectral properties, especially stable rank and condition number, and that row/column normalization is a zeroth-order whitening surrogate that removes marginal scale mismatch. For the hidden matrix weights targeted by MuonEq, the row-normalized variant R is the natural default and preserves the $\widetilde{\mathcal{O}}(T^{-1/4})$ stationarity guarantee of Muon-type methods. In LLaMA2 pretraining on C4, the default R variant consistently outperforms Muon on 130M and 350M models, yielding faster convergence and lower validation perplexity.
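To make the idea of balancing before orthogonalization concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes instantaneous row squared norms (the abstract does not say how the statistics are accumulated), substitutes the classical cubic Newton-Schulz iteration for Muon's tuned polynomial, and uses a hypothetical 8x16 momentum matrix with one artificially inflated row. The helper names `row_normalize`, `newton_schulz`, `stable_rank`, and `orth_error` are illustrative only.

```python
import numpy as np

def row_normalize(M, eps=1e-8):
    # Assumption: instantaneous row squared norms, one scalar per row (O(m) state).
    row_sq = np.sum(M * M, axis=1, keepdims=True)
    return M / np.sqrt(row_sq + eps)

def newton_schulz(M, steps=5):
    # Classical cubic Newton-Schulz iteration, swapped in for Muon's own polynomial.
    # Frobenius scaling places all singular values in (0, 1], where it converges.
    X = M / (np.linalg.norm(M) + 1e-12)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X

def stable_rank(M):
    # Squared Frobenius norm divided by squared spectral norm.
    return np.linalg.norm(M, "fro") ** 2 / np.linalg.norm(M, 2) ** 2

def orth_error(X):
    # Distance of X X^T from the identity (X has at least as many columns as rows here).
    return np.linalg.norm(X @ X.T - np.eye(X.shape[0]))

rng = np.random.default_rng(0)
momentum = rng.standard_normal((8, 16))  # hypothetical momentum matrix
momentum[0] *= 100.0                     # inject a marginal scale mismatch in one row

balanced = row_normalize(momentum)       # the "R" style equilibration step

err_raw = orth_error(newton_schulz(momentum))
err_eq = orth_error(newton_schulz(balanced))

print("stable rank raw / balanced:", stable_rank(momentum), stable_rank(balanced))
print("orthogonality error raw / balanced:", err_raw, err_eq)
```

In this toy setting the inflated row dominates the spectrum of the raw matrix, so a handful of Newton-Schulz steps leaves the remaining singular values far from one. Row normalization removes that mismatch first, which is the kind of effect the abstract attributes to stable rank and condition number.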
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2603.28254 [cs.LG]
(or arXiv:2603.28254v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2603.28254
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Da Chang
[v1] Mon, 30 Mar 2026 10:28:18 UTC (403 KB)