Escape dynamics and implicit bias of one-pass SGD in overparameterized quadratic networks
arXiv:2604.03068v1 Announce Type: cross
Abstract: We analyze the one-pass stochastic gradient descent dynamics of a two-layer neural network with quadratic activations in a teacher--student framework. In the high-dimensional regime, where the input dimension $N$ and the number of samples $M$ diverge at fixed ratio $\alpha = M/N$, and for finite hidden widths $(p,p^*)$ of the student and teacher, respectively, we study the low-dimensional ordinary differential equations that govern the evolution of the student--teacher and student--student overlap matrices. We show that overparameterization ($p>p^*$) only modestly accelerates escape from a plateau of poor generalization by modifying the prefactor of the exponential decay of the loss. We then examine how unconstrained weight norms introduce a continuous rotational symmetry that results in a nontrivial manifold of zero-loss solutions for $p>1$. From this manifold the dynamics consistently selects the closest solution to the random initialization, as enforced by a conserved quantity in the ODEs governing the evolution of the overlaps. Finally, a Hessian analysis of the population-loss landscape confirms that the plateau and the solution manifold correspond to saddles with at least one negative eigenvalue and to marginal minima in the population-loss geometry, respectively.
Comments: 30 pages, 6 figures
Subjects:
Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (stat.ML)
Cite as: arXiv:2604.03068 [cond-mat.dis-nn]
(or arXiv:2604.03068v1 [cond-mat.dis-nn] for this version)
https://doi.org/10.48550/arXiv.2604.03068
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Dario Bocchi [view email] [v1] Fri, 3 Apr 2026 14:47:24 UTC (1,879 KB)
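The setup can be simulated directly. The sketch below (dimensions, widths, learning rate, and step count are all illustrative choices, not taken from the paper) runs one-pass SGD on a quadratic-activation student against a planted teacher and reports the student--teacher overlap matrix, the order parameter whose ODE dynamics the abstract describes:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, p_star = 200, 3, 2        # input dim, student/teacher widths (illustrative)
lr, steps = 0.05 / N, 10000     # learning rate scaled with N

W_star = rng.standard_normal((p_star, N)) / np.sqrt(N)  # planted teacher
W = rng.standard_normal((p, N)) / np.sqrt(N)            # random student init

def forward(W, x):
    # quadratic-activation committee: f(x) = sum_j (w_j . x)^2
    return np.sum((W @ x) ** 2)

for t in range(steps):
    x = rng.standard_normal(N)              # one-pass: fresh sample each step
    err = forward(W, x) - forward(W_star, x)
    # gradient of 0.5*err^2 w.r.t. student row w_j is 2*err*(w_j . x)*x
    W -= lr * 2.0 * err * np.outer(W @ x, x)

# student-teacher overlap matrix M_{jk} = w_j . w*_k (order parameters)
M = W @ W_star.T
print("overlap matrix:\n", M)
```

This tracks only the raw overlaps; the paper's analysis works with the closed ODEs for these quantities in the $N\to\infty$ limit.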


Simple parallel estimation of the partition ratio for Gibbs distributions
arXiv:2505.18324v2 Announce Type: replace-cross Abstract: We consider the problem of estimating the partition function $Z(\beta)=\sum_x \exp(\beta H(x))$ of a Gibbs distribution with the Hamiltonian $H:\Omega\rightarrow\{0\}\cup[1,n]$. As shown in [Harris & Kolmogorov 2024], the log-ratio $q=\ln (Z(\beta_{\max})/Z(\beta_{\min}))$ can be estimated with accuracy $\epsilon$ using $O(\frac{q \log n}{\epsilon^2})$ calls to an oracle that produces a sample from the Gibbs distribution for parameter $\beta\in[\beta_{\min},\beta_{\max}]$. That algorithm is inherently sequential, or {\em adaptive}: the queried values of $\beta$ depend on previous samples. Recently, [Liu, Yin & Zhang 2024] developed a non-adaptive version that needs $O( q (\log^2 n) (\log q + \log \log n + \epsilon^{-2}) )$ samples.
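The quantity being estimated can be illustrated with the standard telescoping estimator (a textbook baseline, not the specific algorithms of Harris & Kolmogorov or Liu, Yin & Zhang): each ratio $Z(\beta_{i+1})/Z(\beta_i)$ equals the mean of $\exp((\beta_{i+1}-\beta_i)H(x))$ under the $\beta_i$-Gibbs measure. The toy state space, schedule, and sample count below are illustrative assumptions chosen so the true answer is computable for comparison:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy Hamiltonian on a small finite state space, so exact Gibbs samples and
# the true partition function are both available for a sanity check.
H = rng.uniform(1.0, 8.0, size=50)          # H : Omega -> [1, n]

def gibbs_sample(beta, m):
    # exact samples from p(x) proportional to exp(beta * H(x)) on finite Omega
    w = np.exp(beta * H)
    return rng.choice(len(H), size=m, p=w / w.sum())

def log_ratio_estimate(betas, m=4000):
    # telescoping: Z(b_{i+1})/Z(b_i) = E_{x ~ Gibbs(b_i)}[exp((b_{i+1}-b_i) H(x))]
    q = 0.0
    for b0, b1 in zip(betas[:-1], betas[1:]):
        xs = gibbs_sample(b0, m)
        q += np.log(np.mean(np.exp((b1 - b0) * H[xs])))
    return q

betas = np.linspace(0.0, 1.0, 11)           # fixed, non-adaptive schedule
q_hat = log_ratio_estimate(betas)
q_true = (np.log(np.exp(betas[-1] * H).sum())
          - np.log(np.exp(betas[0] * H).sum()))
print(q_hat, q_true)
```

The point of the cited works is precisely how to choose the schedule and sample counts (adaptively or not) to hit accuracy $\epsilon$ with few oracle calls; the fixed grid above makes no such guarantee.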

Online Graph Coloring for $k$-Colorable Graphs
arXiv:2511.16100v2 Announce Type: replace Abstract: We study the problem of online graph coloring for $k$-colorable graphs. The best previously known deterministic algorithm uses $\widetilde{O}(n^{1-\frac{1}{k!}})$ colors for general $k$ and $\widetilde{O}(n^{5/6})$ colors for $k = 4$, both given by Kierstead in 1998. In this paper, we finally break this barrier, achieving the first major improvement in nearly three decades. Our results are summarized as follows: (1) $k \geq 5$ case. We provide a deterministic online algorithm to color $k$-colorable graphs with $\widetilde{O}(n^{1-\frac{1}{k(k-1)/2}})$ colors, significantly improving the current upper bound of $\widetilde{O}(n^{1-\frac{1}{k!}})$ colors. Our algorithm also matches the best-known bound for $k = 4$ ($\widetilde{O}(n^{5/6})$ c
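For contrast with these bounds, the classical First-Fit baseline (not the paper's algorithm) colors each arriving vertex with the smallest color unused among its already-revealed neighbors. A minimal sketch of the online setting, with a hypothetical 5-cycle as input:

```python
def first_fit_coloring(vertices, adj):
    """First-Fit online coloring: process vertices in arrival order and give
    each the smallest color not used by its previously arrived neighbors."""
    color = {}
    for v in vertices:
        used = {color[u] for u in adj.get(v, ()) if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

# usage: a 5-cycle (3-chromatic, since it is an odd cycle) arriving as 0..4
adj = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
colors = first_fit_coloring(range(5), adj)
print(colors)
```

First-Fit can be forced to use far more colors than the offline optimum; the cited results are about algorithms whose worst-case color count is provably sublinear in $n$ for $k$-colorable inputs.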



Engineering Algorithms for Dynamic Greedy Set Cover
arXiv:2604.03152v1 Announce Type: new Abstract: In the dynamic set cover problem, the input is a dynamic universe of elements and a fixed collection of sets. As elements are inserted or deleted, the goal is to efficiently maintain an approximate minimum set cover. While the past decade has seen significant theoretical breakthroughs for this problem, a notable gap remains between theoretical design and practical performance, as no comprehensive experimental study currently exists to validate these results. In this paper, we bridge this gap by implementing and evaluating four greedy-based dynamic algorithms across a diverse range of real-world instances. We derive our implementations from state-of-the-art frameworks (such as GKKP, STOC 2017; SU, STOC 2023; SUZ, FOCS 2024), which we simplify
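As a point of reference, the static greedy rule underlying these dynamic frameworks repeatedly picks the set covering the most still-uncovered elements, giving the classic $\ln n$ approximation. A minimal sketch on a made-up instance (the GKKP/SU/SUZ frameworks themselves maintain such a cover under insertions and deletions, which this static version does not):

```python
def greedy_set_cover(universe, sets):
    """Static greedy set cover: repeatedly take the set with the largest
    number of still-uncovered elements until the universe is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        i = max(range(len(sets)), key=lambda j: len(sets[j] & uncovered))
        if not sets[i] & uncovered:
            raise ValueError("instance has no cover")
        chosen.append(i)
        uncovered -= sets[i]
    return chosen

# usage: a small hypothetical instance
sets = [{1, 2, 3}, {2, 4}, {3, 4, 5}, {5}]
cover = greedy_set_cover({1, 2, 3, 4, 5}, sets)
print(cover)
```

Recomputing this from scratch after every element update costs $O(\sum_i |S_i|)$ per update; the engineering question in the paper is how much of that work the dynamic algorithms actually save in practice.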

Online Drone Coverage of Targets on a Line
arXiv:2604.02491v1 Announce Type: new Abstract: We study the problem of online coverage of targets by a drone or a sensor equipped with a camera or an antenna with a fixed half-angle of view $\alpha$. The targets to be monitored appear at arbitrary positions on a line barrier in an online manner. When a new target appears, the drone has to move to a location that covers the newly arrived target as well as all previously arrived targets. The objective is to design a coverage algorithm that minimizes the total length of the drone's trajectory. Our results are reported in terms of an algorithm's competitive ratio, i.e., the worst-case ratio (over all inputs) of its cost to that of an optimal offline algorithm. In terms of upper bounds, we present three online algorithms and prove bounds on their co
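One way to make the setting concrete: if the drone hovers at a fixed height $h$, a half-angle $\alpha$ corresponds to a covered interval of radius $h\tan\alpha$ around its position on the line. The sketch below implements a naive "lazy" strategy (an illustrative baseline under these assumptions, not one of the paper's three algorithms): whenever a new target falls outside the field of view, translate the minimum distance that restores coverage of all targets seen so far.

```python
import math

def lazy_drone(targets, h, alpha):
    """Lazy online strategy (illustrative baseline): move only when forced,
    and then by the minimum amount that covers every target seen so far."""
    r = h * math.tan(alpha)          # radius of the covered interval
    c, travelled = 0.0, 0.0          # drone position and trajectory length
    lo = hi = None                   # extremes of targets seen so far
    for t in targets:
        lo = t if lo is None else min(lo, t)
        hi = t if hi is None else max(hi, t)
        assert hi - lo <= 2 * r, "targets cannot all fit in the field of view"
        # minimal move so that [c - r, c + r] contains [lo, hi]
        new_c = min(max(c, hi - r), lo + r)
        travelled += abs(new_c - c)
        c = new_c
    return travelled

# usage: hypothetical arrival sequence with h = 2 and alpha = 45 degrees
total = lazy_drone([0.0, 3.0, -0.5], h=2.0, alpha=math.pi / 4)
print(total)
```

Whether a lazy rule like this achieves a good competitive ratio against the optimal offline trajectory is exactly the kind of question the paper's analysis addresses.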
