Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training

arXiv cs.LG | by Ivan Pasichnyk | April 1, 2026

arXiv:2603.28921v1 Announce Type: new

Abstract: Standard neural network training uses constant momentum (typically 0.9), a convention dating to 1964 with limited theoretical justification for its optimality. We derive a time-varying momentum schedule from the critically damped harmonic oscillator: $\mu(t) = 1 - 2\sqrt{\alpha(t)}$, where $\alpha(t)$ is the current learning rate. This beta-schedule requires zero free parameters beyond the existing learning rate schedule. On ResNet-18/CIFAR-10, beta-scheduling delivers 1.9x faster convergence to 90% accuracy compared to constant momentum. More importantly, the per-layer gradient attribution under this schedule produces a cross-optimizer invariant diagnostic: the same three problem layers are identified regardless of whether the model was trained with SGD or Adam (100% overlap). Surgical correction of only these layers fixes 62 misclassifications while retraining only 18% of parameters. A hybrid schedule (physics momentum for fast early convergence, then constant momentum for the final refinement) reaches 95% accuracy fastest among five methods tested. The main contribution is not an accuracy improvement but a principled, parameter-free tool for localizing and correcting specific failure modes in trained networks.
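As a rough illustration of the schedule stated in the abstract, the momentum can be computed directly from the current learning rate. The sketch below is not the author's code: the function names, the clamp at zero (the formula turns negative once the learning rate exceeds 0.25), and the `switch_step` parameter of the hybrid variant are all assumptions made for this example.

```python
import math


def beta_schedule(lr: float) -> float:
    """Momentum from the abstract's formula: mu = 1 - 2 * sqrt(lr).

    Assumption of this sketch: the result is clamped at 0, because the
    formula goes negative for lr > 0.25 and the abstract does not say
    how that case is handled.
    """
    if lr < 0:
        raise ValueError("learning rate must be non-negative")
    return max(0.0, 1.0 - 2.0 * math.sqrt(lr))


def hybrid_schedule(lr: float, step: int, switch_step: int,
                    constant_mu: float = 0.9) -> float:
    """Beta-schedule momentum early, constant momentum afterwards.

    `switch_step` is a hypothetical parameter: the abstract describes
    the hybrid idea but does not state when the switch happens.
    """
    if step < switch_step:
        return beta_schedule(lr)
    return constant_mu


if __name__ == "__main__":
    # Print the momentum implied by a few illustrative learning rates.
    for lr in (0.1, 0.01, 0.0025, 0.001):
        print(f"lr={lr:<7} mu={beta_schedule(lr):.4f}")
```

Under this formula, a learning rate of 0.0025 corresponds to a momentum of about 0.9, the conventional constant value. In a PyTorch training loop, one plausible way to apply the schedule would be to write the computed value into each `momentum` entry of `optimizer.param_groups` before every step; whether the paper's released code does this is not stated in the abstract.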

Comments: 18 pages, 3 figures, 5 tables. Code available on Kaggle

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.28921 [cs.LG]

(or arXiv:2603.28921v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.28921

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Ivan Pasichnyk

[v1] Mon, 30 Mar 2026 18:53:03 UTC (44 KB)

Original source

arXiv cs.LG

https://arxiv.org/abs/2603.28921
