How iteration order influences convergence and stability in deep learning

arXiv · March 30, 2026 · 10 min read
arXiv:2502.01557v3 · Announce type: replace

Authors: Benoit Dherin, Benny Avelin, Anders Karlsson, Hanna Mazzawi, Javier Gonzalvo, Michael Munn

Abstract: Despite exceptional achievements, training neural networks remains computationally expensive and is often plagued by instabilities that can degrade convergence. While learning rate schedules can help mitigate these issues, finding optimal schedules is time-consuming and resource-intensive. This work explores theoretical issues concerning training stability in the constant-learning-rate (i.e., without schedule) and small-batch-size regime. Surprisingly, we show that the composition order of gradient updates affects stability and convergence in gradient-based optimizers. We illustrate this new line of thinking using backward-SGD, which produces parameter iterates at each step by reversing the usual forward composition order of batch gradients. Our theoretical analysis shows that in contractive regions (e.g., around minima) backward-SGD converges to a point while the standard forward-SGD generally only converges to a distribution. This leads to improved stability and convergence, which we demonstrate experimentally. While full backward-SGD is computationally intensive in practice, it highlights that the extra freedom of modifying the usual iteration composition by creatively reusing previous batches at each optimization step may have important beneficial effects in improving training. Our experiments provide a proof of concept supporting this phenomenon. To our knowledge, this represents a new and unexplored avenue in deep learning optimization.
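The contrast between the two composition orders can be sketched on a toy problem. The example below is an illustrative assumption, not the authors' implementation or experimental setup: each "batch" is a 1D quadratic loss 0.5 * (theta - center)**2, so one SGD step is a contraction for a learning rate between 0 and 1. The names `sgd_step`, `forward_iterate`, `backward_iterate`, and `tail_spread` are hypothetical helpers introduced here. Forward iteration applies the oldest batch first; backward iteration applies the newest batch first and the oldest last.

```python
import random


def sgd_step(theta, center, lr):
    """One SGD step on the toy loss 0.5 * (theta - center)**2 (an assumed loss)."""
    return theta - lr * (theta - center)


def forward_iterate(theta0, centers, lr):
    """Usual order: the oldest batch is applied first, the newest last."""
    theta = theta0
    for c in centers:
        theta = sgd_step(theta, c, lr)
    return theta


def backward_iterate(theta0, centers, lr):
    """Reversed order: the newest batch is applied first, the oldest last."""
    theta = theta0
    for c in reversed(centers):
        theta = sgd_step(theta, c, lr)
    return theta


def tail_spread(values, tail=20):
    """Range (max minus min) of the last `tail` iterates."""
    last = values[-tail:]
    return max(last) - min(last)


rng = random.Random(0)
lr, theta0, steps = 0.5, 0.0, 200
# Each batch pulls theta toward -1 or +1, chosen at random.
batches = [rng.choice([-1.0, 1.0]) for _ in range(steps)]

# Iterate n uses the first n batches; only the composition order differs.
forward = [forward_iterate(theta0, batches[:n], lr) for n in range(1, steps + 1)]
backward = [backward_iterate(theta0, batches[:n], lr) for n in range(1, steps + 1)]

print("forward spread over last 20 iterates :", tail_spread(forward))
print("backward spread over last 20 iterates:", tail_spread(backward))
```

In this sketch the forward iterates keep jumping as each new batch arrives, while consecutive backward iterates differ only through a term that has been contracted by every earlier step, so they settle on a single value. Recomputing the full composition at every step costs time quadratic in the number of steps, which is consistent with the abstract's remark that full backward-SGD is computationally intensive.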

Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)

Cite as: arXiv:2502.01557 [cs.LG]

(or arXiv:2502.01557v3 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2502.01557

arXiv-issued DOI via DataCite

Submission history

From: Anders Karlsson

[v1] Mon, 3 Feb 2025 17:40:03 UTC (13,839 KB)

[v2] Fri, 6 Feb 2026 06:49:32 UTC (15,493 KB)

[v3] Fri, 27 Mar 2026 07:23:26 UTC (15,493 KB)
