
Mini-batch Estimation for Deep Cox Models: Statistical Foundations and Practical Guidance

arXiv · [Submitted on 5 Aug 2024 (v1), last revised 30 Mar 2026 (this version, v5)] · March 31, 2026

arXiv:2408.02839v5 · Announce Type: replace-cross

Authors: Lang Zeng, Weijing Tang, Zhao Ren, Ying Ding


Abstract: The stochastic gradient descent (SGD) algorithm has been widely used to optimize deep Cox neural networks (Cox-NN) by updating model parameters using mini-batches of data. We show that SGD aims to optimize the average of mini-batch partial-likelihood, which is different from the standard partial-likelihood. This distinction requires developing new statistical properties for the global optimizer, namely, the mini-batch maximum partial-likelihood estimator (mb-MPLE). We establish that mb-MPLE for Cox-NN is consistent and achieves the optimal minimax convergence rate up to a polylogarithmic factor. For Cox regression with linear covariate effects, we further show that mb-MPLE is $\sqrt{n}$-consistent and asymptotically normal with asymptotic variance approaching the information lower bound as batch size increases, which is confirmed by simulation studies. Additionally, we offer practical guidance on using SGD, supported by theoretical analysis and numerical evidence. For Cox-NN, we demonstrate that the ratio of the learning rate to the batch size is critical in SGD dynamics, offering insight into hyperparameter tuning. For Cox regression, we characterize the iterative convergence of SGD, ensuring that the global optimizer, mb-MPLE, can be approximated with sufficiently many iterations. Finally, we demonstrate the effectiveness of mb-MPLE in a large-scale real-world application where the standard MPLE is intractable.
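To make the distinction between the standard and the mini-batch partial likelihood concrete, here is a minimal NumPy sketch for the linear Cox case. It is not the authors' code. It assumes no tied event times, risk sets of the form {j : time_j >= time_i}, and per-sample averaging of the loss. The key point it illustrates is that inside a mini-batch the risk sets contain only subjects from that batch. The function names (`neg_log_partial_likelihood`, `mb_objective`, `sgd_cox`) and the simulation settings are hypothetical. Also, `mb_objective` averages over one random partition of the data, not over every possible batch.

```python
import numpy as np


def neg_log_partial_likelihood(beta, x, time, event):
    """Per-sample negative Cox log partial likelihood for a linear risk score.

    Assumes no tied times; the risk set of subject i is {j : time_j >= time_i}.
    """
    eta = x @ beta
    at_risk = time[None, :] >= time[:, None]          # at_risk[i, j]: j at risk at time_i
    shift = eta.max()                                 # stabilise exp() numerically
    risk_sums = (at_risk * np.exp(eta - shift)[None, :]).sum(axis=1)
    log_risk = shift + np.log(risk_sums)
    return -np.sum(event * (eta - log_risk)) / len(time)


def grad_neg_log_partial_likelihood(beta, x, time, event):
    """Gradient of neg_log_partial_likelihood with respect to beta."""
    eta = x @ beta
    at_risk = time[None, :] >= time[:, None]
    w = at_risk * np.exp(eta - eta.max())[None, :]
    w /= w.sum(axis=1, keepdims=True)                 # row i: weights over i's risk set
    x_bar = w @ x                                     # risk-set weighted covariate means
    return -(event[:, None] * (x - x_bar)).sum(axis=0) / len(time)


def mb_objective(beta, x, time, event, batch_size, rng):
    """Average mini-batch objective over one random partition of the data.

    Risk sets are formed only within each batch, so this differs from the
    full-sample objective unless batch_size equals the sample size.
    """
    n = len(time)
    batches = np.array_split(rng.permutation(n), n // batch_size)
    return np.mean([neg_log_partial_likelihood(beta, x[b], time[b], event[b])
                    for b in batches])


def sgd_cox(x, time, event, batch_size, lr, epochs, seed=0):
    """Plain SGD on the linear Cox model using within-batch risk sets."""
    rng = np.random.default_rng(seed)
    n, p = x.shape
    beta = np.zeros(p)
    for _ in range(epochs):
        for b in np.array_split(rng.permutation(n), n // batch_size):
            beta -= lr * grad_neg_log_partial_likelihood(beta, x[b], time[b], event[b])
    return beta


# Usage: simulate exponential survival times with independent random censoring.
rng = np.random.default_rng(1)
n, beta_true = 2000, np.array([1.0, -0.5])
x = rng.standard_normal((n, 2))
t = rng.exponential(1.0 / np.exp(x @ beta_true))    # hazard exp(x @ beta_true)
c = rng.exponential(2.0, size=n)                     # censoring times
time, event = np.minimum(t, c), (t <= c).astype(float)
beta_hat = sgd_cox(x, time, event, batch_size=64, lr=0.05, epochs=60)
print(beta_hat)
```

If `batch_size` equals the sample size, the mini-batch objective reduces to the standard partial likelihood. With smaller batches, every SGD step optimizes the within-batch version. In this sketch `lr` and `batch_size` are independent settings. The abstract's finding that their ratio governs SGD dynamics refers to Cox-NN, and this linear example does not show it.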

Subjects:

Machine Learning (stat.ML); Machine Learning (cs.LG)

Cite as: arXiv:2408.02839 [stat.ML]

(or arXiv:2408.02839v5 [stat.ML] for this version)

https://doi.org/10.48550/arXiv.2408.02839

arXiv-issued DOI via DataCite

Related DOI:

https://doi.org/10.1080/01621459.2026.2644611

DOI(s) linking to related resources

Submission history

From: Lang Zeng
[v1] Mon, 5 Aug 2024 21:25:10 UTC (942 KB)
[v2] Thu, 9 Oct 2025 20:35:15 UTC (984 KB)
[v3] Fri, 6 Mar 2026 19:31:07 UTC (973 KB)
[v4] Wed, 11 Mar 2026 20:17:04 UTC (1,904 KB)
[v5] Mon, 30 Mar 2026 00:56:20 UTC (1,904 KB)
