Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

arXiv, March 30, 2026

Authors: Boxin Wang, Chankyu Lee, Nayeon Lee, Sheng-Chieh Lin, Wenliang Dai, Yang Chen, Yangyi Chen, Zhuolin Yang, Zihan Liu, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping

Abstract: Building general-purpose reasoning models with reinforcement learning (RL) entails substantial cross-domain heterogeneity, including large variation in inference-time response lengths and verification latency. Such variability complicates the RL infrastructure, slows training, and makes training curriculum (e.g., response length extension) and hyperparameter selection challenging. In this work, we propose cascaded domain-wise reinforcement learning (Cascade RL) to develop Nemotron-Cascade, capable of operating in both instruct and deep thinking modes, without any performance gap relative to a thinking-only counterpart. Departing from conventional approaches that blend heterogeneous prompts from different domains, Cascade RL orchestrates sequential, domain-wise RL, reducing engineering complexity and delivering state-of-the-art performance across a wide range of benchmarks. Notably, RLHF for alignment, when used as a pre-step, boosts the model's reasoning ability far beyond mere preference optimization, and subsequent domain-wise RLVR stages rarely degrade the benchmark performance attained in earlier domains and may even improve it (see an illustration in Figure 1). Our 14B model, after RL, outperforms its SFT teacher, DeepSeek-R1-0528, on LiveCodeBench v5/v6/Pro and achieves silver-medal performance in the 2025 International Olympiad in Informatics (IOI). We transparently share our training and data recipes.
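To make the contrast between blended and cascaded scheduling concrete, here is a minimal Python sketch of the two prompt orderings. It is an illustration only, not the authors' training code: the helper names `cascaded_order` and `blended_order`, the stage names `math` and `code`, and the prompt labels are all hypothetical placeholders. The abstract states only that RLHF is used as a pre-step and is followed by domain-wise RLVR stages; it does not list the domains or their order.

```python
from itertools import zip_longest
from typing import List, Tuple

# (domain name, prompts for that domain); all values here are placeholders.
Stage = Tuple[str, List[str]]


def cascaded_order(stages: List[Stage]) -> List[str]:
    """Sequential, domain-wise schedule: finish one domain before the next starts."""
    order: List[str] = []
    for domain, prompts in stages:
        order.extend(f"{domain}/{p}" for p in prompts)
    return order


def blended_order(stages: List[Stage]) -> List[str]:
    """Baseline for comparison: interleave prompts from all domains round-robin."""
    tagged = [[f"{d}/{p}" for p in prompts] for d, prompts in stages]
    return [x for group in zip_longest(*tagged) for x in group if x is not None]


# Assumed stage list: RLHF first (as in the abstract), then two made-up RLVR domains.
stages: List[Stage] = [
    ("rlhf", ["a1", "a2"]),
    ("math", ["m1", "m2"]),
    ("code", ["c1", "c2"]),
]

print(cascaded_order(stages))
print(blended_order(stages))
```

In the cascaded ordering every batch comes from a single domain, so response-length limits, verifier settings, and other hyperparameters could in principle be chosen per stage. In the blended ordering each step mixes domains with very different response lengths and verification latencies, which is the heterogeneity the abstract describes as complicating infrastructure and tuning.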

Comments: We publicly release the Nemotron-Cascade models and the full collection of training data at: this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Cite as: arXiv:2512.13607 [cs.CL] (or arXiv:2512.13607v2 [cs.CL] for this version)

DOI: https://doi.org/10.48550/arXiv.2512.13607 (arXiv-issued DOI via DataCite)

Submission history

From: Wei Ping
[v1] Mon, 15 Dec 2025 18:02:35 UTC (1,899 KB)
[v2] Fri, 27 Mar 2026 06:18:40 UTC (1,899 KB)
