
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

arXiv, March 30, 2026


Authors: Boxin Wang, Chankyu Lee, Nayeon Lee, Sheng-Chieh Lin, Wenliang Dai, Yang Chen, Yangyi Chen, Zhuolin Yang, Zihan Liu, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping


Abstract: Building general-purpose reasoning models with reinforcement learning (RL) entails substantial cross-domain heterogeneity, including large variation in inference-time response lengths and verification latency. Such variability complicates the RL infrastructure, slows training, and makes training curriculum (e.g., response length extension) and hyperparameter selection challenging. In this work, we propose cascaded domain-wise reinforcement learning (Cascade RL) to develop Nemotron-Cascade, capable of operating in both instruct and deep thinking modes, without any performance gap relative to a thinking-only counterpart. Departing from conventional approaches that blend heterogeneous prompts from different domains, Cascade RL orchestrates sequential, domain-wise RL, reducing engineering complexity and delivering state-of-the-art performance across a wide range of benchmarks. Notably, RLHF for alignment, when used as a pre-step, boosts the model's reasoning ability far beyond mere preference optimization, and subsequent domain-wise RLVR stages rarely degrade the benchmark performance attained in earlier domains and may even improve it (see an illustration in Figure 1). Our 14B model, after RL, outperforms its SFT teacher, DeepSeek-R1-0528, on LiveCodeBench v5/v6/Pro and achieves silver-medal performance in the 2025 International Olympiad in Informatics (IOI). We transparently share our training and data recipes.
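The sequential, domain-wise structure described in the abstract can be pictured with a small toy sketch. This is not the authors' implementation: the `Stage` class, `make_toy_trainer`, `run_cascade`, the `evaluate` function, and the domain names `domain_a` and `domain_b` are all hypothetical, and the "model" is only a dictionary of per-domain scores standing in for real weights. The sketch shows just the control flow: an alignment stage runs first, each later stage trains on a single domain, and after every stage the earlier domains are re-evaluated to check whether their scores dropped.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Toy stand-in for model weights: one score per domain (assumption for illustration).
Model = Dict[str, float]

# Hypothetical domain names; the abstract does not enumerate the actual stages.
DOMAINS = ("alignment", "domain_a", "domain_b")


@dataclass
class Stage:
    """One cascade stage: a name plus a function that trains on that domain only."""
    name: str
    train: Callable[[Model], Model]


def make_toy_trainer(domain: str, gain: float) -> Callable[[Model], Model]:
    """Return a fake trainer that raises the score of a single domain."""
    def train(model: Model) -> Model:
        updated = dict(model)  # do not mutate the caller's model
        updated[domain] = updated.get(domain, 0.0) + gain
        return updated
    return train


def evaluate(model: Model) -> Dict[str, float]:
    """Score the model on every domain; untrained domains score 0.0."""
    return {d: model.get(d, 0.0) for d in DOMAINS}


def run_cascade(
    model: Model,
    stages: List[Stage],
    evaluate_fn: Callable[[Model], Dict[str, float]],
) -> Tuple[Model, List[dict]]:
    """Run stages one after another, flagging regressions on earlier domains."""
    history: List[dict] = []
    finished: List[str] = []  # domains whose stage has already completed
    for stage in stages:
        before = evaluate_fn(model)
        model = stage.train(model)
        after = evaluate_fn(model)
        # A regression is an earlier domain whose score dropped during this stage.
        regressions = [d for d in finished if after[d] < before[d]]
        history.append(
            {"stage": stage.name, "scores": after, "regressions": regressions}
        )
        finished.append(stage.name)
    return model, history


if __name__ == "__main__":
    stages = [
        Stage("alignment", make_toy_trainer("alignment", 1.0)),
        Stage("domain_a", make_toy_trainer("domain_a", 2.0)),
        Stage("domain_b", make_toy_trainer("domain_b", 3.0)),
    ]
    final_model, history = run_cascade({}, stages, evaluate)
    for record in history:
        print(record["stage"], record["scores"], record["regressions"])
```

Because each stage touches one domain at a time, settings such as response length limits or verifier timeouts could in principle be chosen per stage rather than once for a blended prompt mix, which is the engineering simplification the abstract points to.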

Comments: We publicly release the Nemotron-Cascade models and the full collection of training data at: this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Cite as: arXiv:2512.13607 [cs.CL]

(or arXiv:2512.13607v2 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2512.13607

arXiv-issued DOI via DataCite

Submission history

From: Wei Ping
[v1] Mon, 15 Dec 2025 18:02:35 UTC (1,899 KB)
[v2] Fri, 27 Mar 2026 06:18:40 UTC (1,899 KB)

Original source: arXiv, https://arxiv.org/abs/2512.13607
