
S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation

arXiv, submitted on 26 Mar 2026

Authors: Ligong Han, Hao Wang, Han Gao


Abstract: Block-diffusion language models offer a promising path toward faster-than-autoregressive generation by combining block-wise autoregressive decoding with within-block parallel denoising. However, in the few-step regime needed for practical acceleration, standard confidence-thresholded decoding is often brittle: aggressive thresholds hurt quality, while conservative thresholds require unnecessary denoising steps. Existing approaches that address this issue either require additional training or incur extra test-time compute. We present S2D2, a training-free self-speculative decoding framework for block-diffusion language models. Our key observation is that a block-diffusion model becomes autoregressive when the block size is reduced to one, allowing the same pretrained model to act as both drafter and verifier. S2D2 inserts a speculative verification step into standard block-diffusion decoding and uses lightweight routing policies to decide when verification is worth its cost. This yields a hybrid decoding trajectory in which diffusion proposes tokens in parallel, while the autoregressive mode acts as a local sequence-level critic. Across three mainstream block-diffusion families, S2D2 consistently improves the accuracy-speed tradeoff over strong confidence-thresholding baselines. On SDAR, we observe up to $4.7\times$ speedup over autoregressive decoding, and up to $1.57\times$ over a tuned dynamic decoding baseline while improving accuracy by up to $4.5$ points. On LLaDA2.1-Mini, S2D2 remains complementary to built-in self-correction, including a conservative setting where it is $4.4\times$ faster than the static baseline with slightly higher accuracy.
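The draft-then-verify idea in the abstract can be illustrated with a minimal sketch. It assumes a greedy exact-match acceptance rule, which is a common simplification of speculative decoding and not necessarily the verification rule S2D2 uses. The names `verify_draft` and `toy_ar_next_token` are hypothetical, the toy verifier stands in for the same model run with block size one, and the routing policies that decide when to verify are omitted.

```python
from typing import Callable, List, Sequence


def verify_draft(
    prefix: Sequence[int],
    draft: Sequence[int],
    ar_next_token: Callable[[Sequence[int]], int],
) -> List[int]:
    """Greedy speculative verification (illustrative assumption, not the paper's rule).

    Keeps the longest prefix of `draft` that the autoregressive verifier would
    also have produced, then appends the verifier's own token at the first
    mismatch. A real system would score all draft positions in one forward
    pass; this loop only shows the acceptance logic.
    """
    accepted: List[int] = []
    for tok in draft:
        expected = ar_next_token(list(prefix) + accepted)
        if tok != expected:
            # Reject the drafted token and substitute the verifier's choice.
            accepted.append(expected)
            break
        accepted.append(tok)
    return accepted


def toy_ar_next_token(context: Sequence[int]) -> int:
    """Hypothetical stand-in verifier: predicts (last token + 1) mod 10."""
    return (context[-1] + 1) % 10 if context else 0


if __name__ == "__main__":
    # The parallel draft gets the first two tokens right and the third wrong.
    print(verify_draft([1, 2], [3, 4, 9, 6], toy_ar_next_token))
```

In this toy run the first two drafted tokens are accepted, the third is replaced by the verifier's token, and the remaining draft is discarded, so one verification step still commits three tokens.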

Comments: Code is available at this https URL

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2603.25702 [cs.CL]

(or arXiv:2603.25702v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.25702

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Ligong Han [v1] Thu, 26 Mar 2026 17:48:50 UTC (1,153 KB)
