S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation

[Submitted on 26 Mar 2026]

Authors: Ligong Han, Hao Wang, Han Gao

Abstract: Block-diffusion language models offer a promising path toward faster-than-autoregressive generation by combining block-wise autoregressive decoding with within-block parallel denoising. However, in the few-step regime needed for practical acceleration, standard confidence-thresholded decoding is often brittle: aggressive thresholds hurt quality, while conservative thresholds require unnecessary denoising steps. Existing approaches that address this issue either require additional training or incur extra test-time compute. We present S2D2, a training-free self-speculative decoding framework for block-diffusion language models. Our key observation is that a block-diffusion model becomes autoregressive when the block size is reduced to one, allowing the same pretrained model to act as both drafter and verifier. S2D2 inserts a speculative verification step into standard block-diffusion decoding and uses lightweight routing policies to decide when verification is worth its cost. This yields a hybrid decoding trajectory in which diffusion proposes tokens in parallel, while the autoregressive mode acts as a local sequence-level critic. Across three mainstream block-diffusion families, S2D2 consistently improves the accuracy-speed tradeoff over strong confidence-thresholding baselines. On SDAR, we observe up to $4.7\times$ speedup over autoregressive decoding, and up to $1.57\times$ over a tuned dynamic decoding baseline while improving accuracy by up to $4.5$ points. On LLaDA2.1-Mini, S2D2 remains complementary to built-in self-correction, including a conservative setting where it is $4.4\times$ faster than the static baseline with slightly higher accuracy.
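The draft-then-verify idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the acceptance rule or the routing policies, so this assumes greedy prefix matching (a common choice in speculative decoding) and a simple confidence threshold. The names `verify_draft`, `should_verify`, `ar_next`, and `toy_ar_next` are hypothetical, and the toy verifier stands in for the same model run with block size one.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: maps a token prefix to the verifier's greedy next token.
NextToken = Callable[[Sequence[int]], int]


def verify_draft(prefix: Sequence[int], draft: Sequence[int], ar_next: NextToken) -> List[int]:
    """Keep the longest prefix of `draft` that the autoregressive verifier agrees with.

    At the first disagreement, the verifier's own token is appended and the
    rest of the draft is discarded. (Assumed greedy rule, not from the paper.)
    """
    accepted: List[int] = []
    context = list(prefix)
    for tok in draft:
        expected = ar_next(context)
        if tok != expected:
            accepted.append(expected)  # replace the rejected token with the verifier's choice
            break
        accepted.append(tok)
        context.append(tok)
    return accepted


def should_verify(confidences: Sequence[float], threshold: float = 0.9) -> bool:
    """Hypothetical routing rule: verify only when some drafted token is low-confidence."""
    return min(confidences) < threshold


def toy_ar_next(context: Sequence[int]) -> int:
    """Toy stand-in for the block-size-one model: predicts (last token + 1) mod 10."""
    return (context[-1] + 1) % 10


if __name__ == "__main__":
    draft = [3, 4, 9, 6]                 # tokens proposed in parallel by the diffusion pass
    confidences = [0.99, 0.97, 0.55, 0.92]
    if should_verify(confidences):
        print(verify_draft([1, 2], draft, toy_ar_next))  # prints [3, 4, 5]
```

The loop calls the verifier once per token for readability; a practical verifier would score all drafted positions in a single forward pass, which is where the speedup would come from.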

Comments: Code is available at this https URL

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2603.25702 [cs.CL]

(or arXiv:2603.25702v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.25702

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Ligong Han

[v1] Thu, 26 Mar 2026 17:48:50 UTC (1,153 KB)

Original source

arXiv

https://arxiv.org/abs/2603.25702v1
