S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation
Authors: Ligong Han, Hao Wang, Han Gao
Abstract: Block-diffusion language models offer a promising path toward faster-than-autoregressive generation by combining block-wise autoregressive decoding with within-block parallel denoising. However, in the few-step regime needed for practical acceleration, standard confidence-thresholded decoding is often brittle: aggressive thresholds hurt quality, while conservative thresholds require unnecessary denoising steps. Existing approaches that address this issue either require additional training or incur extra test-time compute. We present S2D2, a training-free self-speculative decoding framework for block-diffusion language models. Our key observation is that a block-diffusion model becomes autoregressive when the block size is reduced to one, allowing the same pretrained model to act as both drafter and verifier. S2D2 inserts a speculative verification step into standard block-diffusion decoding and uses lightweight routing policies to decide when verification is worth its cost. This yields a hybrid decoding trajectory in which diffusion proposes tokens in parallel, while the autoregressive mode acts as a local sequence-level critic. Across three mainstream block-diffusion families, S2D2 consistently improves the accuracy-speed tradeoff over strong confidence-thresholding baselines. On SDAR, we observe up to $4.7\times$ speedup over autoregressive decoding, and up to $1.57\times$ over a tuned dynamic decoding baseline while improving accuracy by up to $4.5$ points. On LLaDA2.1-Mini, S2D2 remains complementary to built-in self-correction, including a conservative setting where it is $4.4\times$ faster than the static baseline with slightly higher accuracy.
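The abstract describes the drafter/verifier loop only at a high level, so the following is a minimal toy sketch under our own assumptions, not the S2D2 implementation. `propose` is a hypothetical stand-in for one within-block denoising pass that returns a (token, confidence) pair per position, and `ar_next` is a hypothetical stand-in for the same model run with block size one (autoregressive mode). Verification uses greedy longest-prefix acceptance, a standard speculative-decoding rule. The routing rule is also hypothetical: it skips verification when every masked proposal clears the confidence threshold and verifies otherwise. A real verifier would score the whole draft in a single forward pass rather than looping token by token.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # marks a position that has not been committed yet

# Hypothetical interfaces (not the S2D2 code or any library API):
# propose(prefix, block) -> one (token, confidence) pair per block position
# ar_next(prefix)        -> greedy next token from the model in block-size-one mode
Propose = Callable[[List[int], List[Optional[int]]], List[Tuple[int, float]]]
ArNext = Callable[[List[int]], int]


def verify(prefix: List[int], draft: List[int], ar_next: ArNext) -> List[int]:
    """Keep the longest draft prefix the AR mode agrees with, then append the
    AR mode's own token at the first mismatch (so at least one token is kept)."""
    accepted: List[int] = []
    for tok in draft:
        target = ar_next(prefix + accepted)
        if target != tok:
            accepted.append(target)  # verifier's correction replaces the bad draft token
            break
        accepted.append(tok)
    return accepted


def decode_block(prefix: List[int], block_size: int, propose: Propose,
                 ar_next: ArNext, threshold: float = 0.9) -> Tuple[List[int], int]:
    """Decode one block; returns the committed tokens and the number of steps."""
    block: List[Optional[int]] = [MASK] * block_size
    steps = 0
    while MASK in block:
        steps += 1
        proposals = propose(prefix, block)
        masked = [i for i, t in enumerate(block) if t is MASK]
        if all(proposals[i][1] >= threshold for i in masked):
            # Hypothetical routing: all proposals are confident, so commit them
            # in parallel without paying for verification.
            for i in masked:
                block[i] = proposals[i][0]
            continue
        # Otherwise draft from the first masked position to the end of the block,
        # reusing committed tokens and filling masked slots with proposals.
        start = block.index(MASK)
        draft = [block[i] if block[i] is not MASK else proposals[i][0]
                 for i in range(start, block_size)]
        accepted = verify(prefix + block[:start], draft, ar_next)
        block[start:start + len(accepted)] = accepted  # same length, progress >= 1
    return [t for t in block if t is not MASK], steps


# Toy model for illustration: the "correct" token is the position index mod 7.
def toy_ar_next(prefix: List[int]) -> int:
    return len(prefix) % 7


def toy_propose(prefix: List[int], block: List[Optional[int]]) -> List[Tuple[int, float]]:
    out = []
    for i in range(len(block)):
        pos = len(prefix) + i
        if i < 2:
            out.append((pos % 7, 0.95))       # confident and correct
        else:
            out.append(((pos + 1) % 7, 0.5))  # unconfident and wrong
    return out


print(decode_block([0, 1, 2], 4, toy_propose, toy_ar_next, threshold=0.9))
# -> ([3, 4, 5, 6], 2)
print(decode_block([0, 1, 2], 4, toy_propose, toy_ar_next, threshold=0.4))
# -> ([3, 4, 6, 0], 1)
```

With a strict threshold the toy decoder routes to verification, and the autoregressive mode repairs the two wrong proposals in two steps. With a loose threshold it commits everything in one step, including the two errors, which mirrors the brittleness of aggressive confidence thresholds described in the abstract.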
Comments: Code is available at this https URL
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2603.25702 [cs.CL]
(or arXiv:2603.25702v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.25702
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Ligong Han [view email] [v1] Thu, 26 Mar 2026 17:48:50 UTC (1,153 KB)