TAPS: Task Aware Proposal Distributions for Speculative Sampling
The effectiveness of speculative decoding depends on how well the draft model's training data matches the downstream task. Specialized drafters perform better when combined through confidence-based routing than through simple averaging.
Published on Mar 27
Abstract
Speculative decoding accelerates autoregressive generation by letting a lightweight draft model propose future tokens that a larger target model then verifies in parallel. In practice, however, draft models are usually trained on broad generic corpora, which leaves it unclear how much speculative decoding quality depends on the draft training distribution. We study this question with lightweight HASS and EAGLE-2 drafters trained on MathInstruct, ShareGPT, and mixed-data variants, evaluated on MT-Bench, GSM8K, MATH-500, and SVAMP. Measured by acceptance length, task-specific training yields clear specialization: MathInstruct-trained drafts are strongest on reasoning benchmarks, while ShareGPT-trained drafts are strongest on MT-Bench. Mixed-data training improves robustness, but larger mixtures do not dominate across decoding temperatures. We also study how to combine specialized drafters at inference time. Naive checkpoint averaging performs poorly, whereas confidence-based routing improves over single-domain drafts and merged-tree verification yields the highest acceptance length overall for both backbones. Finally, confidence is a more useful routing signal than entropy: rejected tokens tend to have higher entropy, but confidence produces much clearer benchmark-level routing decisions. These results show that speculative decoding quality depends not only on draft architecture, but also on the match between draft training data and downstream workload, and that specialized drafters are better combined at inference time than in weight space.
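To make the routing idea concrete, here is a minimal Python sketch of confidence-based selection between specialized drafters. The abstract does not spell out the exact routing rule, so everything here is an assumption for illustration: confidence is taken to be the mean top-1 probability over a drafter's proposed tokens, and the names `route_by_confidence`, `math_drafter`, and `chat_drafter` are hypothetical. Entropy is included only to show the alternative signal the abstract compares against.

```python
import math
from typing import Dict, List, Sequence


def confidence(dist: Sequence[float]) -> float:
    """Top-1 probability of a next-token distribution (assumed confidence signal)."""
    return max(dist)


def entropy(dist: Sequence[float]) -> float:
    """Shannon entropy in nats; the alternative signal mentioned in the abstract."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def mean_confidence(step_dists: Sequence[Sequence[float]]) -> float:
    """Average top-1 probability across the draft steps of one drafter."""
    return sum(confidence(d) for d in step_dists) / len(step_dists)


def route_by_confidence(drafts: Dict[str, List[List[float]]]) -> str:
    """Pick the drafter whose proposal has the highest mean confidence.

    `drafts` maps a (hypothetical) drafter name to its per-step
    next-token distributions for the current prefix.
    """
    return max(drafts, key=lambda name: mean_confidence(drafts[name]))


# Toy distributions over a 3-token vocabulary, two draft steps each.
drafts = {
    "math_drafter": [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]],
    "chat_drafter": [[0.40, 0.30, 0.30], [0.50, 0.25, 0.25]],
}

chosen = route_by_confidence(drafts)
print(chosen)
```

In this toy input the math drafter is sharply peaked at every step, so it wins the routing decision. A real system would read these distributions from the drafters' logits and pass only the chosen draft to the target model for verification. Merged-tree verification, which the abstract reports as strongest, would instead verify candidates from several drafters together.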
arXiv: arxiv.org/abs/2603.27027