
DistillSpec: Speculative Decoding Method
Oct 22, 2025 · The paper introduces a framework that aligns a lightweight draft model with a large target model, improving token acceptance rates and achieving a 10–45% inference speedup. It …
[2503.07807] Training Domain Draft Models for Speculative Decoding ...
Mar 10, 2025 · However, when adapting speculative decoding to domain-specific target models, the acceptance rate of the generic draft model drops significantly due to domain shift. In this …
ABSTRACT Speculative decoding (SD) accelerates large language model inference by employing a faster draft model to generate multiple tokens, which are then verified in parallel by the …
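The snippets above describe the core speculative decoding loop: a draft model proposes several tokens and the target model verifies them in parallel. A minimal sketch of the standard accept/reject rule (accept each drafted token with probability min(1, p_target/p_draft), stopping at the first rejection); the toy tokens and probabilities are illustrative assumptions, not from any of the papers:

```python
import random

def verify_draft(draft_tokens, draft_p, target_p, rng=random.random):
    """Accept each drafted token with prob min(1, p_target/p_draft);
    stop at the first rejection. Returns the accepted prefix."""
    accepted = []
    for tok, dp, tp in zip(draft_tokens, draft_p, target_p):
        if rng() < min(1.0, tp / dp):
            accepted.append(tok)
        else:
            break  # first rejection ends this speculative run
    return accepted

# Hypothetical example: three drafted tokens with draft/target probabilities.
tokens = ["the", "cat", "sat"]
p_draft = [0.50, 0.40, 0.30]
p_target = [0.60, 0.10, 0.30]
print(verify_draft(tokens, p_draft, p_target, rng=lambda: 0.3))  # → ['the']
```

The acceptance rate the snippets keep referring to is exactly the fraction of drafted tokens that survive this check, which is why aligning the draft's distribution with the target's speeds up decoding.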
DistillSpec: Improving speculative decoding via knowledge distillation
Nonetheless, identifying an accurate, compact draft model that is well aligned with the target model is challenging. To address this, we propose leveraging white-box knowledge distillation, …
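White-box distillation, as mentioned above, trains the draft on the target's full next-token distribution rather than only its sampled outputs. A minimal sketch using a forward KL objective over one next-token distribution; the choice of forward KL and the toy logits are assumptions for illustration, not the paper's exact objective:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def forward_kl(target_logits, draft_logits):
    """KL(p_target || p_draft) for one next-token distribution;
    minimizing this pulls the draft's distribution toward the target's."""
    p = softmax(target_logits)
    q = softmax(draft_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give zero divergence; mismatched logits give a positive loss.
print(round(forward_kl([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]), 6))  # → 0.0
```

In training one would average this per-token loss over a corpus and backpropagate through the draft model only.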
Training Domain Draft Models for Speculative Decoding: Best …
Jan 1, 2025 · Download paper here. Recommended citation: Fenglu Hong, Ravi Raju, Jonathan Lingjie Li, Bo Li, Urmish Thakker, Avinash Ravichandran, and …
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
Finally, in practical scenarios with models of varying sizes, one can first use distillation to boost the performance of the target model and then apply DistillSpec to train a well-aligned draft …
Training Domain Draft Models for Speculative Decoding: Best …
Mar 11, 2025 · The research investigates optimal methods for training draft models used in speculative decoding. The authors experiment with various training methods, focusing on …
AdaSPEC: Selective Knowledge Distillation for Efficient Spec
Adaptive selective distillation for faster speculative decoding. At first glance, the efficiency trick called speculative decoding looks straightforward: a smaller draft model proposes tokens and …
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative ...
Oct 22, 2025 · AdaSPEC is a novel method that enhances speculative decoding by selectively filtering difficult tokens during knowledge distillation, resulting in improved token acceptance …
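The AdaSPEC snippet describes selectively filtering difficult tokens out of the distillation loss. A minimal sketch of that idea under stated assumptions: given per-token distillation losses, keep only the easiest fraction and mask the rest; the keep ratio and loss values are illustrative, not the paper's settings:

```python
def select_tokens(per_token_loss, keep_ratio=0.8):
    """Return a 0/1 mask keeping the keep_ratio easiest tokens
    (lowest distillation loss); hard tokens are masked out of training."""
    k = max(1, int(len(per_token_loss) * keep_ratio))
    cutoff = sorted(per_token_loss)[k - 1]
    return [1 if loss <= cutoff else 0 for loss in per_token_loss]

# Hypothetical per-token distillation losses; the two hardest are dropped.
losses = [0.1, 2.5, 0.3, 0.2, 5.0]
print(select_tokens(losses, keep_ratio=0.6))  # → [1, 0, 1, 1, 0]
```

The intuition matching the snippet: a small draft model cannot match the target everywhere, so spending its capacity on learnable tokens raises the overall acceptance rate more than forcing it to fit the hardest ones.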