Speculative Decoding with Parallel Tree Drafting
JetSpec introduces a speculative decoding method that uses parallel tree drafting to achieve up to a 9.64x lossless speedup in LLM inference, reaching over 1000 tokens per second (TPS) on a single B200 GPU.
- Benchmark · Jun 25, 2026, 09:55 PM · 70%
JetSpec demonstrates up to 9.64x lossless LLM speedup using parallel tree drafting and achieves over 1000 TPS on a single B200 GPU