Speculative Decoding with Parallel Tree Drafting

JetSpec introduces a speculative decoding method that uses parallel tree drafting to achieve up to 9.64x lossless speedup in LLM inference, reaching over 1000 tokens per second (TPS) on a single B200 GPU.

One continuously updated timeline instead of dozens of separate articles. New developments are appended as the story evolves.

Benchmark | Jun 25, 2026, 09:55 PM | 70%
JetSpec demonstrates up to 9.64x lossless LLM inference speedup using parallel tree drafting and exceeds 1000 TPS on a single B200 GPU
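
For readers unfamiliar with the term, "lossless" in speculative decoding generally means the accelerated output matches what the target model would have produced on its own. The sketch below is a generic toy illustration of greedy tree-based speculative decoding, not JetSpec's actual algorithm, whose internals are not described here. The names `toy_target`, `toy_draft_tree`, `verify`, and `speculative_generate` are all hypothetical, and the "models" are simple arithmetic functions standing in for real networks.

```python
from typing import Dict, List, Sequence, Tuple

Token = int
# A draft tree maps each proposed token to the subtree of continuations below it.
DraftTree = Dict[Token, "DraftTree"]


def toy_target(prefix: Sequence[Token]) -> Token:
    """Hypothetical target model: greedy next token from the last two tokens."""
    a = prefix[-1] if prefix else 0
    b = prefix[-2] if len(prefix) > 1 else 0
    return (3 * a + b + 1) % 7


def toy_draft_tree(prefix: Sequence[Token], depth: int, width: int) -> DraftTree:
    """Hypothetical cheap drafter: proposes `width` candidates per level.

    It only looks at the last token, so it is sometimes wrong by design.
    """
    if depth == 0:
        return {}
    a = prefix[-1] if prefix else 0
    guess = (3 * a + 1) % 7
    candidates = [(guess + k) % 7 for k in range(width)]
    return {c: toy_draft_tree(list(prefix) + [c], depth - 1, width) for c in candidates}


def verify(prefix: Sequence[Token], tree: DraftTree) -> List[Token]:
    """Follow the target's greedy choices down the draft tree.

    Every emitted token is the target's own choice, so the output is identical
    to plain greedy decoding. A real system would score all tree nodes in one
    batched forward pass; this loop calls the toy target once per level instead.
    """
    accepted: List[Token] = []
    node = tree
    while True:
        want = toy_target(list(prefix) + accepted)
        accepted.append(want)       # always emit the target's token
        if want not in node:        # draft missed (or leaf reached): stop here
            return accepted
        node = node[want]           # draft hit: keep walking this branch


def speculative_generate(prompt: Sequence[Token], n_tokens: int,
                         depth: int = 3, width: int = 2) -> Tuple[List[Token], int]:
    """Generate n_tokens; also return how many draft-and-verify steps were used."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < n_tokens:
        out.extend(verify(out, toy_draft_tree(out, depth, width)))
        steps += 1
    return out[: len(prompt) + n_tokens], steps


def baseline_generate(prompt: Sequence[Token], n_tokens: int) -> List[Token]:
    """Plain one-token-at-a-time greedy decoding with the target model."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(toy_target(out))
    return out


if __name__ == "__main__":
    spec, steps = speculative_generate([1, 2], 20)
    print(spec == baseline_generate([1, 2], 20), steps)
```

In this toy setup, each verification step emits at least one token and more whenever a drafted branch matches the target, which is where the speedup in this family of methods comes from. Real-world gains depend on how often drafts are accepted and on hardware, neither of which this sketch models.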