In-Browser LLM Inference via WebGPU
A 230M-parameter LFM2.5 model runs locally in-browser at 1,400 tokens/sec using custom WebGPU kernels, leveraging prior work from Fable 5 and Opus 4.8.
- Release · Jun 25, 2026, 06:35 PM · 69%
LFM2.5-230M achieves 1,400 tokens/sec in-browser using custom WebGPU kernels