AI Tools · 1 min read · Jun 25, 2026, 6:35 PM

LFM2.5 230M running in-browser at 1,400 tok/s using custom WebGPU kernels

30-second summary

A 230M-parameter LFM2.5 model runs locally in-browser at 1,400 tokens/sec using custom WebGPU kernels, leveraging prior work from Fable 5 and Opus 4.8.

Key takeaways

›LiquidAI/LFM2.5-230M runs locally in-browser at 1,400 tokens/sec using custom WebGPU kernels.
›Kernels were adapted from Fable 5 (shut down) and Opus 4.8, enabling efficient on-device inference.
›Demo available on Hugging Face Spaces for public testing.
›Performance achieved on an M4 Max Mac, demonstrating feasibility on consumer hardware.
›Shows WebGPU as a viable path for high-performance, client-side AI that runs on the device's own GPU rather than on server hardware.

Full story

A developer demonstrated the LiquidAI/LFM2.5-230M model running entirely in a web browser via custom WebGPU kernels, achieving 1,400 tokens per second on an M4 Max Mac. The implementation builds on kernels originally developed for Fable 5 (before its shutdown) and Opus 4.8. A Hugging Face Space provides a live demo for testing. The result highlights the potential of WebGPU for high-performance, client-side AI workloads that run on a consumer device's GPU, with no server-side inference hardware.
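
To put the headline figure in perspective, the sketch below converts a throughput number into average per-token latency and shows one way a tokens-per-second figure could be measured. The function names and the injectable clock are assumptions made for illustration; none of this is taken from the demo's code, and the source does not say how the 1,400 tok/s figure was measured.

```typescript
// Hypothetical helpers for reasoning about throughput figures.
// Names are illustrative and do not come from the demo.

/** Average time per token, in milliseconds, for a given throughput. */
function msPerToken(tokensPerSecond: number): number {
  if (tokensPerSecond <= 0) {
    throw new RangeError("tokensPerSecond must be positive");
  }
  return 1000 / tokensPerSecond;
}

/** Throughput from a token count and an elapsed wall-clock time in milliseconds. */
function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) {
    throw new RangeError("elapsedMs must be positive");
  }
  return (tokenCount * 1000) / elapsedMs;
}

/**
 * Times a synchronous generation callback that returns how many tokens it produced.
 * The clock is injectable; Date.now is the default here for portability, but in a
 * browser a higher-resolution clock such as performance.now() is a better choice.
 */
function measureThroughput(
  generate: () => number,
  now: () => number = Date.now,
): number {
  const start = now();
  const tokenCount = generate();
  return tokensPerSecond(tokenCount, now() - start);
}
```

At 1,400 tokens per second, `msPerToken(1400)` works out to roughly 0.71 ms per token on average.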

Source: LFM2.5 230M running in-browser at 1,400 tok/s using custom WebGPU kernels.

Why this matters

Developers

Demonstrates practical WebGPU-based inference for LLMs, reducing dependency on server-side hardware and enabling edge AI applications.

Businesses

Opens opportunities for privacy-focused, low-latency AI products that run entirely in-browser, reducing cloud costs.

Investors

Highlights advancements in on-device AI, potentially disrupting cloud-based inference markets with more efficient alternatives.

Students

Provides a tangible example of WebGPU's capabilities for AI workloads, useful for learning and experimentation.

Everyone

Shows that advanced AI can run locally on consumer devices, enhancing privacy and accessibility.

Glossary

WebGPU: A modern graphics and compute API for web browsers, enabling GPU acceleration for JavaScript applications.
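Tokens/sec: A throughput metric for language model inference: the number of tokens the model processes or generates per second. At 1,400 tokens/sec, each token takes about 0.7 ms on average.
GGUF: A binary file format used by llama.cpp and related tools to store model weights and metadata in a single file; commonly used to distribute quantized large language models for inference on consumer hardware.
Kernel: A low-level function that runs a specific computation in parallel on a GPU or other accelerator. In WebGPU, kernels are written as compute shaders in WGSL.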
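
As a rough illustration of what a WebGPU kernel looks like, the sketch below pairs a naive WGSL compute shader for matrix-vector multiplication (a common building block of LLM decoding) with a CPU reference in TypeScript that uses the same row-major indexing. This is not the demo's kernel: the shader, the buffer layout, the workgroup size of 64, and all names are assumptions chosen for readability, and a tuned kernel would look quite different.

```typescript
// Illustrative only: NOT the kernel used in the demo. All names are hypothetical.

// WGSL compute shader: each GPU invocation computes one output row.
const MATVEC_WGSL = /* wgsl */ `
struct Dims { rows: u32, cols: u32 }

@group(0) @binding(0) var<storage, read> weights: array<f32>;
@group(0) @binding(1) var<storage, read> input: array<f32>;
@group(0) @binding(2) var<storage, read_write> output: array<f32>;
@group(0) @binding(3) var<uniform> dims: Dims;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
  let row = gid.x;
  if (row >= dims.rows) { return; }  // guard the padded tail of the last workgroup
  var acc: f32 = 0.0;
  for (var c: u32 = 0u; c < dims.cols; c = c + 1u) {
    acc = acc + weights[row * dims.cols + c] * input[c];
  }
  output[row] = acc;
}
`;

// CPU reference with the same row-major layout, useful for checking GPU results.
function matVecReference(
  weights: Float32Array,
  input: Float32Array,
  rows: number,
  cols: number,
): Float32Array {
  if (weights.length !== rows * cols || input.length !== cols) {
    throw new RangeError("buffer sizes do not match rows x cols");
  }
  const output = new Float32Array(rows);
  for (let r = 0; r < rows; r++) {
    let acc = 0;
    for (let c = 0; c < cols; c++) {
      acc += weights[r * cols + c] * input[c];
    }
    output[r] = acc;
  }
  return output;
}

// Number of workgroups to dispatch so that every row is covered.
function workgroupCount(rows: number, workgroupSize: number = 64): number {
  return Math.ceil(rows / workgroupSize);
}
```

In a browser, the shader string would be compiled with `device.createShaderModule({ code: MATVEC_WGSL })` and dispatched with `workgroupCount(rows)` workgroups; the CPU function gives a baseline to compare the GPU output against.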

AI bias estimate: Neutral technical demonstration; no overt bias detected. (Automated estimate, not a definitive judgement.)

Sources · 1

LFM2.5 230M running in-browser at 1,400 tok/s using custom WebGPU kernels

Summary and analysis generated by AI (mistral). Always verify against the original sources.
