AI Tools · 69% · 1 min read · Jun 25, 2026, 6:35 PM

LFM2.5 230M running in-browser at 1,400 tok/s using custom WebGPU kernels

30-second summary

A 230M-parameter LFM2.5 model runs locally in a browser at 1,400 tokens/sec on an M4 Max Mac, using custom WebGPU kernels that build on prior work from Fable 5 and Opus 4.8.

Key takeaways
  • LiquidAI/LFM2.5-230M runs locally in-browser at 1,400 tokens/sec using custom WebGPU kernels.
  • Kernels were adapted from Fable 5 (shut down) and Opus 4.8, enabling efficient on-device inference.
  • Demo available on Hugging Face Spaces for public testing.
  • Performance achieved on an M4 Max Mac, demonstrating feasibility on consumer hardware.
  • Showcases WebGPU as a viable path for high-performance, client-side AI on consumer hardware, without a discrete GPU or server-side inference.
Full story

A developer demonstrated the LiquidAI/LFM2.5-230M model running entirely in a web browser via custom WebGPU kernels, achieving 1,400 tokens per second on an M4 Max Mac. The implementation builds on kernels originally developed for Fable 5 (before its shutdown) and Opus 4.8. A Hugging Face Space provides a live demo for testing. The result suggests that WebGPU can support high-throughput, client-side AI workloads on consumer hardware, without server-side inference.
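To put the headline figure in perspective, the sketch below shows how a tokens-per-second number is typically derived from a timed generation run, and what 1,400 tokens/sec implies per token. The helper names and the sample counts (700 tokens in 500 ms) are hypothetical and chosen only to reproduce the reported rate; they are not measurements from the demo.

```typescript
// Hypothetical helper types and names; not taken from the demo's code.
interface ThroughputSample {
  tokens: number;    // tokens generated during the timed window
  elapsedMs: number; // wall-clock duration of that window, in milliseconds
}

// Throughput = tokens generated divided by elapsed seconds.
function tokensPerSecond(sample: ThroughputSample): number {
  if (sample.elapsedMs <= 0) {
    throw new RangeError("elapsedMs must be positive");
  }
  return sample.tokens / (sample.elapsedMs / 1000);
}

// Average time budget per token at a given throughput.
function msPerToken(tokPerSec: number): number {
  if (tokPerSec <= 0) {
    throw new RangeError("tokPerSec must be positive");
  }
  return 1000 / tokPerSec;
}

// Illustrative numbers only: 700 tokens in 500 ms is 1,400 tokens/sec.
const rate = tokensPerSecond({ tokens: 700, elapsedMs: 500 });
const latency = msPerToken(rate); // roughly 0.71 ms per token
```

At this rate, each generated token has an average budget of well under one millisecond, which is why per-kernel overhead matters for a model this small.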

Source: LFM2.5 230M running in-browser at 1,400 tok/s using custom WebGPU kernels.

Why this matters
Developers

Demonstrates practical WebGPU-based inference for LLMs, reducing dependency on server-side hardware and enabling edge AI applications.

Businesses

Opens opportunities for privacy-focused, low-latency AI products that run entirely in-browser, reducing cloud costs.

Investors

Highlights advancements in on-device AI, potentially disrupting cloud-based inference markets with more efficient alternatives.

Students

Provides a tangible example of WebGPU's capabilities for AI workloads, useful for learning and experimentation.

Everyone

Shows that advanced AI can run locally on consumer devices, enhancing privacy and accessibility.

Glossary
WebGPU
A modern graphics and compute API for web browsers, enabling GPU acceleration for JavaScript applications.
Tokens/sec
A metric measuring the speed of a language model's inference: the number of tokens it generates or processes per second.
GGUF
A binary file format used by llama.cpp and related tools to store model weights and metadata, commonly in quantized form, for efficient inference on consumer hardware.
Kernel
A low-level function that performs a specific computation, often optimized for hardware acceleration.
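
To make the WebGPU and Kernel entries concrete, here is a minimal sketch of a compute kernel dispatched from TypeScript. It is a toy elementwise kernel that doubles each value in a buffer, not one of the demo's kernels, which are not shown in the source. The names SCALE_WGSL, scaleCpu and scaleGpu are hypothetical, and the WebGPU objects are accessed untyped so the sketch does not assume any type package is installed.

```typescript
// Illustrative only: a toy elementwise kernel, not one of the demo's kernels.
const SCALE_WGSL = `
@group(0) @binding(0) var<storage, read_write> data: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  // Guard against the last workgroup running past the end of the buffer.
  if (id.x < arrayLength(&data)) {
    data[id.x] = data[id.x] * 2.0;
  }
}
`;

// CPU reference with the same semantics, handy for checking GPU results.
function scaleCpu(input: Float32Array): Float32Array {
  return input.map((x) => x * 2);
}

// Runs the kernel via WebGPU. Resolves to null when WebGPU is unavailable
// (for example in Node.js or an unsupported browser). Assumes non-empty input.
async function scaleGpu(input: Float32Array): Promise<Float32Array | null> {
  const g = globalThis as any; // untyped access; an assumption of this sketch
  const adapter = await g.navigator?.gpu?.requestAdapter();
  if (!adapter) return null;
  const device = await adapter.requestDevice();

  const pipeline = device.createComputePipeline({
    layout: "auto",
    compute: {
      module: device.createShaderModule({ code: SCALE_WGSL }),
      entryPoint: "main",
    },
  });

  // One buffer the kernel reads and writes, one buffer to read results back.
  const storage = device.createBuffer({
    size: input.byteLength,
    usage:
      g.GPUBufferUsage.STORAGE |
      g.GPUBufferUsage.COPY_SRC |
      g.GPUBufferUsage.COPY_DST,
  });
  const readback = device.createBuffer({
    size: input.byteLength,
    usage: g.GPUBufferUsage.MAP_READ | g.GPUBufferUsage.COPY_DST,
  });
  device.queue.writeBuffer(storage, 0, input);

  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [{ binding: 0, resource: { buffer: storage } }],
  });

  // Record the dispatch and the copy into the readback buffer, then submit.
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(Math.ceil(input.length / 64));
  pass.end();
  encoder.copyBufferToBuffer(storage, 0, readback, 0, input.byteLength);
  device.queue.submit([encoder.finish()]);

  await readback.mapAsync(g.GPUMapMode.READ);
  const result = new Float32Array(readback.getMappedRange().slice(0));
  readback.unmap();
  return result;
}
```

In a WebGPU-capable browser, awaiting scaleGpu(new Float32Array([1, 2, 3])) should match scaleCpu on the same input. Kernels for LLM inference follow the same pattern of shader, buffers, bind group and dispatch, but implement operations such as matrix multiplication and attention.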

AI bias estimate: Neutral technical demonstration; no overt bias detected. (Automated estimate, not a definitive judgement.)


Summary and analysis generated by AI (mistral). Always verify against the original sources.

TickrWire

AI news intelligence. We aggregate, verify, summarise and explain the latest artificial intelligence news from open, legal sources.


© 2026 TickrWire. Summaries and analysis are AI-generated and may contain errors.