Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5
Evolving story · 1 updatesGemma 4 WebGPU DevelopmentTimeline →Gemma 4 E2B is running in-browser at 255 tokens per second using WebGPU kernels written by Fable 5. The demo and kernels are now available for public testing.

- ›Gemma 4 E2B runs in-browser at 255 tokens per second using WebGPU kernels
- ›Fable 5 optimized the WebGPU kernels before its shutdown
- ›The demo and kernels are available for public testing
- ›The model is available on Hugging Face
- ›The achievement demonstrates the potential of WebGPU in accelerating AI models
The use of WebGPU kernels in the Gemma 4 E2B model allows for hardware acceleration, resulting in improved performance. The demo and kernels are available for testing, providing a chance for the community to engage with the model and provide feedback. The achievement is a significant step forward in the development of web-based AI applications, and it has the potential to enable new use cases and applications.
Source: Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5. Read the full piece at the source.
The release of the demo and kernels provides an opportunity for developers to explore the capabilities of the Gemma 4 E2B model and the potential of WebGPU in AI applications.
The achievement demonstrates the potential of WebGPU in accelerating AI models, which can lead to new business opportunities and applications.
The development of web-based AI applications using WebGPU kernels can attract investment in the field of AI research and development.
The release of the demo and kernels provides a chance for students to learn about the capabilities of the Gemma 4 E2B model and the potential of WebGPU in AI applications.
The achievement is a significant step forward in the development of web-based AI applications, and it has the potential to enable new use cases and applications.
- WebGPU
- A web-based API for accessing graphics processing units (GPUs) and other parallel computing devices.
- Fable 5
- A company that optimized WebGPU kernels for the Gemma 4 E2B model before its shutdown.
- Gemma 4 E2B
- A variant of the Gemma 4 model, designed for efficient inference on mobile and embedded devices.
AI bias estimate: The article appears to be neutral, providing factual information about the achievement. (Automated estimate, not a definitive judgement.)
Summary and analysis generated by AI (groq). Always verify against the original sources.

Suno launches Spark incubator program to feed independent artists to its AI machine

Ornith-1.0-35B GGUF update: native MTP speculative-decode graft + full serving/TTFT/long-context numbers (llama.cpp, tp=1)

DeepSpec - a deepseek-ai Collection
