AI Tools · 80% · 1 min read · Jun 17, 2026, 5:06 PM

Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5

30-second summary

Gemma 4 E2B is running in-browser at 255 tokens per second using WebGPU kernels written by Fable 5. The demo and kernels are now available for public testing.

Key takeaways

›Gemma 4 E2B runs in-browser at 255 tokens per second using WebGPU kernels
›Fable 5 wrote the WebGPU kernels before its shutdown
›The demo and kernels are available for public testing
›The model is available on Hugging Face
›The result shows that WebGPU kernels can make in-browser language-model inference fast enough for interactive use
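The 255 tokens-per-second figure is easier to picture as a per-token latency. The sketch below assumes the figure is steady-state generation (decode) speed, which the summary does not specify, and it ignores prompt-processing time. The function names and the 500-token response length are illustrative choices, not taken from the source.

```typescript
// Convert a decode throughput (tokens per second) into per-token latency.
// Assumption: the throughput is steady-state generation speed.
function msPerToken(tokensPerSecond: number): number {
  return 1000 / tokensPerSecond;
}

// Time to generate a response of `tokens` tokens at a constant throughput.
// Prompt processing (prefill) time is deliberately ignored in this sketch.
function secondsToGenerate(tokens: number, tokensPerSecond: number): number {
  return tokens / tokensPerSecond;
}

const reportedTokensPerSecond = 255; // figure quoted in the article

console.log(msPerToken(reportedTokensPerSecond).toFixed(2)); // "3.92"
// Hypothetical 500-token reply:
console.log(secondsToGenerate(500, reportedTokensPerSecond).toFixed(2)); // "1.96"
```

In other words, under that assumption each new token would appear roughly every 4 ms, and a 500-token reply would take about 2 seconds once generation starts.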

Full story

Custom WebGPU kernels let the browser run Gemma 4 E2B on the device's GPU, and that hardware acceleration is what makes the 255 tokens-per-second figure possible. The demo and kernels are available for testing, giving the community a chance to try the model and provide feedback. Because inference runs locally in the browser rather than on a remote server, the result is a notable step for web-based AI applications and could enable new uses for them.
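A "WebGPU kernel" is a compute shader, written in WGSL, that the GPU runs across many parallel invocations. The Fable 5 kernels themselves are not shown in the summary, so the following is only a generic, hypothetical sketch: a naive matrix-vector product (a core operation when a transformer generates one token at a time) written as a WGSL shader in which each invocation computes one output row, plus a TypeScript CPU reference that runs the same per-row body so the logic can be checked without a GPU. `MATVEC_WGSL`, `matVecReference`, and the binding layout are invented for illustration, and the host-side pipeline, buffer, and dispatch setup is omitted.

```typescript
// Hypothetical WGSL source: naive matrix-vector product, one invocation per
// output row. Dispatch Math.ceil(rows / 64) workgroups to cover every row.
const MATVEC_WGSL = /* wgsl */ `
struct Dims {
  rows: u32,
  cols: u32,
}

@group(0) @binding(0) var<uniform> dims: Dims;
@group(0) @binding(1) var<storage, read> weights: array<f32>;      // row-major, rows * cols
@group(0) @binding(2) var<storage, read> inVec: array<f32>;        // length cols
@group(0) @binding(3) var<storage, read_write> outVec: array<f32>; // length rows

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
  let row = gid.x;
  if (row >= dims.rows) {
    return; // padding invocations in the last workgroup do nothing
  }
  var acc: f32 = 0.0;
  for (var c: u32 = 0u; c < dims.cols; c++) {
    acc += weights[row * dims.cols + c] * inVec[c];
  }
  outVec[row] = acc;
}
`;

const WORKGROUP_SIZE = 64; // must match @workgroup_size in the shader

// CPU reference mirroring the shader: iterate over every invocation id the
// dispatch would launch and run the same bounds check and per-row body.
// Note: JS accumulates in f64, so for general inputs results may differ from
// the f32 shader in the last bits.
function matVecReference(
  weights: Float32Array,
  inVec: Float32Array,
  rows: number,
  cols: number,
): Float32Array {
  const outVec = new Float32Array(rows);
  const invocations = Math.ceil(rows / WORKGROUP_SIZE) * WORKGROUP_SIZE;
  for (let row = 0; row < invocations; row++) {
    if (row >= rows) continue; // same guard as the shader
    let acc = 0;
    for (let c = 0; c < cols; c++) {
      acc += weights[row * cols + c] * inVec[c];
    }
    outVec[row] = acc;
  }
  return outVec;
}

// 2x3 example: [[1, 2, 3], [4, 5, 6]] times [2, 0, 1] gives [5, 14].
const y = matVecReference(
  new Float32Array([1, 2, 3, 4, 5, 6]),
  new Float32Array([2, 0, 1]),
  2,
  3,
);
console.log(Array.from(y));
```

Fast kernels generally go further than this one-invocation-per-row version, for example by splitting each row across a workgroup with a parallel reduction and by reading quantized weights; the summary does not say which techniques Fable 5 used.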

Source: Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5. Read the full piece at the source.

Why this matters

Developers

Developers can try the demo and study the released kernels to see how Gemma 4 E2B performs under WebGPU and what WebGPU offers for AI applications.

Businesses

Fast in-browser inference could let products run AI features on users' own hardware instead of server-side infrastructure, which may open new business opportunities and applications.

Investors

The development of web-based AI applications using WebGPU kernels can attract investment in the field of AI research and development.

Students

The release of the demo and kernels provides a chance for students to learn about the capabilities of the Gemma 4 E2B model and the potential of WebGPU in AI applications.

Everyone

Running capable models directly in a web page, without sending prompts to a server, could bring AI features to more places and more people.

Glossary

WebGPU: A W3C web API that gives web pages access to the device's GPU for both rendering and general-purpose computation; compute kernels are written as shaders in WGSL (WebGPU Shading Language).
Fable 5: A company that wrote the WebGPU kernels for the Gemma 4 E2B demo before its shutdown.
Gemma 4 E2B: A variant of the Gemma 4 model, designed for efficient inference on mobile and embedded devices.
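The demo depends on the browser exposing WebGPU. The summary does not list supported browsers or required GPU features, so the check below is a general, hypothetical sketch rather than the demo's own logic. It uses the standard `navigator.gpu.requestAdapter()` call and tests for `"shader-f16"`, a real optional WebGPU feature that allows 16-bit floats in WGSL; whether this demo needs it is not stated. The small interfaces and the `detectWebGpu`/`browserGpu` names are assumptions made so the snippet compiles without the `@webgpu/types` package.

```typescript
// Minimal structural types for the WebGPU calls used here (assumption: lets
// the sketch compile without @webgpu/types; with those types installed, use
// navigator.gpu and GPUAdapter directly).
interface AdapterLike {
  readonly features: { has(name: string): boolean };
}
interface GpuLike {
  requestAdapter(): Promise<AdapterLike | null>;
}

interface WebGpuSupport {
  webgpu: boolean; // navigator.gpu exists and an adapter was granted
  shaderF16: boolean; // adapter exposes the optional "shader-f16" feature
}

// navigator.gpu is only exposed in secure contexts of browsers that ship WebGPU;
// elsewhere (e.g. Node) this returns undefined.
function browserGpu(): GpuLike | undefined {
  return (globalThis as unknown as { navigator?: { gpu?: GpuLike } }).navigator?.gpu;
}

// The gpu parameter defaults to the browser's object but can be injected for testing.
async function detectWebGpu(gpu: GpuLike | undefined = browserGpu()): Promise<WebGpuSupport> {
  if (!gpu) return { webgpu: false, shaderF16: false };

  // requestAdapter() resolves to null when no suitable GPU adapter is available.
  const adapter = await gpu.requestAdapter();
  if (!adapter) return { webgpu: false, shaderF16: false };

  return { webgpu: true, shaderF16: adapter.features.has("shader-f16") };
}

detectWebGpu().then((support) => console.log(support));
```

Injecting the `gpu` object keeps the detection logic testable outside a browser; in a page you would simply call `detectWebGpu()` and fall back to a message or a CPU path when `webgpu` is false.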

AI bias estimate: The article appears to be neutral, providing factual information about the achievement. (Automated estimate, not a definitive judgement.)

Sources · 1

Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5

Summary and analysis generated by AI (groq). Always verify against the original sources.
