AI Research · 75% · 1 min read · Jul 3, 2026, 9:18 PM

H64LM: A 249M-parameter Mixture-of-Experts Transformer built from scratch in PyTorch [P]

30-second summary

A researcher has built H64LM, a 249M-parameter mixture-of-experts Transformer, from scratch in PyTorch. The model uses grouped query attention and Top-2 sparse routing across 8 experts.

Key takeaways
  • H64LM is a 249M-parameter Transformer model built from scratch in PyTorch
  • The model features a mixture-of-experts architecture with 8 experts and Top-2 sparse routing
  • The researcher implemented core components, including attention and normalization, from scratch

Full story

The H64LM model is a research project aimed at understanding the inner workings of modern large language models.

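It features a mixture-of-experts architecture with 8 experts and Top-2 sparse routing: a router scores the experts for each token and sends the token to the two highest-scoring ones, so only a fraction of the expert parameters are active for any given token. The model also uses grouped query attention, in which groups of query heads share a single key/value head, reducing the memory needed for keys and values compared with giving every query head its own.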
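The Top-2 routing described above can be illustrated for a single token. This is a minimal sketch in plain Python, not H64LM's actual PyTorch code: the function names, the toy experts, and the router scores are all hypothetical, and renormalizing the two selected weights so they sum to 1 is a common convention that is assumed here rather than confirmed by the source.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top2_route(router_logits):
    """Return [(expert_index, weight), ...] for the two highest-scoring experts.

    Assumption: the two selected probabilities are renormalized to sum to 1.
    """
    probs = softmax(router_logits)
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = sum(probs[i] for i in top2)
    return [(i, probs[i] / norm) for i in top2]

def moe_forward(x, router_logits, experts):
    """Weighted sum of the two selected experts' outputs for one token vector x."""
    out = [0.0] * len(x)
    for idx, weight in top2_route(router_logits):
        y = experts[idx](x)  # only the selected experts are evaluated
        out = [o + weight * v for o, v in zip(out, y)]
    return out

# Eight toy experts (hypothetical): expert k scales its input by (k + 1).
experts = [lambda x, k=k: [(k + 1) * v for v in x] for k in range(8)]

# Hypothetical router scores for one token; experts 1 and 4 score highest.
router_logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

routes = top2_route(router_logits)
output = moe_forward([1.0, -2.0], router_logits, experts)
```

Here the remaining six experts are never called for this token, which is where the compute saving of sparse routing comes from. A real implementation would batch tokens and typically add a load-balancing term so that the router does not favor a few experts.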

The researcher implemented the core components of the model from scratch, including attention, MoE routing, normalization, and the training loop. This approach allows for a deeper understanding of the model's behavior and performance.
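Of the components listed above, grouped query attention lends itself to a small worked example: several query heads share one key/value head, which shrinks the key/value cache. The sketch below is plain Python with hypothetical head counts and sizes; the source does not state H64LM's actual configuration, and the helper names are illustrative only.

```python
def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Map a query head index to the key/value head shared by its group."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size

def kv_cache_elements(n_kv_heads, head_dim, seq_len, n_layers):
    """Count cached values: one key and one value vector per head, position, and layer."""
    return 2 * n_layers * seq_len * n_kv_heads * head_dim

# Hypothetical sizes, NOT H64LM's actual configuration.
N_Q_HEADS, N_KV_HEADS, HEAD_DIM, SEQ_LEN, N_LAYERS = 8, 2, 64, 1024, 12

# Which key/value head each of the 8 query heads reads from.
mapping = [kv_head_for_query_head(h, N_Q_HEADS, N_KV_HEADS) for h in range(N_Q_HEADS)]

# Cache size with one key/value head per query head versus with grouping.
mha_cache = kv_cache_elements(N_Q_HEADS, HEAD_DIM, SEQ_LEN, N_LAYERS)
gqa_cache = kv_cache_elements(N_KV_HEADS, HEAD_DIM, SEQ_LEN, N_LAYERS)
```

With these assumed numbers, query heads 0 to 3 share key/value head 0 and heads 4 to 7 share head 1, so the cache holds a quarter of the values that standard multi-head attention would need.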

The release of H64LM provides a valuable resource for researchers and developers looking to explore the capabilities of large language models and improve their performance.

Source: H64LM: A 249M-parameter Mixture-of-Experts Transformer built from scratch in PyTorch [P].

Why this matters
Developers

Provides a custom-built model to experiment with and improve.

Everyone

Offers a from-scratch example of how modern language models work.

Glossary
Mixture-of-Experts
a neural network architecture in which a router sends each input to a subset of expert sub-networks instead of through all of them