LLM · 66% · 1 min read · Jun 25, 2026, 8:34 AM

NVIDIA has released Nemotron-TwoTower-30B-A3B-Base-BF16, an unusual diffusion-based language model built from the Nemotron 3 Nano 30B-A3B backbone.

30-second summary

NVIDIA released Nemotron-TwoTower-30B-A3B-Base-BF16, a diffusion-based language model using a two-tower architecture with parallel token generation, built on the Nemotron 3 Nano 30B-A3B backbone.

Key takeaways

›Nemotron-TwoTower-30B-A3B-Base-BF16 is a diffusion-based language model, not a traditional autoregressive one.
›It uses a two-tower architecture: a frozen autoregressive context tower and a diffusion denoiser tower for parallel token generation.
›The model retains 98.7% of the performance of its backbone (Nemotron 3 Nano 30B-A3B).
›Released in BF16 precision, it targets improved efficiency and scalability.
›Released as part of NVIDIA's Nemotron series, though details are sparse beyond the Reddit announcement.

Full story

NVIDIA has unveiled Nemotron-TwoTower-30B-A3B-Base-BF16, a diffusion-based language model that departs from traditional autoregressive generation. Instead of generating tokens one at a time, it pairs a frozen autoregressive context tower with a diffusion denoiser tower that fills blocks of tokens in parallel. The approach aims to improve efficiency and scalability while retaining 98.7% of the performance of its backbone, Nemotron 3 Nano 30B-A3B. The model uses BF16 precision for inference.
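To make the idea of filling a block of tokens in parallel more concrete, here is a minimal toy sketch in Python. It is not NVIDIA's implementation: `context_state`, `propose`, `fill_block`, the vocabulary size, and the confidence-based commit rule are all hypothetical stand-ins chosen for illustration. The only point it shows is that committing several positions per denoising step needs fewer steps than deciding one token at a time.

```python
from typing import List, Optional, Tuple

VOCAB_SIZE = 50  # hypothetical toy vocabulary size


def context_state(prefix: List[int]) -> int:
    """Hypothetical stand-in for the frozen context tower: fold the prefix into one integer."""
    state = 0
    for tok in prefix:
        state = (state * 31 + tok) % 1009
    return state


def propose(state: int, block: List[Optional[int]], pos: int) -> Tuple[int, float]:
    """Hypothetical stand-in for the denoiser tower: a deterministic (token, confidence) guess."""
    filled = sum(1 for t in block if t is not None)
    token = (state + 7 * pos + filled) % VOCAB_SIZE
    confidence = ((state + 13 * pos) % 100) / 100.0
    return token, confidence


def fill_block(prefix: List[int], block_size: int = 8, per_step: int = 4) -> Tuple[List[int], int]:
    """Fill a fully masked block, committing up to `per_step` positions per denoising step."""
    state = context_state(prefix)  # computed once; the context side stays fixed
    block: List[Optional[int]] = [None] * block_size  # None marks a masked position
    steps = 0
    while any(t is None for t in block):
        masked = [i for i, t in enumerate(block) if t is None]
        # Score every masked position against the same snapshot of the block.
        proposals = {i: propose(state, block, i) for i in masked}
        # Commit the most confident proposals; the rest stay masked for the next step.
        chosen = sorted(masked, key=lambda i: proposals[i][1], reverse=True)[:per_step]
        for i in chosen:
            block[i] = proposals[i][0]
        steps += 1
    return [t for t in block if t is not None], steps


if __name__ == "__main__":
    tokens, steps = fill_block([1, 2, 3], block_size=8, per_step=4)
    print(f"filled {len(tokens)} tokens in {steps} steps")
```

With `per_step=1` the same loop degenerates into one decision per step, which mirrors the sequential cost of autoregressive decoding for the block. How the real model schedules its denoising steps is not described in the source.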

Source: "NVIDIA has released Nemotron-TwoTower-30B-A3B-Base-BF16, an unusual diffusion-based language model built from the Nemotron 3 Nano 30B-A3B backbone." Read the full piece at the source.

Why this matters

Developers

Introduces a novel parallel token generation approach for language models, potentially improving inference speed and scalability for large models.

Businesses

Could enable more efficient deployment of large language models, reducing compute costs and latency for inference tasks.

Investors

Signals NVIDIA's continued innovation in model architectures, which may influence market positioning in AI infrastructure.

Students

Demonstrates an alternative to traditional autoregressive models, relevant for research in diffusion-based language generation.

Everyone

Highlights NVIDIA's push beyond standard LLM architectures, though practical impact remains to be seen.

Glossary

diffusion-based language model: A model that generates text by iteratively refining noisy token sequences, rather than predicting tokens sequentially.
two-tower architecture: A model design with separate specialized components (e.g., context and denoiser towers) working in tandem.
BF16: bfloat16, a 16-bit floating-point format (1 sign bit, 8 exponent bits, 7 mantissa bits) used for efficient model inference and training.
autoregressive generation: A generation method in which tokens are produced one at a time, each conditioned on previously generated tokens.
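As a small illustration of the BF16 entry above, the sketch below converts a 32-bit float to bfloat16 by keeping its upper 16 bits with round-to-nearest-even, then expands it back. The helper names `float32_to_bf16_bits` and `bf16_bits_to_float` are made up for this example, and NaN and infinity handling is deliberately left out, so treat it as a sketch of the number format rather than a production conversion routine.

```python
import struct


def float32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (round-to-nearest-even; NaN/inf not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # raw float32 bit pattern
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)           # ties go to the even result
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for value in (1.0, 3.14159, 0.1):
        bf16 = float32_to_bf16_bits(value)
        print(f"{value} -> 0x{bf16:04X} -> {bf16_bits_to_float(bf16)}")
```

Because bfloat16 keeps the full 8-bit exponent of float32 but only 7 mantissa bits, it covers the same numeric range with much coarser precision; for example, 3.14159 round-trips to 3.140625.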

AI bias estimate: Neutral reporting of a technical release; limited context beyond Reddit source. (Automated estimate, not a definitive judgement.)

Summary and analysis generated by AI (mistral). Always verify against the original sources.
