NVIDIA Releases Nemotron-Labs-TwoTower: an Open-Weight Diffusion Language Model Built on a Frozen Autoregressive Nemotron-3-Nano-30B-A3B Backbone
NVIDIA released Nemotron-Labs-TwoTower, an open-weight diffusion language model leveraging a frozen autoregressive backbone to boost text generation throughput.

- Nemotron-Labs-TwoTower is a diffusion language model built on a frozen autoregressive backbone (Nemotron-3-Nano-30B-A3B).
- The model is released as open weights under the NVIDIA Nemotron Open Model License.
- Diffusion-based generation lets multiple tokens be produced in parallel, which targets the throughput bottleneck of token-by-token autoregressive decoding.
- The release is aimed at developers and enterprises that need high-speed text generation for large-scale applications.
NVIDIA has introduced Nemotron-Labs-TwoTower, a diffusion-based language model designed to address the throughput limitations of traditional autoregressive (AR) models. AR models generate tokens sequentially, one per decoding step. Nemotron-Labs-TwoTower instead uses a diffusion approach that generates multiple tokens in parallel, which is intended to raise generation speed.
The model is built on a frozen autoregressive backbone, Nemotron-3-Nano-30B-A3B, and is released as open weights under the NVIDIA Nemotron Open Model License. This hybrid approach aims to combine the strengths of diffusion and autoregressive models for high-throughput text generation tasks.
The release targets a well-known bottleneck in AR models: token-by-token decoding limits throughput in large-scale applications. By using diffusion techniques, NVIDIA positions Nemotron-Labs-TwoTower as a scalable alternative for developers and enterprises that need efficient text generation.
Source: NVIDIA Releases Nemotron-Labs-TwoTower: an Open-Weight Diffusion Language Model Built on a Frozen Autoregressive Nemotron-3-Nano-30B-A3B Backbone.
In short, the release provides an open-weight, high-throughput alternative to autoregressive models for text generation, aims to make generation faster and more scalable with lower latency in AI-driven applications, and advances the efficiency of AI text generation through a hybrid model approach.
- Diffusion language model: A model that generates text by iteratively refining a sequence of tokens. Because many positions can be updated in the same step, it can process tokens in parallel and reach higher throughput than an autoregressive model.
- Autoregressive (AR) model: A model that generates tokens one at a time in sequence, where each token depends on the previous ones, so generation slows down as sequences get longer.
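
The two definitions above can be contrasted with a toy sketch. The Python below is not NVIDIA's implementation and does not use any Nemotron API: `toy_forward`, `decode_autoregressive`, `decode_diffusion`, and `tokens_per_step` are hypothetical names, and the "model" simply looks up a fixed target sentence and attaches random confidences. It only illustrates why committing several tokens per refinement step needs fewer forward passes than committing one token per step.

```python
import random

MASK = None  # placeholder for a position that has not been generated yet
TARGET = "the quick brown fox jumps over the lazy dog".split()  # toy "correct" output


def toy_forward(tokens, rng):
    """Stand-in for one forward pass (assumption: a real model returns logits).

    Returns a (token, confidence) guess for every position in the sequence.
    """
    return [(TARGET[i], rng.random()) for i in range(len(tokens))]


def decode_autoregressive(n, rng):
    """Commit exactly one token per forward pass, left to right."""
    tokens, passes = [MASK] * n, 0
    for i in range(n):
        preds = toy_forward(tokens, rng)
        passes += 1
        tokens[i] = preds[i][0]
    return tokens, passes


def decode_diffusion(n, tokens_per_step, rng):
    """Start fully masked; each pass commits the most confident masked positions."""
    tokens, passes = [MASK] * n, 0
    while MASK in tokens:
        preds = toy_forward(tokens, rng)
        passes += 1
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:tokens_per_step]:
            tokens[i] = preds[i][0]
    return tokens, passes


if __name__ == "__main__":
    n = len(TARGET)
    _, ar_passes = decode_autoregressive(n, random.Random(0))
    _, df_passes = decode_diffusion(n, tokens_per_step=3, rng=random.Random(0))
    print(f"autoregressive: {ar_passes} forward passes")  # 9
    print(f"diffusion-style: {df_passes} forward passes")  # 3
```

Fewer forward passes do not automatically mean lower wall-clock time, since each pass in a real system may cost more and output quality can depend on the number of refinement steps. The sketch shows only the step-count argument.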
