
NVIDIA Delivers Day-1 Support For DeepMind’s DiffusionGemma Open Model Across RTX & DGX Platforms, 150 Tokens/s With DGX Spark

Hassan Mujtaba • Jun 10, 2026 at 04:15pm EDT

NVIDIA's entire RTX and DGX lineups are getting full support for Google DeepMind's open DiffusionGemma AI model.

Google Intros Its Newest Open AI Model: DiffusionGemma - NVIDIA Offers Full Support Across Its DGX & RTX Families

DiffusionGemma is an open model designed for fast text generation, and NVIDIA is launching support for it across its RTX and DGX lineups at the same time. The model is already fast by design, and NVIDIA says its optimizations across its hardware and software stack make it faster still.

Congrats to @GoogleDeepMind on the launch of DiffusionGemma.

The model generates 256 tokens in parallel per step, delivering 150+ TPS on DGX Spark, and 1,000+ TPS on a single H100.

We're supporting it from day one with:
• BF16 and NVFP4 checkpoints on @huggingface🤗
• Free… https://t.co/0xqMXKvMQV
— NVIDIA AI (@NVIDIAAI) June 10, 2026

The following are the main highlights of the model:

Parallel generation: DiffusionGemma denoises up to 256 tokens per step instead of predicting one at a time.
Built on Gemma 4: DiffusionGemma pairs a diffusion head with Google's Gemma 4 architecture, a 26-billion-parameter mixture-of-experts model that activates just 3.8 billion parameters per step.
Up to 4x faster performance: The speedup brings fast text generation to local hardware, which is exactly where single-user generation usually stalls.
Open and local: DiffusionGemma is open-weight under a permissive Apache 2.0 license and runs entirely on RTX and DGX Spark, with no cloud dependency and no per-token cost. It has day-zero support in Hugging Face Transformers, vLLM and Unsloth.
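To make the parallel-generation idea more concrete, here is a toy Python sketch of confidence-based parallel unmasking, a common decoding pattern for masked diffusion language models. It is not DiffusionGemma's actual sampler: the function name `toy_parallel_unmask`, the random stand-in "denoiser", and the commit-the-top-k rule are illustrative assumptions. It only shows how committing k tokens per step shrinks the number of steps needed to fill a 256-token block, from 256 steps at one token per step down to a single step at 256 tokens per step.

```python
import math
import random

MASK = None  # marks a position that has not been generated yet


def toy_parallel_unmask(block_len: int, tokens_per_step: int, seed: int = 0):
    """Fill a fully masked block, committing `tokens_per_step` tokens per step.

    Toy illustration only: a seeded random generator stands in for the model,
    proposing a token id and a confidence for every still-masked position.
    Returns (filled_block, number_of_steps).
    """
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    rng = random.Random(seed)
    block = [MASK] * block_len
    steps = 0
    while any(tok is MASK for tok in block):
        masked = [i for i, tok in enumerate(block) if tok is MASK]
        # Hypothetical "denoiser" output: (token_id, confidence) per masked slot.
        proposals = {i: (rng.randrange(1000), rng.random()) for i in masked}
        # Commit the most confident positions together, in a single step.
        ranked = sorted(masked, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:tokens_per_step]:
            block[i] = proposals[i][0]
        steps += 1
    return block, steps


if __name__ == "__main__":
    n = 256
    for k in (1, 16, 256):
        _, steps = toy_parallel_unmask(n, k)
        print(f"{k:>3} tokens/step -> {steps} steps (ceil({n}/{k}) = {math.ceil(n / k)})")
```

Real diffusion samplers are more involved and may spend several denoising passes on a block, each costing a full forward pass, so fewer steps does not translate one-to-one into the quoted tokens-per-second gains.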

Model name	DiffusionGemma
Supported modalities	Text, image
Total parameters	25.2B
Active parameters	3.8B
Context length	Up to 256K tokens
Precision format	BF16, NVFP4
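The parameter counts in the table also allow a rough estimate of how much memory the weights need in each precision format. The sketch below uses the 25.2B total from the table and assumes 16 bits per weight for BF16 and about 4.5 bits per weight for NVFP4 (4-bit values plus an FP8 scale shared by each 16-value block). It ignores the KV cache, activations, runtime buffers, and any layers kept in higher precision, so real footprints will be larger. The helper names `weight_gb` and `fits` are hypothetical.

```python
# Back-of-the-envelope weight-memory estimate for DiffusionGemma.
# Weights only: KV cache, activations and runtime overhead are ignored.

TOTAL_PARAMS = 25.2e9   # total parameters (spec table)
ACTIVE_PARAMS = 3.8e9   # parameters active per step (spec table)

# Assumed effective bits per weight:
#  - BF16: 16 bits.
#  - NVFP4: 4-bit values plus one FP8 scale per 16-value block
#    (4 + 8/16 = 4.5 bits); the per-tensor FP32 scale is negligible.
BITS_PER_WEIGHT = {"BF16": 16.0, "NVFP4": 4.5}


def weight_gb(params: float, bits: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


def fits(params: float, fmt: str, memory_gb: float) -> bool:
    """True if the weights alone fit in `memory_gb` of GPU or unified memory."""
    return weight_gb(params, BITS_PER_WEIGHT[fmt]) <= memory_gb


if __name__ == "__main__":
    print(f"Active fraction per step: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
    for fmt, bits in BITS_PER_WEIGHT.items():
        print(f"{fmt}: ~{weight_gb(TOTAL_PARAMS, bits):.1f} GB of weights")
    # 32 GB = RTX 5090 VRAM, 128 GB = DGX Spark unified memory.
    for mem in (32, 128):
        for fmt in BITS_PER_WEIGHT:
            print(f"{fmt} weights fit in {mem} GB: {fits(TOTAL_PARAMS, fmt, mem)}")
```

Under these assumptions the BF16 weights come to roughly 50 GB and the NVFP4 weights to roughly 14 GB, which suggests the NVFP4 checkpoint is the more practical choice for a 32 GB card such as the RTX 5090. A mixture-of-experts model typically still keeps all of its parameters resident; the 3.8B active count reduces compute per step, not weight storage.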

On NVIDIA's side, the company is offering day-1 support across GeForce RTX GPUs, RTX PRO platforms, and DGX systems, ranging from the compact DGX Spark to DGX Station workstations powered by its datacenter-grade chips. NVIDIA is leaning on its Tensor Core architecture and CUDA software stack, so the model runs on these platforms without additional tuning.

NVIDIA has also shared performance figures. According to the company, a single H100 Tensor Core GPU delivers over 1,000 tokens/s, DGX Spark delivers over 150 tokens/s, and DGX Station offers the fastest local inference in its class at up to 800 tokens/s. NVIDIA says this is up to 4x faster than an equivalent autoregressive model.
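Turning NVIDIA's throughput figures into response times is simple division. The sketch below uses the quoted numbers as-is (the Spark and H100 figures are quoted as lower bounds, "150+" and "1,000+", while the DGX Station figure is an "up to" value), an arbitrary 1,024-token response length, and an autoregressive baseline obtained by dividing each figure by the claimed 4x speedup. That baseline is an assumption for illustration, not a number NVIDIA published, and the function names are hypothetical.

```python
# Response-time arithmetic from NVIDIA's quoted DiffusionGemma throughput.
# The autoregressive (AR) baseline is ASSUMED: quoted figure / claimed speedup.

QUOTED_TPS = {
    "DGX Spark": 150,     # quoted as "150+"
    "DGX Station": 800,   # quoted as "up to 800"
    "Single H100": 1000,  # quoted as "1,000+"
}
CLAIMED_SPEEDUP = 4.0  # "up to 4x faster than an equivalent autoregressive model"


def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to emit `tokens` at a steady throughput."""
    return tokens / tokens_per_second


def implied_autoregressive_tps(diffusion_tps: float, speedup: float = CLAIMED_SPEEDUP) -> float:
    """Hypothetical AR throughput implied by dividing out the claimed speedup."""
    return diffusion_tps / speedup


if __name__ == "__main__":
    response_len = 1024  # hypothetical response length in tokens
    for platform, tps in QUOTED_TPS.items():
        ar_tps = implied_autoregressive_tps(tps)
        print(
            f"{platform}: {seconds_for(response_len, tps):.2f} s "
            f"(implied AR baseline ~{ar_tps:.1f} tok/s, "
            f"{seconds_for(response_len, ar_tps):.2f} s)"
        )
```

At 150 tokens/s, for example, a 1,024-token response takes just under 7 seconds on DGX Spark, versus roughly 27 seconds at the implied 37.5 tokens/s baseline.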

NVIDIA lists the following ways to run DiffusionGemma:

Locally on the NVIDIA DGX Spark deskside personal AI supercomputer, powered by the NVIDIA GB10 Grace Blackwell Superchip with 128GB of unified memory and shipping with the preinstalled NVIDIA AI software stack, ready for prototyping, fine-tuning and fully local agent workflows.
On NVIDIA RTX PRO 6000 workstations, giving developers, researchers, and AI professionals the headroom to run low-latency local generation and agentic loops as part of a professional workflow.
On DGX Station, delivering best-in-class, high-speed inference at up to 800 tokens/s for low-latency text generation and agentic loops, backed by 748GB of coherent memory.
On GeForce RTX GPUs, with llama.cpp support coming soon.

Platform	Best For	Key highlights	Getting started
NVIDIA DGX Spark	Personal AI supercomputer for local AI development, autonomous agents, AI research, and prototyping	NVIDIA GB10 Grace Blackwell Superchip, 128 GB unified memory, 1 PFLOP of FP4 AI compute, and a preinstalled NVIDIA AI software stack for fully local OpenClaw workflows	DGX Spark playbooks for vLLM and Unsloth; deployment guides; NVIDIA NeMo Automodel fine-tuning guide; vLLM on DGX Spark guide
NVIDIA DGX Station	Deskside AI supercomputer for building, running, and scaling AI workloads	NVIDIA GB300 Grace Blackwell Ultra Superchip, NVIDIA AI software stack, 748 GB coherent memory, up to 20 PFLOPS of FP4 compute, and support for models up to 1T parameters. Frontier AI development, inference, and agents at your desk.	DGX Station playbooks; vLLM on DGX Station guide
NVIDIA RTX + NVIDIA RTX PRO	Desktop AI apps, Windows development, and local inference	Optimized local inference performance across desktop and workstation environments for creators and professionals	llama.cpp support for GeForce RTX GPUs coming soon

Users who want to try DiffusionGemma out of the box can do so today on an RTX 5090 or a DGX Spark system, using NVIDIA's full-stack, ready-to-use software.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work covers not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
