NVIDIA Delivers Day-1 Support For DeepMind’s DiffusionGemma Open Model Across RTX & DGX Platforms, 150 Tokens/s With DGX Spark

Jun 10, 2026 at 04:15pm EDT

NVIDIA's entire RTX and DGX lineups are getting full support for Google DeepMind's DiffusionGemma open AI model.

Google Intros Its Newest Open AI Model: DiffusionGemma - NVIDIA Offers Full Support Across Its DGX & RTX Families

DiffusionGemma is an open model designed for fast text generation, and with its launch, NVIDIA is announcing support across its RTX and DGX lineups. The model is already fast by design, and NVIDIA says its optimizations for the model and its hardware make it faster still.

Congrats to @GoogleDeepMind on the launch of DiffusionGemma.

The model generates 256 tokens in parallel per step, delivering 150+ TPS on DGX Spark, and 1,000+ TPS on a single H100.

We're supporting it from day one with:
• BF16 and NVFP4 checkpoints on @huggingface🤗
• Free… https://t.co/0xqMXKvMQV
— NVIDIA AI (@NVIDIAAI) June 10, 2026

The following are the main highlights of the model:

Parallel generation: DiffusionGemma denoises up to 256 tokens per step instead of predicting one at a time.
Built on Gemma 4: DiffusionGemma pairs a diffusion head with Google's Gemma 4 architecture, a roughly 26-billion-parameter mixture-of-experts model that activates just 3.8 billion parameters per step.
Up to 4x faster performance: Generation runs up to 4x faster than an equivalent autoregressive model, which speeds up text generation on local hardware, where single-user generation usually stalls.
Open and local: DiffusionGemma is open-weight under a permissive Apache 2.0 license and runs entirely locally on RTX GPUs and DGX Spark, with no cloud and no per-token cost. It has day-zero support in Hugging Face Transformers, vLLM and Unsloth.
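
To put the parallel-generation claim in perspective, here is a minimal Python sketch that compares how many model steps an autoregressive decoder needs with the best-case count for a model that emits up to 256 tokens per step. Only the 256-token figure comes from the announcement; the function names are hypothetical, and the result is a lower bound, since a diffusion model generally refines each block over several denoising steps.

```python
import math

# "Up to 256 tokens per step" is the figure quoted in the announcement.
TOKENS_PER_STEP = 256


def autoregressive_steps(n_tokens: int) -> int:
    """An autoregressive decoder predicts one token per step."""
    if n_tokens < 0:
        raise ValueError("n_tokens must be non-negative")
    return n_tokens


def min_parallel_steps(n_tokens: int, tokens_per_step: int = TOKENS_PER_STEP) -> int:
    """Best-case step count if every step emits a full block of tokens.

    This is a lower bound only: the real number of denoising steps per
    block is not stated in the article and is likely higher.
    """
    if n_tokens < 0:
        raise ValueError("n_tokens must be non-negative")
    return math.ceil(n_tokens / tokens_per_step)


if __name__ == "__main__":
    for n in (256, 1000, 4096):
        print(f"{n} tokens: {autoregressive_steps(n)} autoregressive steps, "
              f"at least {min_parallel_steps(n)} parallel steps")
```

Fewer steps does not translate one-to-one into wall-clock time, because each parallel step does more work than a single-token step. That is consistent with NVIDIA quoting a speedup of roughly 4x rather than 256x.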

Model name	DiffusionGemma
Supported modalities	Text, image
Total parameters	25.2B
Active parameters	3.8B
Context length	Up to 256K tokens
Precision format	BF16, NVFP4
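
For a rough sense of what those precision formats mean for memory, the sketch below estimates the size of the weights alone from the 25.2B total parameter count in the table. The bits-per-weight value for NVFP4 (4-bit values plus an assumed per-block scale overhead, about 4.5 bits) is an assumption rather than a figure from the announcement, and the estimate ignores the KV cache, activations, and any layers a real checkpoint keeps in higher precision.

```python
# Total parameter count from the spec table. All experts must be resident
# in memory even though only 3.8B parameters are active per step.
TOTAL_PARAMS = 25.2e9

# Assumed storage cost per weight. BF16 is 16 bits by definition; the NVFP4
# figure assumes 4-bit values plus one 8-bit scale per 16-weight block.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "NVFP4": 4.5,  # assumption, not stated in the article
}


def weight_footprint_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for fmt, bits in BITS_PER_WEIGHT.items():
        gb = weight_footprint_gb(TOTAL_PARAMS, bits)
        print(f"{fmt}: about {gb:.1f} GB of weights")
```

Under these assumptions the BF16 weights come to roughly 50 GB and the NVFP4 weights to roughly 14 GB, which helps explain why a 4-bit checkpoint is attractive for local hardware.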

NVIDIA is offering day-1 support across GeForce RTX GPUs, RTX PRO platforms, and DGX systems ranging from the Spark mini PC to workstations powered by its datacenter-grade chips. The support builds on NVIDIA's Tensor Core architecture and the CUDA software stack, and NVIDIA says it requires no additional tuning.

NVIDIA has shared some stats too. The company states that a single H100 Tensor Core GPU delivers over 1,000 tokens/s, DGX Spark systems deliver over 150 tokens/s, and DGX Station offers the fastest local inference in its class. NVIDIA puts the gain at roughly 4 times the performance of an equivalent autoregressive model.
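
To see what these throughput figures mean in practice, the following sketch converts tokens per second into the time needed for a response of a given length, and derives the autoregressive baseline implied by the roughly 4x claim. The throughput and speedup values are the ones quoted above; the function names and the 1,000-token response length are illustrative assumptions.

```python
# Throughput figures quoted by NVIDIA, in tokens per second.
THROUGHPUT_TPS = {
    "DGX Spark": 150.0,
    "H100 (single GPU)": 1000.0,
}

# "Roughly 4 times faster than an equivalent autoregressive model."
CLAIMED_SPEEDUP = 4.0


def seconds_to_generate(n_tokens: int, tokens_per_second: float) -> float:
    """Time to produce n_tokens at a steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return n_tokens / tokens_per_second


def implied_baseline_tps(tokens_per_second: float,
                         speedup: float = CLAIMED_SPEEDUP) -> float:
    """Autoregressive throughput implied by the claimed speedup."""
    return tokens_per_second / speedup


if __name__ == "__main__":
    response_tokens = 1000  # hypothetical response length
    for platform, tps in THROUGHPUT_TPS.items():
        fast = seconds_to_generate(response_tokens, tps)
        slow = seconds_to_generate(response_tokens, implied_baseline_tps(tps))
        print(f"{platform}: {fast:.1f} s vs {slow:.1f} s for the implied baseline")
```

On these numbers, a 1,000-token response takes just under 7 seconds on DGX Spark, against roughly 27 seconds for the implied autoregressive baseline of 37.5 tokens/s.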

DiffusionGemma can be run on the following platforms:

Locally on the NVIDIA DGX Spark deskside personal AI supercomputer, powered by the NVIDIA GB10 Grace Blackwell Superchip with 128GB of unified memory, with the preinstalled NVIDIA AI software stack ready for prototyping, fine-tuning and fully local agent workflows.
On NVIDIA RTX PRO 6000 workstations, which give developers, researchers, and AI professionals the headroom to run low-latency local generation and agentic loops as part of a professional workflow.
On DGX Station, which delivers best-in-class inference at up to 800 tokens/sec for low-latency text generation and agentic loops, with 748GB of coherent memory.
On GeForce RTX GPUs, with llama.cpp support coming soon.

Platform	Best For	Key highlights	Getting started
NVIDIA DGX Spark	Personal AI supercomputer for local AI development, autonomous agents, AI research, and prototyping	NVIDIA GB10 Grace Blackwell Superchip, 128 GB unified memory, 1 PFLOP of FP4 AI compute, and a preinstalled NVIDIA AI software stack for fully local OpenClaw workflows	DGX Spark playbooks for vLLM and Unsloth; deployment guides; NVIDIA NeMo Automodel fine-tuning guide; vLLM on DGX Spark guide
NVIDIA DGX Station	Deskside AI supercomputer for building, running, and scaling AI workloads	NVIDIA GB300 Grace Blackwell Ultra Superchip, NVIDIA AI software stack, 748 GB coherent memory, up to 20 PFLOPS of FP4 compute, and support for models up to 1T parameters. Frontier AI development, inference, and agents at your desk.	DGX Station playbooks; vLLM on DGX Station guide
NVIDIA RTX + NVIDIA RTX PRO	Desktop AI apps, Windows development, and local inference	Optimized local inference performance across desktop and workstation environments for creators and professionals	

Users who want to try out the DiffusionGemma model can do so right now on an RTX 5090 or DGX Spark system, using NVIDIA's full-stack, ready-to-use framework.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
