NVIDIA Delivers Day-1 Support For DeepMind’s DiffusionGemma Open Model Across RTX & DGX Platforms, 150 Tokens/s With DGX Spark

Jun 10, 2026 at 04:15pm EDT

NVIDIA's RTX and DGX lineups are getting full support for Google DeepMind's DiffusionGemma open AI model.

Google Intros Its Newest Open AI Model: DiffusionGemma - NVIDIA Offers Full Support Across Its DGX & RTX Families

The DiffusionGemma model is an open model designed to offer speedy text generation, and with its launch, NVIDIA is announcing support across its RTX and DGX lineups. What's even better is that while DiffusionGemma is fast, NVIDIA's optimizations for the model and its hardware make it even faster.


The following are the main highlights of the model:

Model name: DiffusionGemma
Supported modalities: Text, image
Total parameters: 25.2B
Active parameters: 3.8B
Context length: Up to 256K tokens
Precision format: BF16, NVFP4
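
To put the two precision formats in perspective, here is a back-of-the-envelope sketch of how much memory the weights alone might need. It is an illustration, not a figure from NVIDIA or Google: it assumes BF16 uses 16 bits per weight and that NVFP4 uses 4 bits per weight plus one 8-bit scale per block of 16 weights (about 4.5 bits per weight), and it ignores the KV cache, activations, and runtime overhead.

```python
# Back-of-the-envelope weight-memory estimate for the listed precision formats.
# Assumptions (not from the article): BF16 stores 16 bits per weight; NVFP4
# stores 4 bits per weight plus one 8-bit scale per 16-weight block, which is
# about 4.5 bits per weight. KV cache, activations, and overhead are ignored.

TOTAL_PARAMS = 25.2e9   # total parameters, from the highlights list
ACTIVE_PARAMS = 3.8e9   # active parameters, from the highlights list

BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "NVFP4": 4.0 + 8.0 / 16.0,  # assumed block-scale overhead
}


def weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8.0 / 1e9


if __name__ == "__main__":
    for fmt, bits in BITS_PER_WEIGHT.items():
        total = weight_gb(TOTAL_PARAMS, bits)
        active = weight_gb(ACTIVE_PARAMS, bits)
        print(f"{fmt}: ~{total:.1f} GB for all weights, ~{active:.1f} GB active")
```

Under these assumptions, the full set of weights comes to roughly 50 GB in BF16 and roughly 14 GB in NVFP4, which suggests why the 4-bit format matters for running the model on a single local GPU.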

NVIDIA is offering day-1 support across GeForce RTX GPUs, RTX PRO platforms, and DGX systems, ranging from DGX Spark mini PCs to workstations powered by its datacenter-grade chips. The support builds on NVIDIA's Tensor Core architecture and the CUDA software stack, and NVIDIA says it requires no additional tuning.

NVIDIA has shared some performance figures as well. The company states that its H100 Tensor Core GPUs on DGX Stations deliver 1,000 tokens/s on a single GPU, that DGX Spark systems deliver 150 tokens/s, and that DGX Station offers the fastest local inference in its class. According to NVIDIA, these solutions are roughly 4 times faster than an equivalent autoregressive model.
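
A quick way to read these numbers is to turn them into wall-clock time for a response. The sketch below is illustrative only: it assumes throughput stays constant for the whole response, ignores prompt processing, and treats the "roughly 4 times" speedup as applying uniformly, so the implied autoregressive baseline is simply the quoted rate divided by four. The 1,500-token response length is an arbitrary example.

```python
# Rough generation-time arithmetic from the quoted throughput figures.
# Assumptions (not from the article): constant throughput, no prompt
# processing time, and a uniform 4x speedup over an autoregressive baseline.

QUOTED_TOKENS_PER_S = {
    "DGX Spark": 150.0,
    "H100 (single GPU)": 1000.0,
}
SPEEDUP_VS_AUTOREGRESSIVE = 4.0


def seconds_to_generate(num_tokens: int, tokens_per_s: float) -> float:
    """Time in seconds to emit num_tokens at a constant rate."""
    return num_tokens / tokens_per_s


def implied_baseline_rate(tokens_per_s: float,
                          speedup: float = SPEEDUP_VS_AUTOREGRESSIVE) -> float:
    """Throughput of the hypothetical autoregressive model being compared."""
    return tokens_per_s / speedup


if __name__ == "__main__":
    response_tokens = 1500  # arbitrary example length
    for platform, rate in QUOTED_TOKENS_PER_S.items():
        baseline = implied_baseline_rate(rate)
        fast = seconds_to_generate(response_tokens, rate)
        slow = seconds_to_generate(response_tokens, baseline)
        print(f"{platform}: {fast:.1f} s vs {slow:.1f} s "
              f"at an implied {baseline:.1f} tokens/s baseline")
```

With these assumptions, a 1,500-token response takes about 10 seconds at 150 tokens/s, compared with about 40 seconds at the implied 37.5 tokens/s baseline.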

NVIDIA DGX Spark
Best for: Personal AI supercomputer for local AI development, autonomous agents, AI research, and prototyping
Key highlights: NVIDIA GB10 Grace Blackwell Superchip, 128 GB unified memory, 1 PFLOP of FP4 AI compute, and a preinstalled NVIDIA AI software stack for fully local OpenClaw workflows
Getting started: DGX Spark playbooks for vLLM and Unsloth; deployment guides; NVIDIA NeMo Automodel fine-tuning guide; vLLM on DGX Spark guide

NVIDIA DGX Station
Best for: Deskside AI supercomputer for building, running, and scaling AI workloads
Key highlights: NVIDIA GB300 Grace Blackwell Ultra Superchip, NVIDIA AI software stack, 748 GB coherent memory, up to 20 PFLOPS of FP4 compute, and support for models up to 1T parameters. Frontier AI development, inference, and agents at your desk.
Getting started: DGX Station playbooks; vLLM on DGX Station guide

NVIDIA RTX + NVIDIA RTX PRO
Best for: Desktop AI apps, Windows development, and local inference
Key highlights: Optimized local inference performance across desktop and workstation environments for creators and professionals

Users who want to try DiffusionGemma out of the box can do so right now on an RTX 5090 or a DGX Spark system, using the full-stack, ready-to-use framework that NVIDIA provides.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
