NVIDIA Vera Rubin Enters Full Production, Ready To Bring The Full Force of NVIDIA’s AI Might To Agentic AI Factories

Jun 1, 2026 at 01:32am EDT

NVIDIA Confirms Vera Rubin Launch In Q3 With Volume Ramp by Q4, As Blackwell Continues To See Massive Demand

The world's most powerful Agentic AI platform, NVIDIA Vera Rubin, is now in full production and ready for deployment at AI factories.

NVIDIA Crushes All Vera Rubin Delay Rumors, Initiates Full Mass Production & Agentic AI Deployment

Less than two weeks ago, NVIDIA began full production of its Vera CPUs, which it expects to open up a $200B TAM. NVIDIA is confident that Vera will make it the largest supplier of CPUs this year. Now the whole platform, codenamed Vera Rubin NVL72, has entered full production and is being readied for deployment in multi-billion-dollar, multi-gigawatt AI factories around the globe.

Top system builders, infrastructure software and storage partners are in full-scale production of Vera Rubin. This includes Dell Technologies, HPE, Lenovo and Supermicro, as well as AIC, Aivres, ASRock Rack, ASUS, Cloudian, Compal, DDN, Everpure, Foxconn, GIGABYTE, Hitachi Vantara, Hyve Solutions, IBM, Inventec, MinIO, MiTAC Computing, MSI, NetApp, Nutanix, Pegatron, Quanta Cloud Technology (QCT), VAST Data, WEKA, Wistron and Wiwynn.

NVIDIA

NVIDIA's Rubin platform consists of six chips, all of which are back from the fabs and in NVIDIA's labs for testing. These chips are:

Rubin GPU (with 336 Billion Transistors)
Vera CPU (with 227 Billion Transistors)
NVLink 6 Switch for Interconnect
ConnectX-9 (CX9) & BlueField-4 (BF4) for Networking
Spectrum-X 102.4T CPO for silicon photonics

Together, these chips bring the Rubin platform to life inside a range of DGX, HGX, and MGX systems. At the heart of each data center is the NVIDIA Vera Rubin Superchip, featuring two Rubin GPUs, one Vera CPU, and large amounts of HBM4 and LPDDR5x memory. The highlights of the NVIDIA Rubin technology include:

6th Gen NVLink (3.6 TB/s Scale-Up)
Vera CPU (Custom Olympus Core)
Rubin GPU (50 PF NVFP4 Transformer Engine)
3rd Gen Confidential Computing (First Rack-Scale TEE)
2nd Gen RAS Engine (Zero Downtime Health Checks)

Starting with the Rubin GPU, the chip features two reticle-sized dies, each packed with compute and tensor cores. It is designed purely for AI-intensive workloads, offering 50 PFLOPs of NVFP4 inference and 35 PFLOPs of NVFP4 training performance, a 5x and 3.5x increase over Blackwell, respectively. The chip is also equipped with HBM4 memory, offering up to 22 TB/s of bandwidth per chip, a 2.8x increase vs Blackwell, and 3.6 TB/s of NVLink bandwidth per GPU, a 2x increase vs Blackwell.
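As a rough sanity check, the Blackwell baselines implied by these uplift claims can be backed out by dividing each Rubin figure by its quoted multiplier. The sketch below uses only the numbers quoted above. The dictionary keys and the `implied_baseline` helper are illustrative names, and the results are derived estimates rather than official Blackwell specifications.

```python
# Per-chip Rubin figures paired with the uplift multipliers quoted in the article.
# Each entry is (rubin_value, uplift_vs_blackwell).
RUBIN_PER_CHIP = {
    "nvfp4_inference_PFLOPs": (50.0, 5.0),
    "nvfp4_training_PFLOPs": (35.0, 3.5),
    "hbm_bandwidth_TB_per_s": (22.0, 2.8),
    "nvlink_bandwidth_TB_per_s": (3.6, 2.0),
}


def implied_baseline(rubin_value: float, uplift: float) -> float:
    """Back out the previous-generation figure implied by an 'Nx' uplift claim."""
    return rubin_value / uplift


# Rounded to two decimals, since the quoted multipliers are themselves rounded.
baselines = {
    name: round(implied_baseline(value, uplift), 2)
    for name, (value, uplift) in RUBIN_PER_CHIP.items()
}

for name, value in baselines.items():
    print(f"{name}: ~{value}")
```

Because the multipliers are rounded marketing figures, the implied baselines should be read as approximations only.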

For the Vera CPU, NVIDIA has designed a next-gen custom Arm architecture codenamed Olympus. The chip packs 88 cores, 176 threads (with NVIDIA Spatial Multi-Threading), a 1.8 TB/s NVLink-C2C coherent memory interconnect, 1.5 TB of system memory (3x Grace), 1.2 TB/s of memory bandwidth with SOCAMM LPDDR5X, and rack-scale confidential compute. Together, these offer 2x the data processing, compression & CI/CD performance of Grace.

NVLink 6 switches provide the networking fabric on the Rubin platform with 400G SerDes, 3.6 TB/s of per-GPU all-to-all bandwidth, 28.8 TB/s of total bandwidth, 14.4 TFLOPS of in-network FP8 compute, & a 100% liquid-cooled design.

Networking is powered by the latest ConnectX-9 and BlueField-4 modules. The ConnectX-9 SuperNIC offers 1.6 Tb/s of bandwidth with 200G PAM4 SerDes, a programmable RDMA and data path accelerator, and top-level security, and is optimized for massive-scale AI.

BlueField-4 is an 800G DPU that serves as a SmartNIC and storage processor. It integrates a 64-core Grace CPU with ConnectX-9 and offers 2x the networking capability, 6x the compute, and 3x the memory bandwidth of BlueField-3.

All of these come together in the NVIDIA Vera Rubin NVL72 rack, which offers some impressive uplifts versus Blackwell as detailed below:

5x NVFP4 Inference (3.6 EFLOPS)
3.5x NVFP4 Training (2.5 EFLOPS)
2.5x LPDDR5x Capacity (54 TB)
1.5x HBM4 Capacity (20.7 TB)
2.8x HBM4 Bandwidth (1.6 PB/s)
2x Scale-Up Bandwidth (260 TB/s)
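The rack-level figures above line up with the per-chip numbers quoted earlier when multiplied out across the rack. The sketch below checks that arithmetic. It assumes that "NVL72" denotes 72 Rubin GPUs per rack and that the rack holds one Vera CPU per two GPUs, as in the Superchip description (36 CPUs); the variable names are illustrative, and the per-GPU HBM4 capacity is an implied value rather than a figure stated in the article.

```python
# Assumption: "NVL72" means 72 Rubin GPUs per rack.
GPUS_PER_RACK = 72
# Assumption: one Vera CPU per two Rubin GPUs, matching the Superchip layout.
CPUS_PER_RACK = GPUS_PER_RACK // 2

# Per-chip figures quoted in the article.
INFERENCE_PFLOPS_PER_GPU = 50.0
TRAINING_PFLOPS_PER_GPU = 35.0
HBM4_BANDWIDTH_TB_S_PER_GPU = 22.0
NVLINK_TB_S_PER_GPU = 3.6
LPDDR5X_TB_PER_CPU = 1.5

# Rack-level HBM4 capacity quoted in the article (TB).
RACK_HBM4_CAPACITY_TB = 20.7

rack = {
    # 1 EFLOPS = 1000 PFLOPS
    "nvfp4_inference_EFLOPS": GPUS_PER_RACK * INFERENCE_PFLOPS_PER_GPU / 1000,
    "nvfp4_training_EFLOPS": GPUS_PER_RACK * TRAINING_PFLOPS_PER_GPU / 1000,
    # 1 PB/s = 1000 TB/s
    "hbm4_bandwidth_PB_per_s": GPUS_PER_RACK * HBM4_BANDWIDTH_TB_S_PER_GPU / 1000,
    "scale_up_bandwidth_TB_per_s": GPUS_PER_RACK * NVLINK_TB_S_PER_GPU,
    "lpddr5x_capacity_TB": CPUS_PER_RACK * LPDDR5X_TB_PER_CPU,
}

# Implied (not stated in the article) HBM4 capacity per GPU, in GB.
implied_hbm4_gb_per_gpu = RACK_HBM4_CAPACITY_TB * 1000 / GPUS_PER_RACK

for name, value in rack.items():
    print(f"{name}: {value:.3f}")
print(f"implied HBM4 per GPU: {implied_hbm4_gb_per_gpu:.1f} GB")
```

Under these assumptions the products round to the quoted 3.6 EFLOPS, 2.5 EFLOPS, 1.6 PB/s, roughly 260 TB/s, and 54 TB, which suggests the rack figures are straightforward sums of the per-chip specifications.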

NVIDIA is also announcing its Spectrum-X Ethernet Co-Packaged Optics solution, which provides a 102.4 Tb/s scale-out switch infrastructure with co-packaged 200G silicon photonics and delivers 95% of effective bandwidth at scale. According to NVIDIA, the system is 5 times more efficient, 10 times more reliable, and offers 5 times higher application runtime.

For its Rubin SuperPOD, NVIDIA is also unveiling the Inference Context Memory Storage platform, which is built for gigascale inference and is fully integrated with NVIDIA software solutions such as Dynamo, NIXL & DOCA.

To wrap it all up, NVIDIA will be putting its Rubin platform in its bleeding-edge DGX SuperPOD with 8 Vera Rubin NVL72 racks. That isn't all, though: there's also the NVIDIA DGX Rubin NVL8 for mainstream data centers.

With all of these advancements, NVIDIA says Rubin offers a 10x reduction in inference token cost and a 4x reduction in the number of GPUs needed to train MoE models vs Blackwell GB200. The Rubin ecosystem is backed by a diverse range of partners and is in full production, with customers getting the first chips later this year.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
