Microsoft Azure Gets An Ultra Upgrade With NVIDIA’s GB300 “Blackwell Ultra” GPUs, 4600 GPUs Connected Together To Run Over Trillion Parameter AI Models

Oct 10, 2025 at 05:40am EDT

Microsoft has announced its first at-scale production cluster featuring NVIDIA's GB300 "Blackwell Ultra" GPUs for massively-sized AI models.

NVIDIA GB300 "Blackwell Ultra" Crunches Through AI Models With Hundreds of Trillions of Parameters Within Microsoft's Latest Azure Platform

Microsoft Azure has received the Blackwell Ultra upgrade. The latest large-scale production cluster integrates over 4,600 GPUs based on NVIDIA's GB300 NVL72 architecture, all connected using the next-generation InfiniBand interconnect fabric. This deployment paves the way for Microsoft to scale to hundreds of thousands of Blackwell Ultra GPUs deployed across datacenters worldwide, all tackling a single workload: AI.
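To put the deployment in perspective, a quick back-of-the-envelope sketch shows how many NVL72 racks a cluster of that size implies. This assumes the figures reported above: 72 GPUs per GB300 NVL72 rack (per NVIDIA's naming) and roughly 4,600 GPUs in total.

```python
# Back-of-the-envelope sizing for the Azure cluster described above.
# Assumption: each GB300 NVL72 rack holds 72 GPUs.
GPUS_TOTAL = 4600
GPUS_PER_RACK = 72

# Ceiling division: racks needed to house all GPUs.
racks = -(-GPUS_TOTAL // GPUS_PER_RACK)
print(racks)  # → 64 racks
```

So "over 4,600 GPUs" translates to roughly 64 tightly coupled NVL72 racks stitched together by the InfiniBand fabric.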


According to Microsoft, the Azure cluster with NVIDIA GB300 NVL72 "Blackwell Ultra" GPUs can reduce training times from months to weeks, and opens the door to training models with hundreds of trillions of parameters. NVIDIA also leads in inference performance, demonstrated repeatedly in MLPerf benchmarks and, most recently, in the InferenceMAX AI tests.

The new Microsoft Azure ND GB300 v6 VMs are optimized for reasoning models, agentic AI systems, and multimodal generative AI workloads. Each rack houses 18 VMs and a total of 72 GPUs, which works out to four GPUs per VM. The main spec highlights are as follows:

At the rack level, NVLink and NVSwitch reduce memory and bandwidth constraints, enabling up to 130TB per second of intra-rack data transfer connecting 37TB of total fast memory. Each rack becomes a tightly coupled unit, delivering higher inference throughput at reduced latencies on larger models and longer context windows, empowering agentic and multimodal AI systems to be more responsive and scalable than ever.
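The rack-level figures above can be broken down per GPU, which makes them easier to compare against per-chip specs. This is a rough sketch assuming the 130TB/s of bandwidth and 37TB of fast memory divide evenly across the 72 GPUs in one NVL72 rack.

```python
# Per-GPU share of the rack-level figures quoted above.
# Assumption: resources divide evenly across the rack's 72 GPUs.
RACK_BW_TBPS = 130       # intra-rack NVLink data transfer, TB/s
RACK_FAST_MEM_TB = 37    # pooled "fast memory" per rack, TB
GPUS_PER_RACK = 72

bw_per_gpu = RACK_BW_TBPS / GPUS_PER_RACK              # TB/s per GPU
mem_per_gpu_gb = RACK_FAST_MEM_TB * 1000 / GPUS_PER_RACK  # GB per GPU
print(f"~{bw_per_gpu:.1f} TB/s and ~{mem_per_gpu_gb:.0f} GB per GPU")
```

That works out to roughly 1.8TB/s of NVLink bandwidth and about 514GB of fast memory per GPU, which is why a full rack can serve a trillion-parameter model as if it were a single accelerator.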

To scale beyond the rack, Azure deploys a full fat-tree, non-blocking architecture using NVIDIA Quantum-X800 InfiniBand, the fastest networking fabric available today, delivering 800Gb/s of bandwidth per GPU. This ensures that customers can scale up training of ultra-large models efficiently to tens of thousands of GPUs with minimal communication overhead, thus delivering better end-to-end training throughput. Reduced synchronization overhead also translates to maximum utilization of GPUs, which helps researchers iterate faster and at lower costs despite the compute-hungry nature of AI training workloads. Azure's co-engineered stack, including custom protocols, collective libraries, and in-network computing, ensures the network is highly reliable and fully utilized by the applications. Features like NVIDIA SHARP accelerate collective operations and double effective bandwidth by performing math in the switch, making large-scale training and inference more efficient and reliable.

Azure’s advanced cooling systems use standalone heat exchanger units and facility cooling to minimize water usage while maintaining thermal stability for dense, high-performance clusters like GB300 NVL72. We also continue to develop and deploy new power distribution models capable of supporting the high energy density and dynamic load balancing required by the ND GB300 v6 VM class of GPU clusters.

via Microsoft

As per NVIDIA, the Microsoft Azure partnership marks a leadership moment for the United States in the AI race. The newest Azure VMs are now deployed and ready for use by customers.
