Hardware Hot Chips

Meta’s Catalina Pod AI System Couples NVIDIA’s Blackwell GB200 NVL72 With Open Rack v3, And Liquid Cooling

Hassan Mujtaba • Aug 25, 2025 at 02:00am EDT

Meta's Catalina data center server racks with extensive cabling and infrastructure.

Meta has shared the building blocks of its Catalina AI system, which is based on NVIDIA's GB200 NVL72 solution with Open Rack v3 & Liquid Cooling.

Meta's Custom NVIDIA GB200 NVL72 Blackwell Platform, The Catalina Pod, is Liquid Cooling-Ready & Open Rack v3 Compliant

Back in 2022, Meta mainly focused on clusters of around 6,000 GPUs. These were designed mainly for traditional ranking and recommendation models, typically running workloads that spanned 128-512 GPUs.

A year later, with the advent of GenAI and LLMs, clusters grew to 16-24K GPUs (up to a 4x increase), and just last year, Meta was running 100,000 GPUs and continues to add more. Meta is also a software enabler with models such as Llama, and it anticipates a 10x increase in cluster sizes over the next few years.

Meta states that it started on the Catalina project very early with NVIDIA and uses NVIDIA's NVL72 GPU solution as the baseline. While the name is the same, Meta's version switches to an NVL36x2 configuration. Meta also worked with NVIDIA to customize the system to meet its needs, and both companies contributed the reference designs for MGX and NVL72 to open source, with Catalina available on the Open Compute website.

Catalina is what Meta is now deploying in its data centers. Meta calls each system a pod and essentially copies and pastes it to scale.

Diagram of NVIDIA MGX GB200 system configuration with CPUs and NVLink connections.

Diagram of Meta Catalina GB200 configuration, featuring Grace CPU and NVLink connectivity.

One difference between the standard NVL72 and Meta's custom version is that Meta uses two IT racks that together form a single 72-GPU scale-up domain. Both IT racks have the same configuration: 18 compute trays split between the top and the bottom of the rack, and nine NVSwitches in each rack, left and right. Between the two racks runs a big, thick bundle of cables.

These cables allow all of the GPUs across the two racks to be combined, connecting through the NVSwitches to create a single 72-GPU scale-up domain. On the left and right of the racks, you can see large AALCs, or air-assisted liquid cooling units. These allow Meta to deploy liquid-cooled, high-power-density racks in its existing data centers across the US and the world.
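The rack counts above imply a per-rack and per-tray GPU count that the article does not state directly. The short sketch below derives them from the article's own figures (two IT racks, 18 compute trays per rack, 72 GPUs per scale-up domain); the even split across racks and trays is an assumption, and the function names are purely illustrative.

```python
# Figures quoted in the article.
RACKS_PER_POD = 2
COMPUTE_TRAYS_PER_RACK = 18
GPUS_PER_POD = 72


def gpus_per_rack(gpus: int = GPUS_PER_POD, racks: int = RACKS_PER_POD) -> int:
    """GPUs in each IT rack, assuming they are split evenly across racks."""
    if gpus % racks:
        raise ValueError("GPUs do not divide evenly across racks")
    return gpus // racks


def gpus_per_tray(trays_per_rack: int = COMPUTE_TRAYS_PER_RACK) -> int:
    """GPUs in each compute tray, assuming every tray is populated identically."""
    per_rack = gpus_per_rack()
    if per_rack % trays_per_rack:
        raise ValueError("GPUs do not divide evenly across trays")
    return per_rack // trays_per_rack


if __name__ == "__main__":
    print(f"GPUs per rack: {gpus_per_rack()}")
    print(f"GPUs per compute tray: {gpus_per_tray()}")
```

Under these assumptions each rack holds 36 GPUs, which matches the NVL36x2 name, with two GPUs per compute tray.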

Meta states that with two racks it can increase the number of CPUs and the total amount of memory per 72-GPU system, going from 17 TB to 34 TB of LPDDR memory, which brings the total cache-coherent memory shared between the GPUs and the CPUs to 48 TB. The PSU takes 480 volts or 277 volts single-phase and converts it to 48 volts DC, which is distributed through the busbar in the back and powers all of the individual server blades, NVSwitches, and networking devices within the rack.
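As a rough sanity check on the memory figures, the sketch below works out what they imply for the GPU side. It assumes that the 48 TB total is simply LPDDR plus GPU-attached memory, that it is shared by 72 GPUs, and that 1 TB = 1000 GB; the article does not give a per-GPU figure, and its numbers are rounded, so the result is only approximate.

```python
# Figures quoted in the article (rounded by the source).
LPDDR_BEFORE_TB = 17      # LPDDR capacity of the baseline configuration
LPDDR_AFTER_TB = 34       # LPDDR capacity of Meta's configuration
TOTAL_COHERENT_TB = 48    # total cache-coherent memory, CPUs plus GPUs
GPUS = 72                 # assumption: the total is shared by one 72-GPU domain


def lpddr_growth_factor() -> float:
    """How much the LPDDR capacity grows between the two configurations."""
    return LPDDR_AFTER_TB / LPDDR_BEFORE_TB


def implied_gpu_memory_tb() -> int:
    """Memory left over once LPDDR is subtracted (assumed to be GPU-attached)."""
    return TOTAL_COHERENT_TB - LPDDR_AFTER_TB


def implied_memory_per_gpu_gb() -> float:
    """Approximate GPU-attached memory per GPU, using 1 TB = 1000 GB."""
    return implied_gpu_memory_tb() * 1000 / GPUS


if __name__ == "__main__":
    print(f"LPDDR growth: {lpddr_growth_factor():.1f}x")
    print(f"Implied GPU memory: {implied_gpu_memory_tb()} TB")
    print(f"Implied per-GPU memory: ~{implied_memory_per_gpu_gb():.0f} GB")
```

The doubling of LPDDR is consistent with doubling the number of CPUs, and the remaining 14 TB works out to a little under 200 GB per GPU under these assumptions.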

High Power Rack deployment showcasing increased power capacity and new configurations.

Data center cooling systems with AALC racks and RMC for efficient liquid flow management.

Each rack has one power supply shelf at the top and two more at the bottom. Meta also has its own fiber patch panel, to which all of the in-rack fiber cabling for the back-end network is connected; from there it runs out into the data center to reach the networking switches that sit at the end of the row for the scale-up domain. The rack also houses the rack management controller, a Wedge 400 front-end network switch, and several IT and switch trays.

To support all of this, Meta requires a range of new technologies, some of which are already part of the NVIDIA GB200 NVL72 Blackwell system. Others are unique to Meta. One is the high-power version of its Open Rack, essentially higher-power supplies and CPUs. Another is liquid cooling: the air-assisted liquid cooling needed to support those racks in traditional data centers. There is also the rack management controller, a safety and orchestration device that enables and disables cooling and monitors the racks for leaks. Finally, there is Meta's network topology, the disaggregated scheduled fabric, which allows it to connect multiple pods into larger clusters.

Meta Board vs Nvidia GB200 Reference: Customization and implementation details comparison.

Close-up of PDB circuit board with labeled components and connections for electronics maintenance.

This is also the first deployment of Meta's high-power rack version of Open Rack v3. It allows Meta to increase the amount of power for each rack up to 94 kW for the busbar (600A). It also supports newer buildings with facility liquid cooling, where liquid can be run straight to the rack. To manage liquid, Meta uses the RMC, or Rack Management Controller. It sits within the rack and constantly monitors a number of different components for leaks. It is placed at the top of the rack so that, if there is a leak, liquid does not drip onto it and shut it off. The RMC connects to the AALCs and helps shut them off, or connects to the valve train at the facility level, which closes the valves on the liquid coming in from the affected building.
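The leak-handling behavior described above can be sketched as a small decision function. This is only an illustration of the logic as the article describes it, not Meta's RMC firmware: the `RackState` fields, the `Action` names, and the rule that facility-cooled racks close building valves while other racks shut off their AALCs are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical responses an RMC-like controller could take."""
    NONE = "none"
    SHUT_OFF_AALC = "shut_off_aalc"
    CLOSE_FACILITY_VALVES = "close_facility_valves"


@dataclass(frozen=True)
class RackState:
    """Hypothetical snapshot of what the controller observes."""
    leak_detected: bool
    facility_liquid_cooled: bool  # True if liquid runs straight from the building


def decide_action(state: RackState) -> Action:
    """Pick a response to the observed state (illustrative policy only)."""
    if not state.leak_detected:
        return Action.NONE
    if state.facility_liquid_cooled:
        # Liquid comes from the building, so stop it at the valve train.
        return Action.CLOSE_FACILITY_VALVES
    # Otherwise the liquid loop is driven by the in-row AALC units.
    return Action.SHUT_OFF_AALC


if __name__ == "__main__":
    for state in (
        RackState(leak_detected=False, facility_liquid_cooled=False),
        RackState(leak_detected=True, facility_liquid_cooled=False),
        RackState(leak_detected=True, facility_liquid_cooled=True),
    ):
        print(state, "->", decide_action(state).value)
```

A real controller would also have to debounce sensor readings and coordinate with neighboring racks; the sketch leaves that out to keep the two shut-off paths visible.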

Meta is also using its own disaggregated scheduled fabric for Catalina. This allows it to connect multiple pods within a single data center building or suite, to connect multiple buildings together, and potentially to go larger still to build very large-scale clusters. The fabric is tuned for AI and provides flexibility and speed; it is essentially how all the GPUs talk to each other.
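To give a sense of scale, the sketch below counts how many 72-GPU pods, and how many IT racks at two per pod, a cluster of a given size would need. It is purely illustrative: the article does not say that any particular cluster is built only from Catalina pods, and the 100,000-GPU target is borrowed from the fleet figure quoted earlier just as an example input.

```python
# Figures quoted in the article.
GPUS_PER_POD = 72
IT_RACKS_PER_POD = 2


def pods_needed(target_gpus: int, gpus_per_pod: int = GPUS_PER_POD) -> int:
    """Smallest number of whole pods that provides at least target_gpus."""
    if target_gpus < 0:
        raise ValueError("target_gpus must be non-negative")
    # Ceiling division using integers only.
    return -(-target_gpus // gpus_per_pod)


def it_racks_needed(target_gpus: int) -> int:
    """IT racks required for that many pods (cooling units not counted)."""
    return pods_needed(target_gpus) * IT_RACKS_PER_POD


if __name__ == "__main__":
    target = 100_000  # example input only
    print(f"{target} GPUs -> {pods_needed(target)} pods, "
          f"{it_racks_needed(target)} IT racks")
```

For 100,000 GPUs this comes to 1,389 pods and 2,778 IT racks, which shows why a fabric that links many pods across buildings is needed.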

About the author: A software engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
