Google Bets On The Agentic AI Era With Its AI Hypercomputer, Merges 8th-Gen TPUs, NVIDIA Rubin, & Axion CPUs Together

•

Apr 22, 2026 at 09:10am EDT

Google Bets On The Agentic AI Era With Its AI Hypercomputer, Merges 8th-Gen TPUs, NVIDIA Rubin, & Axion CPUs Together

Google has announced the AI Hypercomputer, which brings together TPUv8 series, NVIDIA Rubin, & Axion CPUs to power the Agentic AI era.

Google Cloud Next 26: AI Hypercomputer Announcement Gives Agentic AI The Next Push, Leverages In-House TPUs, CPUs & Scales Beyond With NVIDIA Rubin

Gone are the days of supercomputers; the Agentic AI era will be all about hypercomputers, which will combine various compute options to deliver customers the most flexible and performant AI architecture ever built.

Today, at Google's Cloud Next 26 event, the company formally announced its AI Hypercomputer. The new high-performance computing datacenter for Agentic AI houses an advanced, purpose-built architecture that unifies performance-optimized hardware for compute, storage, networking, open software, and ML frameworks.

TPU 8t, optimized for training and TPU 8i, optimized for inference.

Looking good! pic.twitter.com/pTrblfF8mU
— Sundar Pichai (@sundarpichai) April 22, 2026

To make Google's AI Hypercomputer possible, the company had to go above and beyond. It will house its latest custom TPUv8 series, Axion Cloud CPUs, and will also deploy NVIDIA Rubin GPUs. Today's announcement also comes with the launch of Google's 8th Gen TPU lineup, which comes in two flavors: the TPU 8t and the TPU 8i.

Google TPU 8t - Training Chip

The Google TPU 8t chip is designed as a training powerhouse, reducing the deployment of frontier models from months to weeks. The chip offers the highest possible compute throughput, shared memory, and interchip bandwidth in the most power-efficient package ever built. The TPU 8t chip has a total FP4 compute capacity of 121 Exaflops per pod, 2.84x higher than Ironwood.

The key features include:

Massive scale: A single TPU 8t superpod now scales to 9,600 chips and two petabytes of shared high-bandwidth memory, with double the interchip bandwidth of the previous generation. This architecture delivers 121 ExaFlops of compute and allows the most complex models to leverage a single, massive pool of memory.
Maximum utilization: By also integrating 10x faster storage access, combined with TPUDirect to pull data directly into the TPU, TPU 8t helps ensure maximum utilization of the end-to-end system.
Near-linear scaling: Our new Virgo Network, combined with JAX and our Pathways software, means TPU 8t can provide near-linear scaling for up to a million chips in a single logical cluster.
Native FP4: TPU 8t introduces native 4-bit floating point (FP4) to overcome memory bandwidth bottlenecks, doubling MXU throughput while maintaining accuracy for large models even at lower-precision quantization. By reducing the bits per parameter, the platform minimizes energy-intensive data movement and allows larger model layers to fit within local hardware buffers for peak compute utilization.

Google TPU 8i - Inferencing Chip

The second chip, TPU 8i, is designed for inference and pairs an incredible 288 GB of HBM memory with 384 MB of on-chip SRAM, which is a 3x boost in capacities over the previous generation. With such a large SRAM, you can keep models active entirely on the chip. The TPU 8i chip has a total FP8 compute capacity of 331.8 Exaflops per pod, 6.74x higher than Ironwood.

The salient features of TPU 8i include:

Axion-powered efficiency: We doubled the physical CPU hosts per server, moving to our custom Axion Arm-based CPUs. By using a non-uniform memory architecture (NUMA) for isolation, we have optimized the full system for superior performance.
Scaling MoE models: For modern Mixture of Experts (MoE) models, we doubled the Interconnect (ICI) bandwidth to 19.2 Tb/s. Our new Boardfly architecture reduces the maximum network diameter by more than 50%, ensuring the system works as one cohesive, low-latency unit.
Eliminating lag: Our new on-chip Collectives Acceleration Engine (CAE) offloads global operations, reducing on-chip latency by up to 5x, minimizing lag.

When it comes to generation over generation improvements, the TPU8t Training chip offers a 2.7x better performance per dollar improvement over Ironwood "TPUv7" in large-scale training, the TPU8i Inference chip offers a 80% performance per dollar improvement over Ironwood "TPUv7" in low-latency targets for MoE model. Both chips also deliver twice the performance per watt improvement, which is vital for AI TCO.

Both chips support Google's 4th Gen liquid cooling technology that is able to sustain the higher compute and performance densities, not possible with air cooling.

Feature	TPU 8t	TPU 8i
Primary Workload	Large-scale pre-training	Sampling, serving, and reasoning
Network Topology	3D torus	Boardfly
Specialized Chip Features	SparseCore (Embeddings) & LLM Decoder Engine	CAE (Collectives Acceleration Engine)
HBM Capacity	216 GB	288 GB
On-Chip SRAM (Vmem)	128 MB	384 MB
Peak FP4 PFLOPs	12.6	10.1
HBM Bandwidth	6,528 GB/s	8,601 GB/s (~1.3x of TPU 8t)
CPU Header	Arm Axion	Arm Axion

And with that, let's round up the main highlights of the Google AI Hypercomputer, which are listed below:

TPU 8t, optimized for training, uses breakthrough Inter-Chip Interconnect (ICI) technology to scale up to 9,600 TPUs and 2 PB of shared, high-bandwidth memory in a single superpod. It achieves 3x the processing power of Ironwood and delivers up to 2x more performance/Watt.
TPU 8i, optimized for inference, uses our new Boardfly topology to directly connect 1,152 TPUs in a single pod. It features 3x more on-chip SRAM compared to previous versions to host larger KV caches entirely on-silicon and integrates a specialized Collectives Acceleration Engine. Taken together, TPU 8i delivers 80% better performance per dollar for inference than the prior generation.
NVIDIA GPUs are a core part of our AI accelerator portfolio. We will be among the first to offer NVIDIA Vera Rubin NVL72, in addition to the Blackwell- and Hopper-based instances already available.
Google Cloud Axion includes our N4A Axion instances—launched in January—which deliver 100% better price-performance than comparable x86 instances, ensuring sustained, efficient operation for agents.
Network-optimized Compute is expanding with new C4N and M4N machine series, perfectly optimized for high-volume agent communication, network-heavy Telco 5G core, and enterprise databases. C4N instances can deliver an almost 4x increase in network bandwidth per vCPU over standard C4 instances.
Innovations in Storage include improvements in Managed Lustre, which now delivers 10 TB per second of throughput to A5X or TPU 8t over RDMA, supercharging training. In Rapid Storage, we’ve made significant performance improvements—going from 6 TB/sec to 15 TB/sec—to speed up AI training and inference workloads. And we are introducing Smart Storage, which applies semantic meaning to unstructured data, providing the foundation for our Enterprise Knowledge Graph.
New capabilities in Networking include the Virgo Network, our new, purpose-built, AI-optimized network that connects either NVIDIA Vera Rubin NVL72 systems or TPU 8t superpods into massive supercomputers with hundreds of thousands of accelerators, supercharging distributed training of the world's most capable frontier models.

Google Cloud will also be one of the first AI infrastructures to offer NVIDIA VR200 (Vera Rubin) accelerators. The Rubin GPUs will be paired with Google's brand new Virgo network, offering massive-scale training clusters alongside Google's own 8th Gen TPU family.

The Google AI hypercomputer will be used by several customers, including big names such as the US DOE, Boston Dynamics, Citadel Securities, Thinking Machine Labs, and Axia Energy.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.

Follow Wccftech on Google to get more of our news coverage in your feeds.

Google Bets On The Agentic AI Era With Its AI Hypercomputer, Merges 8th-Gen TPUs, NVIDIA Rubin, & Axion CPUs Together

Google Cloud Next 26: AI Hypercomputer Announcement Gives Agentic AI The Next Push, Leverages In-House TPUs, CPUs & Scales Beyond With NVIDIA Rubin

Related Story Google Splits TPUv8 Strategy Into Two Chips, Handing Broadcom Training and MediaTek Inference Duties

Further Reading

Google Has Announced Its First Custom ARM-Based Chipset Called Axion, But It Will Not Power The Pixel 9 Or Pixel 9 Pro

Google Announces Its Next-Gen Cloud TPU v5p AI Accelerator Chips & AI Hypercomputer

Valve's Steam Machine Pre-Orders Open at $1049, but the Memory and Storage Crisis Pushed Prices Far Higher Than Hoped

JEDEC Approves SPHBM4 to Break HBM's Costly Packaging Bottleneck, Retaining HBM4-level Speeds With Standard Packages