Intel Showcases Its Packaging Prowess With 7nm Ponte Vecchio Xe-HPC GPU, Over 100 Billion Transistors & 47 XPU Compute Tiles

Hassan Mujtaba • Mar 24, 2021 at 07:07am EDT

Intel representative teases the new Ponte Vecchio compute GPU for AI & HPC applications of the future

Yesterday, during the Intel Unleashed webcast, CEO Pat Gelsinger unveiled new details of the 7nm Xe-HPC-based Ponte Vecchio GPU, which is planned to be the largest and most advanced chip designed to date. The Ponte Vecchio GPU will make use of several key technologies highlighted during the webcast, which together power 47 different compute tiles based on different process nodes and architectures.

Intel 7nm Xe-HPC Powered Ponte Vecchio GPU Further Detailed - Over 100 Billion Transistors, 47 XPU Tiles & Mix-Match of Various Process Nodes

The Intel Ponte Vecchio GPU is first and foremost based on the Xe-HPC graphics architecture which is the flagship product leveraging Intel's 7nm EUV process node. But aside from that, the chip has a ton of other compute tiles that are based on different process nodes, all of which merge into one singular exascale graphics processing unit known as Ponte Vecchio. We already gave a run-down of what the complete Ponte Vecchio GPU has to offer and you can read a more detailed post on that here.

So for starters, while the GPU primarily makes use of Intel's 7nm EUV process node, Intel will also be producing some Xe-HPC compute dies through external fabs (such as TSMC). There are other tiles that are essential for the Ponte Vecchio GPU to work, and those are fabricated on TSMC's 7nm process node. We cannot yet confirm whether Intel will be leveraging TSMC's standard 7nm or 7nm+ EUV process node, but it is likely that Intel will go the more standard route, since the Xe Link I/O tile that will be using TSMC's process can do the job on the non-EUV 7nm process.

Raja Koduri teased that there are 7 advanced technologies at play here, and by our calculation, these would be:

Intel 7nm
TSMC 7nm
Foveros 3D Packaging
EMIB
10nm Enhanced SuperFin
Rambo Cache
HBM2

Following is how Intel gets to 47 tiles on the Ponte Vecchio chip:

16 Xe HPC (internal/external)
8 Rambo (internal)
2 Xe Base (internal)
11 EMIB (internal)
2 Xe Link (external)
8 HBM (external)
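
As a quick sanity check on the arithmetic, the breakdown above can be tallied in a few lines of Python. This is only a sketch: the dictionary and function names are illustrative labels of ours, not Intel nomenclature, while the counts come straight from the list.

```python
# Tile counts exactly as listed above.
# The names below are illustrative, not official Intel terminology.
PONTE_VECCHIO_TILES = {
    "Xe HPC (internal/external)": 16,
    "Rambo (internal)": 8,
    "Xe Base (internal)": 2,
    "EMIB (internal)": 11,
    "Xe Link (external)": 2,
    "HBM (external)": 8,
}


def total_tiles(tiles):
    """Sum the per-type tile counts."""
    return sum(tiles.values())


def tiles_by_source(tiles):
    """Group counts by the sourcing tag in each label's trailing parentheses."""
    grouped = {}
    for label, count in tiles.items():
        source = label[label.rindex("(") + 1 : -1]
        grouped[source] = grouped.get(source, 0) + count
    return grouped


if __name__ == "__main__":
    print(total_tiles(PONTE_VECCHIO_TILES))      # 47
    print(tiles_by_source(PONTE_VECCHIO_TILES))
```

Grouping by the parenthesised tag also shows how much of the package depends on external fabs versus Intel's own.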

The Ponte Vecchio chip is actually composed of two separate GPU dies, each consisting of eight Xe-HPC compute units (16 in total, per the tile list above). Each pair of these compute units is directly attached to a Rambo Cache, which utilizes the Intel 10nm Enhanced SuperFin process node. Each GPU block is also attached to four HBM2 stacks, which could be either 4-hi or 8-hi. That makes eight HBM2 stacks in total, offering multi-GBs of memory capacity with loads of bandwidth. There are also 8 passive die stiffeners on each GPU. The main GPU makes use of Foveros 3D packaging to connect the GPU compute units with the cache, while EMIB interconnects the HBM2 and Xe Link I/O tiles with the main GPU. In all, 11 EMIB dies sit underneath the HBM2 and I/O link chips.

In general, Foveros offers intra-GPU connectivity (GPU + cache) within the same tiles, while EMIB offers connectivity for off-die tiles (HBM2 with GPU). This all culminates in the Ponte Vecchio Xe-HPC GPU, which is composed of over 100 billion transistors. Raja Koduri posted an interesting Lego-block diagram showing the various blocks/tiles of the Ponte Vecchio GPU, but we also have the more detailed block diagram posted above, which gives you an exact illustration of what each tile is.

Like this? https://t.co/6OVEssFppl pic.twitter.com/dG8nm58hLz

— Raja Koduri (@RajaXg) March 24, 2021

Andreas, in Jan we didn't account the HBM's as individual tiles. That's the main difference.
16 Xe HPC (internal/external)
8 Rambo (internal)
2 Xe Base (internal)
11 EMIB (internal)
2 Xe Link (external)
8 HBM (external) https://t.co/uA0jAs8QDo

— Raja Koduri (@RajaXg) March 24, 2021

Intel Xe HPC 'Ponte Vecchio' GPU - What We Know So Far

So rounding up the details, the Intel Xe HPC 'Ponte Vecchio' GPUs will be the lead 7nm product arriving in 2021. It will feature an MCM package design based on the Foveros 3D packaging technology. Each MCM GPU will be connected to high-density HBM DRAM packages through EMIB and will additionally feature a faster Rambo Cache close to them, connected through Foveros. Finally, while Slingshot provides an interconnect between the nodes, Intel's Xe Link will be interconnecting the 6 Xe HPC GPUs together.

Intel has previously detailed that its Xe HPC GPUs will feature 1000s of EUs. So far, we have only seen Xe LP with 96 EUs, which makes for a total of 768 cores. A subslice within a Gen 12 GPU is similar to the NVIDIA SM unit inside the GPC or an AMD CU within the Shader Engine. Intel currently features 8 EUs per subslice on its Gen 9.5 and Gen 11 GPUs, so if the same hierarchy is kept, we can expect a significant number of Super-Slices consisting of many subslices. Each Gen 11 and Gen 9.5 EU also contains 8 ALUs, which will remain the same on Gen 12 too from the looks of it.
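
The EU-to-core conversion used here is simple multiplication. Below is a minimal sketch, assuming the 8 ALUs per EU quoted above carry over to Xe. The function and constant names are ours, purely for illustration.

```python
# Assumption: 8 ALUs ("cores") per EU, as quoted for Gen 9.5 / Gen 11
# and expected to carry over to Gen 12 / Xe.
ALUS_PER_EU = 8


def cores_from_eus(eu_count, alus_per_eu=ALUS_PER_EU):
    """Convert an EU count into the 'core' (ALU) count used in this article."""
    return eu_count * alus_per_eu


if __name__ == "__main__":
    print(cores_from_eus(96))    # Xe LP: 768
    print(cores_from_eus(1000))  # a 1000-EU part: 8000
```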

Rounding it up, a 1000 EU chip will make for 8000 cores, but it has been confirmed that 1000 is just the base value and the actual core count is much bigger than that. A 4-tile Xe HP GPU with 2048 EUs or 16,384 cores has already been detailed, so it's likely that HPC parts will be much bigger than that. Here are the actual EU counts of Intel's various MCM-based Xe HP GPUs along with estimated core counts and TFLOPs:

Intel Xe HP (12.5) 1-Tile GPU: 512 EUs [Est: 4096 Cores, 12.2 TFLOPs assuming 1.5 GHz, 150W]
Intel Xe HP (12.5) 2-Tile GPU: 1024 EUs [Est: 8192 Cores, 20.48 TFLOPs assuming 1.25 GHz, 300W]
Intel Xe HP (12.5) 4-Tile GPU: 2048 EUs [Est: 16,384 Cores, 36 TFLOPs assuming 1.1 GHz, 400W/500W]
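
The TFLOPs estimates above are consistent with the usual peak-FP32 rule of thumb: cores x 2 operations per clock (one fused multiply-add) x clock speed. That formula is our assumption about how the figures were derived, not something Intel has confirmed, and the names in this sketch are illustrative.

```python
def peak_fp32_tflops(eu_count, clock_ghz, alus_per_eu=8, ops_per_clock=2):
    """Estimate peak FP32 throughput in TFLOPs.

    Assumptions (not confirmed by Intel): 8 ALUs per EU, and each ALU
    retiring one fused multiply-add (2 floating-point ops) per clock.
    """
    cores = eu_count * alus_per_eu
    gflops = cores * ops_per_clock * clock_ghz
    return gflops / 1000.0


# (label, EU count, assumed clock in GHz) taken from the list above
XE_HP_CONFIGS = [
    ("1-Tile", 512, 1.5),
    ("2-Tile", 1024, 1.25),
    ("4-Tile", 2048, 1.1),
]

if __name__ == "__main__":
    for label, eus, clock in XE_HP_CONFIGS:
        tflops = peak_fp32_tflops(eus, clock)
        print(f"{label}: {eus * 8} cores, {tflops:.2f} TFLOPs at {clock} GHz")
```

Under these assumptions the three configurations work out to roughly 12.29, 20.48 and 36.04 TFLOPs, in line with the rounded estimates in the list.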

Intel Xe class GPUs would feature variable vector width as mentioned below:

SIMT (GPU Style)
SIMD (CPU Style)
SIMT + SIMD (Max Performance)

Raja specifically talked about the Xe HPC class GPUs since that's what the developer conference is entirely about. Intel's Xe HPC GPUs would be able to scale to 1000s of EUs, and each execution unit has been upgraded to deliver 40 times better double-precision floating-point compute horsepower.

The EUs would be connected to several high-bandwidth memory channels through a new scalable memory fabric known as XEMF (short for Xe Memory Fabric). The Xe HPC architecture would also include a very large unified cache known as Rambo cache, which would connect several GPUs together. By delivering huge memory bandwidth, this Rambo cache would offer sustained peak FP64 compute performance throughout double-precision workloads.

Just in terms of process optimizations, the following are the few key improvements that Intel has announced for their 7nm process node over 10nm:

2x density scaling vs 10nm
Planned intra-node optimizations
4x reduction in design rules
EUV
Next-Gen Foveros & EMIB Packaging

The Xe HPC GPUs would be using Foveros technology to interconnect with the Rambo cache, which would be shared across several other Xe HPC GPUs on the same interposer. Just like their Xeon brethren, Intel's Xe HPC GPUs would come with ECC memory/cache correction and Xeon-Class RAS. Intel's Ponte Vecchio GPUs will be heading out first to the Aurora supercomputer, with shipments beginning later this year. The GPU will compete against NVIDIA's Ada Lovelace and AMD's CDNA 2 graphics architectures in the HPC segment, which are also going to utilize a multi-die design approach.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
