Intel Xe GPU Architecture Detailed – Ponte Vecchio Xe HPC Exascale GPU With 1000s of EUs, Massive HBM Memory, Rambo Cache

Hassan Mujtaba • Nov 17, 2019 at 07:36pm EST

Intel has just unveiled the latest details of its Xe GPU architecture-based products at its HPC Developer Conference. Speaking on stage, Intel's SVP, Chief Architect and General Manager of Architecture, Raja Koduri, revealed the very first architecture roadmap for Intel's first in-house graphics architecture, known as Xe, and the respective product lines that it would be embedded within.

Intel Details Xe GPU Architecture - Ponte Vecchio For Exascale Compute Scalable To 1000s of EUs, XEMF Scalable Memory Fabric, Rambo Cache, Foveros Packaging, 40X Increase In FP64 Compute Per EU & A lot More!

There's much to cover here so let's talk about the first aspect of the Xe GPU architecture, the lineup itself. The Intel Xe GPU architecture is one scalable architecture powering various products. Intel is planning to offer three microarchitectures derived from Xe. These include:

Intel Xe LP (Integrated + Entry)
Intel Xe HP (Mid-Range, Enthusiasts, Datacenter / AI)
Intel Xe HPC (HPC Exascale)

Just from the naming scheme, you can tell where these GPUs would be featured. The 'LP' keyword stands for Low-Power whereas the 'HP' keyword stands for High-Performance. The 'HPC' keyword simply denotes the High-Performance Computing architecture, which would use a range of new Intel technologies that we are going to talk about. It is stated that Xe LP is around 5W-20W but can scale up to 50W. Intel's Xe HP is one tier above that and should cover the 75W-250W segment, while the Xe HPC class architecture should aim even higher, delivering even more compute performance than the rest.

“Architecture is a software compatibility contract. We originally were planning for two microarchitectures within Xe, our architecture (LP and HP), but we saw an opportunity for a third within HPC.” - Raja Koduri

Intel Xe class GPUs would feature variable vector width as mentioned below:

SIMT (GPU Style)
SIMD (CPU Style)
SIMT + SIMD (Max Performance)

Raja specifically talked about the Xe HPC class GPUs since that's what the developer conference is entirely about. Intel's Xe HPC GPUs would be able to scale to 1000s of EUs, and each execution unit (EU) has been upgraded to deliver 40 times the double-precision floating-point (FP64) compute performance.

The EUs would be connected to several high-bandwidth memory channels through a new scalable memory fabric known as XEMF (short for Xe Memory Fabric). The Xe HPC architecture would also include a very large unified cache known as Rambo cache, which would connect several GPUs together. By delivering huge memory bandwidth, this Rambo cache is meant to sustain peak FP64 compute performance throughout double-precision workloads.

“At the heart of Xe architecture we have a new fabric called XEMF. It is the heart of the performance of these machines. We called it the Rambo Cache. It is a unified cache that is accessible to CPU and GPU memory.” - Raja Koduri

Intel will be manufacturing its Xe HPC class GPUs on the latest 7nm process node. This is also the lead 7nm product that Intel has talked about previously. Intel would make full use of its new and enhanced packaging technologies, such as Foveros and EMIB interconnects, to develop the next exascale GPUs. In terms of process optimizations, the following are a few key improvements that Intel has announced for its 7nm process node over 10nm:

2x density scaling vs 10nm
Planned intra-node optimizations
4x reduction in design rules
EUV
Next-Gen Foveros & EMIB Packaging

The Xe HPC GPUs would use Foveros technology to interconnect with the Rambo cache, which would be shared across several other Xe HPC GPUs on the same interposer. Similarly, EMIB would be used to connect the HBM memory with the GPUs. Both technologies would deliver a huge leap in bandwidth efficiency and density. Just like their Xeon brethren, Intel's Xe HPC GPUs would come with ECC memory/cache correction and Xeon-Class RAS.

Blue Team's First HPC GPU, The 7nm Ponte Vecchio - Landing in The Aurora Supercomputer in 2021

With all the key technologies detailed, let's get straight to the first 7nm product in which Intel's Xe HPC architecture is going to be featured. It is called Ponte Vecchio, a supermassive GPU that aims to be the next single-chip exascale design for supercomputers. The Ponte Vecchio GPU would come with 16 compute chiplets which are based on the Xe HPC GPU architecture.

There seem to be massive amounts of HBM DRAM connected to each GPU. A single node for the Aurora Supercomputer is also detailed here. We are looking at six Ponte Vecchio GPUs connected using CXL (Compute Express Link, or Intel Xe Link) with a oneAPI software stack. The node would also feature two Intel Sapphire Rapids processors, which are based on the next-gen 10nm++ Willow Cove CPU architecture. The first confirmed product to feature the 7nm datacenter Xe based Ponte Vecchio GPUs will be the Aurora supercomputer as detailed above. Some key features of a single Aurora supercomputer node include:

Leadership Performance (For HPC, Data Analytics, AI)
Unified Memory Architecture (Across CPU & GPU)
All-To-All Connectivity Within Node (Low Latency, High Bandwidth)
Unparalleled I/O Scalability Across Nodes (8 Fabric Endpoints per node, DAOS)
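
To put a number on the all-to-all point: with the six GPUs per node mentioned above, a fully connected layout implies 15 distinct GPU-to-GPU pairs. The short Python sketch below simply enumerates them. The GPU indices are hypothetical labels, and Intel has not said whether every pair gets its own dedicated physical link, so treat this as counting logical pairs only.

```python
from itertools import combinations

GPUS_PER_NODE = 6  # figure quoted for a single Aurora node


def all_to_all_pairs(n):
    """Return every unordered pair (i, j) among n endpoints."""
    return list(combinations(range(n), 2))


pairs = all_to_all_pairs(GPUS_PER_NODE)
print(len(pairs))  # 15
```

The pair count grows as n(n-1)/2, which is why all-to-all connectivity becomes expensive quickly as more endpoints are added to a node.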

The approach is very similar to what NVIDIA did with the DGX-2, stacking 16 Volta GPUs inside a single node and connecting them through NVSwitch. The difference lies in the naming: NVIDIA termed the entire node a GPU, while Intel is terming the 16 chiplets featured on a single interposer a GPU, and there are six of these GPUs in a single node. It is likely that NVIDIA will also follow the MCM (Multi-Chip-Module) chiplet design on its future HPC products such as Ampere, which is expected to make its debut in 2020, a year before Intel's Ponte Vecchio lands in the HPC market.
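
As a rough back-of-the-envelope comparison, the figures quoted above can be multiplied out. This is only a count of compute dies per node using the numbers in this article; it says nothing about relative performance, and the constant and function names are made up for illustration.

```python
CHIPLETS_PER_PONTE_VECCHIO = 16  # compute chiplets per GPU, as quoted
PONTE_VECCHIO_PER_NODE = 6       # GPUs per Aurora node, as quoted
VOLTA_GPUS_PER_DGX2 = 16         # GPUs per NVIDIA DGX-2 node, as quoted


def compute_dies_per_node(gpus, dies_per_gpu):
    """Total compute dies in a node given GPUs and dies per GPU."""
    return gpus * dies_per_gpu


aurora_dies = compute_dies_per_node(PONTE_VECCHIO_PER_NODE, CHIPLETS_PER_PONTE_VECCHIO)
print(aurora_dies)  # 96
```

By this count, an Aurora node would carry 96 Xe HPC compute chiplets against the 16 Volta GPUs of a DGX-2, though the two kinds of die are not directly comparable.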

While the datacenter would be first to use 7nm Xe GPUs, Intel's 10nm Xe GPU lineup would make its way to the mainstream and enthusiast gaming market in 2020, utilizing the more consumer-tuned Xe LP and Xe HP GPU architectures.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
