Intel Xe HPC ‘Ponte Vecchio’ GPU & Xeon Sapphire Rapids CPU Powered Aurora Exascale Supercomputer Further Detailed – Deploys in 2021

Hassan Mujtaba • May 10, 2020 at 02:00pm EDT

The Intel-powered Aurora supercomputer, which will house the next-generation Xe HPC 'Ponte Vecchio' GPUs and Sapphire Rapids Xeon CPUs, has been further detailed. Launching next year, the Aurora supercomputer will be deployed at the Argonne National Laboratory and will be one of the first exascale machines on the planet.

Intel's Exascale Aurora Supercomputer For 2021 Detailed - Xe HPC 'Ponte Vecchio' GPUs & Sapphire Rapids Xeon CPUs

Featuring Intel's high-performance computing lineup for 2021, the Aurora supercomputer is both fast and impressive from a technical perspective. During the ECP Annual Meeting, more details of the supercomputer were revealed, pointing out the specific rack configurations & hardware specs.

The Aurora supercomputer is planned for deployment at Argonne in 2021 and is expected to deliver over 1 exaflop of sustained performance. The machine will feature Intel's Xe HPC (7nm) 'Ponte Vecchio' GPUs and Sapphire Rapids (10nm++) Xeon processors. Each node will consist of 6 Xe HPC GPUs & 2 Sapphire Rapids Xeon CPUs. The 6 Ponte Vecchio (PVC) GPUs will feature a low-latency, high-bandwidth all-to-all connection. A unified memory architecture will be available across the CPUs and GPUs.
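To get a feel for what an all-to-all connection between the 6 GPUs implies, here is a minimal Python sketch that enumerates the GPU pairs in one node. It assumes one direct point-to-point link per GPU pair, which the presentation does not spell out; the function name is purely illustrative.

```python
from itertools import combinations

GPUS_PER_NODE = 6  # figure quoted for each Aurora node


def all_to_all_links(n_devices):
    """Return every unordered device pair (assumption: one direct link per pair)."""
    return list(combinations(range(n_devices), 2))


links = all_to_all_links(GPUS_PER_NODE)
print(f"GPU-to-GPU links per node: {len(links)}")
print(f"Direct peers per GPU: {GPUS_PER_NODE - 1}")
```

Under that assumption, each GPU talks directly to its 5 peers without hopping through another GPU or the CPUs.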

In terms of memory, storage, and bandwidth, we are looking at greater than 10 Petabytes of system memory and the Cray Slingshot fabric interconnect (Shasta platform). There will be a total of 8 Slingshot fabric endpoints per node on the Aurora supercomputer. The system will feature two distinct filesystems, one of them being DAOS (Distributed Asynchronous Object Store) and the other being Lustre. Details for the filesystems, such as capacities and bandwidth, are listed below:

DAOS:

Around 230 PB of Storage Capacity
Greater than 25 TB/s Bandwidth

Lustre:

150 PB of complete Storage Capacity
Around 1 TB/s Bandwidth
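To put those bandwidth figures in perspective, the rough sketch below estimates how long it would take to write the full 10 PB of system memory to each filesystem. This is a hypothetical scenario rather than a quoted benchmark: it assumes decimal units (1 PB = 1000 TB), treats the quoted bandwidths as sustained write rates, and ignores all overheads.

```python
def transfer_seconds(data_pb, bandwidth_tb_per_s):
    """Idealized time to move data_pb petabytes at bandwidth_tb_per_s TB/s."""
    return data_pb * 1000 / bandwidth_tb_per_s  # 1 PB = 1000 TB (decimal units)


SYSTEM_MEMORY_PB = 10  # article: "greater than 10 Petabytes"
DAOS_TB_PER_S = 25     # article: "greater than 25 TB/s"
LUSTRE_TB_PER_S = 1    # article: "around 1 TB/s"

daos_s = transfer_seconds(SYSTEM_MEMORY_PB, DAOS_TB_PER_S)
lustre_s = transfer_seconds(SYSTEM_MEMORY_PB, LUSTRE_TB_PER_S)
print(f"DAOS:   {daos_s:.0f} s (~{daos_s / 60:.1f} min)")
print(f"Lustre: {lustre_s:.0f} s (~{lustre_s / 3600:.1f} h)")
```

On these numbers, DAOS would absorb a full memory image in minutes while Lustre would need hours, which is one way to read the split between a fast tier and a capacity tier.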

A single Aurora rack will be designed by Cray as part of its Shasta system, which supports a diverse range of CPUs and features scale-optimized cabinets for density, cooling & high network bandwidth. Cray also provides its own software stack for improved modularity while delivering a unified, high-performance interconnect. Slingshot itself is Cray's 8th-generation interconnect fabric, with features such as congestion management, a 3-hop dragonfly topology, and traffic classes. It uses the Rosetta high-bandwidth switches, which provide up to 25.6 Tb/s of bandwidth per switch (25 GB/s per port, per direction).
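The two Rosetta bandwidth figures can be reconciled with some quick arithmetic. The sketch below assumes the commonly published Rosetta configuration of 64 ports at 200 Gb/s each; neither number appears in the text above, so treat both as assumptions.

```python
PORTS_PER_SWITCH = 64  # assumption: not stated in the article
PORT_GBPS = 200        # assumption: per-port rate in Gb/s, per direction


def switch_aggregate_tbps(ports, port_gbps, directions=2):
    """Aggregate switch bandwidth in Tb/s, counting both directions by default."""
    return ports * port_gbps * directions / 1000


def port_gigabytes_per_s(port_gbps):
    """Convert a per-port rate from Gb/s to GB/s (8 bits per byte)."""
    return port_gbps / 8


print(f"{switch_aggregate_tbps(PORTS_PER_SWITCH, PORT_GBPS):.1f} Tb/s per switch")
print(f"{port_gigabytes_per_s(PORT_GBPS):.0f} GB/s per port, per direction")
```

With those assumed inputs, the 25.6 Tb/s figure is the bidirectional aggregate across all ports, while 25 GB/s is what a single port moves in one direction.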

Intel Xe HPC 'Ponte Vecchio' GPU - What We Know So Far

As for the two key products, we have talked about them in detail recently. The Intel Xe HPC 'Ponte Vecchio' GPUs will be the lead 7nm product arriving in 2021. It will feature an MCM package design based on the Foveros 3D packaging technology. Each MCM GPU will be connected to high-density HBM DRAM packages through EMIB & will additionally feature a faster Rambo Cache close to them, which will be connected through Foveros. Finally, while Slingshot provides an interconnect between the nodes, Intel's Xe Link will interconnect the 6 Xe HPC GPUs within each node.

Intel has previously detailed that its Xe HPC GPUs will feature 1000s of EUs. So far, we have only seen Xe LP with 96 EUs, which makes for a total of 768 cores. A subslice within a Gen 12 GPU is similar to the NVIDIA SM unit inside the GPC or an AMD CU within the Shader Engine. Intel currently features 8 EUs per subslice on its Gen 9.5 and Gen 11 GPUs, so if the same hierarchy is kept, we can expect a significant number of Super-Slices consisting of many subslices. Each Gen 11 and Gen 9.5 EU also contains 8 ALUs, which looks set to remain the same on Gen 12.

Rounding it up, a 1000 EU chip would make for 8000 cores, but it has been confirmed that 1000 is just the base value and the actual core count is much bigger than that. A 4-tile Xe HP GPU with 2048 EUs or 16,384 cores has already been detailed, so it's likely that HPC parts will be much bigger than that. Here are the actual EU counts of Intel's various MCM-based Xe HP GPUs along with estimated core counts and TFLOPs:

Intel Xe HP (12.5) 1-Tile GPU: 512 EUs [Est: 4096 Cores, 12.2 TFLOPs assuming 1.5 GHz, 150W]
Intel Xe HP (12.5) 2-Tile GPU: 1024 EUs [Est: 8192 Cores, 20.48 TFLOPs assuming 1.25 GHz, 300W]
Intel Xe HP (12.5) 4-Tile GPU: 2048 EUs [Est: 16,384 Cores, 36 TFLOPs assuming 1.1 GHz, 400W/500W]
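The estimates above can be reproduced with a short sketch. It assumes 8 ALUs ("cores") per EU, as on Gen 9.5 and Gen 11, and 2 floating-point operations per core per clock (one fused multiply-add); the second assumption is not stated in the list but matches the quoted numbers. The function names are illustrative only.

```python
ALUS_PER_EU = 8          # per the Gen 9.5 / Gen 11 hierarchy described earlier
FLOPS_PER_CORE_CLOCK = 2  # assumption: one fused multiply-add per clock


def estimated_cores(eus):
    """Core (ALU) count for a given number of execution units."""
    return eus * ALUS_PER_EU


def estimated_tflops(eus, clock_ghz):
    """Peak TFLOPs estimate: cores x FLOPs per clock x clock (GHz) / 1000."""
    return estimated_cores(eus) * FLOPS_PER_CORE_CLOCK * clock_ghz / 1000


# (tiles, EUs, assumed clock in GHz) taken from the list above
for tiles, eus, ghz in [(1, 512, 1.5), (2, 1024, 1.25), (4, 2048, 1.1)]:
    print(f"{tiles}-tile: {estimated_cores(eus)} cores, "
          f"{estimated_tflops(eus, ghz):.2f} TFLOPs at {ghz} GHz")
```

The same function applied to the 96 EU Xe LP part gives the 768 cores mentioned earlier, and the 1-tile and 4-tile results (about 12.29 and 36.04 TFLOPs) line up with the rounded 12.2 and 36 TFLOPs in the list.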

Intel Xe class GPUs would feature variable vector width as mentioned below:

SIMT (GPU Style)
SIMD (CPU Style)
SIMT + SIMD (Max Performance)

Raja Koduri specifically talked about the Xe HPC class GPUs, since that is what the developer conference was entirely about. Intel's Xe HPC GPUs would be able to scale to 1000s of EUs, and each execution unit has been upgraded to deliver 40 times better double-precision floating-point compute horsepower.

The EUs would be connected to several high-bandwidth memory channels through a new scalable memory fabric known as XEMF (short for Xe Memory Fabric). The Xe HPC architecture would also include a very large unified cache, known as Rambo cache, which would connect several GPUs together. This Rambo cache would sustain peak FP64 compute performance throughout double-precision workloads by delivering huge memory bandwidth. In terms of process optimizations, the following are a few of the key improvements that Intel has announced for its 7nm process node over 10nm:

2x density scaling vs 10nm
Planned intra-node optimizations
4x reduction in design rules
EUV
Next-Gen Foveros & EMIB Packaging

The Xe HPC GPUs would use Foveros technology to interconnect with the Rambo cache, which would be shared across several other Xe HPC GPUs on the same interposer. Just like their Xeon brethren, Intel's Xe HPC GPUs would come with ECC memory/cache correction and Xeon-Class RAS.

Intel Sapphire Rapids Xeon CPUs - What We Know So Far

The 10nm++ based Sapphire Rapids is expected to make use of the updated Willow Cove core architecture which replaces Sunny Cove in 2020. The Sapphire Rapids lineup will make use of 8 channel DDR5 memory and support PCIe Gen 5.0 on the Eagle Stream platform. The Eagle Stream platform will also introduce the LGA 4677 socket which will be replacing the LGA 4189 socket for Intel's upcoming Whitley platform which would house Cooper Lake-SP and Ice Lake-SP processors.

This will allow Intel to match or even outpace AMD's EPYC offerings if Milan ends up reusing DDR4 and PCIe Gen 4 from the EPYC Rome platform. That remains to be seen. Intel's Sapphire Rapids family will be launching the same year that Intel introduces its first datacenter Xe GPUs based on the 7nm process node.

The platform would be competing against AMD's Zen 4 based EPYC Genoa lineup, which would also be moving to a newer platform known as SP5. AMD has promised new memory along with new capabilities for the Genoa lineup, which would include support for DDR5, PCIe 5.0, and more. We don't know what other features the new lineup would include, but Intel is doing the same with 8-channel DDR5 support & a new interconnect for the Eagle Stream platform.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
