The Intel-powered Aurora supercomputer, which will house the next-generation Xe HPC 'Ponte Vecchio' GPUs and Sapphire Rapids Xeon CPUs, has been further detailed. Launching next year, the Aurora supercomputer will be deployed at the Argonne National Laboratory and will be one of the first exascale machines on the planet.
Intel's Exascale Aurora Supercomputer For 2021 Detailed - Xe HPC 'Ponte Vecchio' GPUs & Sapphire Rapids Xeon CPUs
Featuring Intel's high-performance computing lineup for 2021, the Aurora supercomputer is both fast and impressive from a technical perspective. During the ECP Annual Meeting, more details of the supercomputer were revealed, pointing out the specific rack configurations & hardware specs.
The Aurora supercomputer is planned for deployment at Argonne in 2021 and is expected to deliver over 1 exaflop of sustained performance. The machine will feature Intel's Xe HPC (7nm) 'Ponte Vecchio' GPUs and Sapphire Rapids (10nm++) Xeon processors. Each node will consist of 6 Xe HPC GPUs & 2 Sapphire Rapids Xeon CPUs. The 6 Ponte Vecchio (PVC) GPUs will feature a low-latency, high-bandwidth all-to-all connection. A unified memory architecture will be available across the CPUs and GPUs.
In terms of memory, storage, and bandwidth, we are looking at greater than 10 Petabytes of system memory and the Cray Slingshot Fabric interconnect (Shasta platform). There will be a total of 8 Slingshot fabric endpoints per node on the Aurora supercomputer. The system will feature two different filesystems, one being DAOS (Distributed Asynchronous Object Store) and the other being Lustre. Capacities and bandwidth for the two filesystems are listed below:
- DAOS: around 230 PB of storage capacity, greater than 25 TB/s bandwidth
- Lustre: 150 PB of total storage capacity, around 1 TB/s bandwidth
A single Aurora rack will be designed by Cray as part of their Shasta system, which supports a diverse range of CPUs and features scale-optimized cabinets for density, cooling & high network bandwidth. Cray also provides its own software stack for improved modularity while delivering a unified and high-performance interconnect. Slingshot itself is Cray's 8th-generation interconnect fabric, with features such as congestion management, a 3-hop dragonfly topology, and traffic classes. It uses Rosetta high-bandwidth switches that provide up to 25.6 Tb/s of bandwidth per switch (25 GB/s per direction on each port).
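As a rough sanity check, the Slingshot figures quoted above can be reproduced with simple arithmetic. The sketch below assumes 64 ports per Rosetta switch running at 200 Gb/s per direction, neither of which is stated in this article, and it further assumes that each of the 8 endpoints per node runs at the full port rate. All function and constant names are hypothetical and used only for illustration.

```python
# Back-of-the-envelope Slingshot bandwidth check (illustrative only).

PORTS_PER_SWITCH = 64    # assumption: port count is not given in the article
PORT_RATE_GBITS = 200    # assumption: Gb/s per port, per direction
ENDPOINTS_PER_NODE = 8   # from the article: 8 Slingshot endpoints per node


def port_gbytes_per_direction():
    """Per-port bandwidth in GB/s for one direction (8 bits per byte)."""
    return PORT_RATE_GBITS / 8


def switch_tbits_bidirectional():
    """Aggregate switch bandwidth in Tb/s, counting both directions."""
    return PORTS_PER_SWITCH * PORT_RATE_GBITS * 2 / 1000


def node_injection_gbytes_per_direction():
    """Per-node injection bandwidth in GB/s, assuming every endpoint
    runs at the full port rate (an assumption, not a confirmed spec)."""
    return ENDPOINTS_PER_NODE * port_gbytes_per_direction()


if __name__ == "__main__":
    print(f"Per port: {port_gbytes_per_direction():.1f} GB/s per direction")
    print(f"Per switch: {switch_tbits_bidirectional():.1f} Tb/s bidirectional")
    print(f"Per node: {node_injection_gbytes_per_direction():.1f} GB/s per direction")
```

Under these assumptions the per-port and per-switch results line up with the 25 GB/s and 25.6 Tb/s figures quoted for Rosetta, while the per-node number is only an upper-bound estimate.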
Intel Xe HPC 'Ponte Vecchio' GPU - What We Know So Far
As for the two key products, we have talked about them in detail recently. The Intel Xe HPC 'Ponte Vecchio' GPUs will be the lead 7nm product arriving in 2021. It will feature an MCM package design based on the Foveros 3D packaging technology. Each MCM GPU will be connected to high-density HBM DRAM packages through EMIB & will additionally feature a faster Rambo Cache placed close to them, connected through Foveros. Finally, while Slingshot provides the interconnect between nodes, Intel's Xe Link will connect the 6 Xe HPC GPUs within each node.
Intel has previously detailed that its Xe HPC GPUs will feature 1000s of EUs. So far, we have only seen Xe LP with 96 EUs, which makes for a total of 768 cores. A subslice within a Gen 12 GPU is similar to the NVIDIA SM unit inside the GPC or an AMD CU within the Shader Engine. Intel currently features 8 EUs per subslice on its Gen 9.5 and Gen 11 GPUs, so if the same hierarchy is kept, we can expect a significant number of Super-Slices consisting of many subslices. Each Gen 11 and Gen 9.5 EU also contains 8 ALUs, which looks set to remain the same on Gen 12.
Rounding it up, a 1000 EU chip would make for 8000 cores, but it has been confirmed that 1000 is just the base value and the actual core count is much bigger than that. A 4-tile Xe HP GPU with 2048 EUs or 16,384 cores has already been detailed, so it's likely that HPC parts will be much bigger than that. Here are the actual EU counts of Intel's various MCM-based Xe HP GPUs along with estimated core counts and TFLOPs:
- Intel Xe HP (12.5) 1-Tile GPU: 512 EUs [Est: 4096 Cores, 12.2 TFLOPs assuming 1.5 GHz, 150W]
- Intel Xe HP (12.5) 2-Tile GPU: 1024 EUs [Est: 8192 Cores, 20.48 TFLOPs assuming 1.25 GHz, 300W]
- Intel Xe HP (12.5) 4-Tile GPU: 2048 EUs [Est: 16,384 Cores, 36 TFLOPs assuming 1.1 GHz, 400W/500W]
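The estimates in the list above appear to follow a simple rule of thumb: cores = EUs x 8 ALUs, and TFLOPs = cores x 2 x clock (GHz) / 1000. The short sketch below reproduces them. The factor of 2 is an assumption (one fused multiply-add per ALU per clock, counted as two floating-point operations), and the function names are hypothetical, chosen only for this illustration.

```python
# Rough estimate of core counts and TFLOPs from EU counts (illustrative only).

ALUS_PER_EU = 8               # from the article: 8 ALUs per EU
FLOPS_PER_ALU_PER_CLOCK = 2   # assumption: one FMA per ALU per clock = 2 FLOPs


def estimate_cores(eus):
    """Core (ALU) count for a GPU with the given number of EUs."""
    return eus * ALUS_PER_EU


def estimate_tflops(eus, clock_ghz):
    """Peak TFLOPs: cores x FLOPs per clock x GHz gives GFLOPs; divide by 1000."""
    return estimate_cores(eus) * FLOPS_PER_ALU_PER_CLOCK * clock_ghz / 1000


# (name, EU count, assumed clock in GHz) as listed in the article
XE_HP_CONFIGS = [
    ("1-Tile", 512, 1.5),
    ("2-Tile", 1024, 1.25),
    ("4-Tile", 2048, 1.1),
]

if __name__ == "__main__":
    for name, eus, ghz in XE_HP_CONFIGS:
        cores = estimate_cores(eus)
        tflops = estimate_tflops(eus, ghz)
        print(f"{name}: {eus} EUs, {cores} cores, {tflops:.2f} TFLOPs at {ghz} GHz")
```

The same rule gives 768 cores for the 96 EU Xe LP part mentioned earlier. For the 1-tile part it yields about 12.29 TFLOPs, which the list shows as 12.2.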
Intel's Xe-class GPUs would feature variable vector width, with the following modes:
- SIMT (GPU Style)
- SIMD (CPU Style)
- SIMT + SIMD (Max Performance)
Raja specifically talked about the Xe HPC class GPUs since that's what the developer conference is entirely about. Intel's Xe HPC GPUs would be able to scale to 1000s of EUs, and each EU has been upgraded to deliver 40 times the double-precision floating-point compute performance.
The EUs would be connected through a new scalable memory fabric known as XEMF (short for Xe Memory Fabric) to several high-bandwidth memory channels. The Xe HPC architecture would also include a very large unified cache known as Rambo Cache, which would connect several GPUs together. This Rambo Cache would sustain peak FP64 compute performance throughout double-precision workloads by delivering huge memory bandwidth. In terms of process optimizations, the following are the key improvements that Intel has announced for its 7nm process node over 10nm:
- 2x density scaling vs 10nm
- Planned intra-node optimizations
- 4x reduction in design rules
- Next-Gen Foveros & EMIB Packaging
The Xe HPC GPUs would use Foveros technology to interconnect with the Rambo Cache, which would be shared across several other Xe HPC GPUs on the same interposer. Just like their Xeon brethren, Intel's Xe HPC GPUs would come with ECC memory/cache correction and Xeon-class RAS.
Intel Sapphire Rapids Xeon CPUs - What We Know So Far
The 10nm++ based Sapphire Rapids is expected to make use of the updated Willow Cove core architecture, which replaces Sunny Cove in 2020. The Sapphire Rapids lineup will make use of 8-channel DDR5 memory and support PCIe Gen 5.0 on the Eagle Stream platform. The Eagle Stream platform will also introduce the LGA 4677 socket, replacing the LGA 4189 socket of Intel's upcoming Whitley platform, which would house Cooper Lake-SP and Ice Lake-SP processors.
This will allow Intel to match or even outpace AMD's EPYC offerings if Milan ends up reusing DDR4 and PCIe Gen 4 from the EPYC Rome platform. That remains to be seen. Intel's Sapphire Rapids family will be launching the same year that Intel introduces its first datacenter Xe GPUs based on the 7nm process node.
The platform would be competing against AMD's Zen 4 based EPYC Genoa lineup, which would also be moving to a newer platform known as SP5. AMD has promised new memory along with new capabilities for the Genoa lineup, which would include support for DDR5, PCIe 5.0, and more. We don't know what other features the new lineup will include, but Intel is doing the same with 8-channel DDR5 support & a new interconnect for the Eagle Stream platform.