Intel’s Sapphire Rapids HBM ‘Xeon Scalable’ CPUs With 64 GB HBM2e Memory Offer Up To 3x Performance Increase Over Ice Lake Xeons

Submit

Intel has once again demonstrated its upcoming Sapphire Rapids HBM Xeon Scalable CPUs with up to 64 GB HBM2e memory in various workloads.

Intel Promises 3x Performance Boost With Its Next-Gen Sapphire Rapids HBM 'Xeon Scalable' CPU Lineup

According to Intel, the Sapphire Rapids-SP will come in two package variants, a standard, and an HBM configuration. The standard variant will feature a chiplet design composed of four XCC dies that will feature a die size of around 400mm2. This is the die size for a singular XCC die and there will be four in total on the top Sapphire Rapids-SP Xeon chip. Each die will be interconnected via EMIB which has a pitch size of 55u and a core pitch of 100u.

Intel’s Arc A380 Is The First Graphics Card To Support DisplayPort 2.0 Standard But No Monitor Out There To Make Use of It

The Intel Xeon processor code-named Sapphire Rapids with High Bandwidth Memory (HBM) is a great example of how we are leveraging advanced packaging technologies and silicon innovations to bring substantial performance, bandwidth, and power-saving improvements for HPC. With up to 64 gigabytes of high-bandwidth HBM2e memory in the package and accelerators integrated into the CPU, we’re able to unleash memory bandwidth-bound workloads while delivering significant performance improvements across key HPC use cases.

When comparing 3rd Gen Intel Xeon Scalable processors to the upcoming Sapphire Rapids HBM processors, we are seeing two- to three-times performance increases across weather research, energy, manufacturing, and physics workloads2. At the keynote, Ansys CTO Prith Banerjee also shows that Sapphire Rapids HBM delivers up to 2x performance increase on real-world workloads from Ansys Fluent and ParSeNet.

The standard Sapphire Rapids-SP Xeon chip will feature 10 EMIB interconnects and the entire package will measure at a mighty 4446mm2. Moving over to the HBM variant, we are getting an increased number of interconnects which sit at 14 and are needed to interconnect the HBM2E memory to the cores.

The four HBM2E memory packages will feature 8-Hi stacks so Intel is going for at least 16 GB of HBM2E memory per stack for a total of 64 GB across the Sapphire Rapids-SP package. Talking about the package, the HBM variant will measure at an insane 5700mm2 or 28% larger than the standard variant. Compared to the recently leaked EPYC Genoa numbers, the HBM2E package for Sapphire Rapids-SP would end up 5% larger while the standard package will be 22% smaller.

Intel Arc Pro A50 & Pro A40 Workstation Graphics Cards Spotted: Full ACM-G11 Alchemist GPU With 1024 Cores at 2450 MHz, 6 GB GDDR6 Memory

  • Intel Sapphire Rapids-SP Xeon (Standard Package) - 4446mm2
  • Intel Sapphire Rapids-SP Xeon (HBM2E Package) - 5700mm2
  • AMD EPYC Genoa (12 CCD Package) - 5428mm2

Intel also states that the EMIB link provides twice the bandwidth density improvement and 4 times better power efficiency compared to standard package designs. Interestingly, Intel calls the latest Xeon lineup Logically monolithic which means that they are referring to the interconnect that'll offer the same functionality as a single-die would but technically, there are four chiplets that will be interconnected together. You can read the full details regarding the standard 56 core & 112 thread Sapphire Rapids-SP Xeon CPUs here.

Intel Xeon SP Families (Preliminary):

Family BrandingSkylake-SPCascade Lake-SP/APCooper Lake-SPIce Lake-SPSapphire RapidsEmerald RapidsGranite RapidsDiamond Rapids
Process Node14nm+14nm++14nm++10nm+Intel 7Intel 7Intel 3Intel 3?
Platform NameIntel PurleyIntel PurleyIntel Cedar IslandIntel WhitleyIntel Eagle StreamIntel Eagle StreamIntel Mountain Stream
Intel Birch Stream
Intel Mountain Stream
Intel Birch Stream
Core ArchitectureSkylakeCascade LakeCascade LakeSunny CoveGolden CoveRaptor CoveRedwood Cove?Lion Cove?
IPC Improvement (Vs Prev Gen)10%0%0%20%19%8%?35%?39%?
MCP (Multi-Chip Package) SKUsNoYesNoNoYesYesTBD (Possibly Yes)TBD (Possibly Yes)
SocketLGA 3647LGA 3647LGA 4189LGA 4189LGA 4677LGA 4677TBDTBD
Max Core CountUp To 28Up To 28Up To 28Up To 40Up To 56Up To 64?Up To 120?Up To 144?
Max Thread CountUp To 56Up To 56Up To 56Up To 80Up To 112Up To 128?Up To 240?Up To 288?
Max L3 Cache38.5 MB L338.5 MB L338.5 MB L360 MB L3105 MB L3120 MB L3?240 MB L3?288 MB L3?
Vector EnginesAVX-512/FMA2AVX-512/FMA2AVX-512/FMA2AVX-512/FMA2AVX-512/FMA2AVX-512/FMA2AVX-1024/FMA3?AVX-1024/FMA3?
Memory SupportDDR4-2666 6-ChannelDDR4-2933 6-ChannelUp To 6-Channel DDR4-3200Up To 8-Channel DDR4-3200Up To 8-Channel DDR5-4800Up To 8-Channel DDR5-5600?Up To 12-Channel DDR5-6400?Up To 12-Channel DDR6-7200?
PCIe Gen SupportPCIe 3.0 (48 Lanes)PCIe 3.0 (48 Lanes)PCIe 3.0 (48 Lanes)PCIe 4.0 (64 Lanes)PCIe 5.0 (80 lanes)PCIe 5.0 (80 Lanes)PCIe 6.0 (128 Lanes)?PCIe 6.0 (128 Lanes)?
TDP Range (PL1)140W-205W165W-205W150W-250W105-270WUp To 350WUp To 375W?Up To 400W?Up To 425W?
3D Xpoint Optane DIMMN/AApache PassBarlow PassBarlow PassCrow PassCrow Pass?Donahue Pass?Donahue Pass?
CompetitionAMD EPYC Naples 14nmAMD EPYC Rome 7nmAMD EPYC Rome 7nmAMD EPYC Milan 7nm+AMD EPYC Genoa ~5nmAMD EPYC BergamoAMD EPYC TurinAMD EPYC Venice
Launch201720182020202120222023?2024?2025?

As for the footnotes for the Intel Sapphire Rapids HBM 'Xeon Scalable' CPU performance, you can see them below:

CloverLeaf

  • Test by Intel as of 04/26/2022. 1-node, 2x Intel® Xeon® Platinum 8360Y CPU, 72 cores, HT On, Turbo On, Total Memory 256GB (16x16GB DDR4 3200 MT/s ), SE5C6200.86B.0021.D40.2101090208, Ubuntu 20.04, Kernel 5.10, 0xd0002a0, ifort 2021.5, Intel MPI 2021.5.1, build knobs: -xCORE-AVX512 –qopt-zmm-usage=high
  • Test by Intel as of 04/19/22. 1-node, 2x Pre-production Intel® Xeon® Scalable Processor codenamed Sapphire Rapids Plus HBM, >40 cores, HT ON, Turbo ON, Total Memory 128 GB (HBM2e at 3200 MHz), BIOS Version EGSDCRB1.86B.0077.D11.2203281354, ucode revision=0x83000200, CentOS Stream 8, Linux version 5.16, ifort 2021.5, Intel MPI 2021.5.1, build knobs: -xCORE-AVX512 –qopt-zmm-usage=high

OpenFOAM

  • Test by Intel as of 01/26/2022. 1-node, 2x Intel® Xeon® Platinum 8380 CPU), 80 cores, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version SE5C6200.86B.0020.P23.2103261309, 0xd000270, Rocky Linux 8.5 , Linux version 4.18., OpenFOAM® v1912, Motorbike 28M @ 250 iterations; Build notes: Tools: Intel Parallel Studio 2020u4, Build knobs: -O3 -ip -xCORE-AVX512
  • Test by Intel as of 01/26/2022 1-node, 2x Pre-production Intel® Xeon® Scalable Processor codenamed Sapphire Rapids Plus HBM, >40 cores, HT Off, Turbo Off, Total Memory 128 GB (HBM2e at 3200 MHz), preproduction platform and BIOS, CentOS 8, Linux version 5.12, OpenFOAM® v1912, Motorbike 28M @ 250 iterations; Build notes: Tools: Intel Parallel Studio 2020u4, Build knobs: -O3 -ip -xCORE-AVX512

WRF

  • Test by Intel as of 05/03/2022. 1-node, 2x Intel® Xeon® 8380 CPU, 80 cores, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version SE5C6200.86B.0020.P23.2103261309, ucode revision=0xd000270, Rocky Linux 8.5, Linux version 4.18, WRF v4.2.2
  • Test by Intel as of 05/03/2022. 1-node, 2x Pre-production Intel® Xeon® Scalable Processor codenamed Sapphire Rapids Plus HBM, >40 cores, HT ON, Turbo ON, Total Memory 128 GB (HBM2e at 3200 MHz), BIOS Version EGSDCRB1.86B.0077.D11.2203281354, ucode revision=0x83000200, CentOS Stream 8, Linux version 5.16, WRF v4.2.2

YASK

  • Test by Intel as of 05/9/2022. 1-node, 2x Intel® Xeon® Platinum 8360Y CPU, 72 cores, HT On, Turbo On, Total Memory 256GB (16x16GB DDR4 3200 MT/s ), SE5C6200.86B.0021.D40.2101090208, Rocky linux 8.5, kernel 4.18.0, 0xd000270, Build knobs: make -j YK_CXX='mpiicpc -cxx=icpx' arch=avx2 stencil=iso3dfd radius=8,
  • Test by Intel as of 05/03/22. 1-node, 2x Pre-production Intel® Xeon® Scalable Processor codenamed Sapphire Rapids Plus HBM, >40 cores, HT ON, Turbo ON, Total Memory 128 GB (HBM2e at 3200 MHz), BIOS Version EGSDCRB1.86B.0077.D11.2203281354, ucode revision=0x83000200, CentOS Stream 8, Linux version 5.16, Build knobs: make -j YK_CXX='mpiicpc -cxx=icpx' arch=avx2 stencil=iso3dfd radius=8,

Ansys Fluent

  • Test by Intel as of 2/2022 1-node, 2x Intel ® Xeon ® Platinum 8380 CPU, 80 cores, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version SE5C6200.86B.0020.P23.2103261309, ucode revision=0xd000270, Rocky Linux 8.5 , Linux version 4.18, Ansys Fluent 2021 R2 Aircraft_wing_14m; Build notes: Commercial release using Intel 19.3 compiler and Intel MPI 2019u
  • Test by Intel as of 2/2022 1-node, 2x Pre-production Intel® Xeon® Scalable Processor code names Sapphire Rapids with HBM, >40 cores, HT Off, Turbo Off, Total Memory 128 GB (HBM2e at 3200 MHz), preproduction platform and BIOS, CentOS 8, Linux version 5.12, Ansys Fluent 2021 R2 Aircraft_wing_14m; Build notes: Commercial release using Intel 19.3 compiler and Intel MPI 2019u8

Ansys ParSeNet

  • Test by Intel as of 05/24/2022. 1-node, 2x Intel® Xeon® Platinum 8380 CPU, 80 cores, HT On, Turbo On, Total Memory 256GB (16x16GB DDR4 3200 MT/s [3200 MT/s]), SE5C6200.86B.0021.D40.2101090208, Ubuntu 20.04.1 LTS, 5.10, ParSeNet (SplineNet), PyTorch 1.11.0, Torch-CCL 1.2.0, IPEX 1.10.0, MKL (2021.4-Product Build 20210904), oneDNN (v2.5.0)
  • Test by Intel as of 04/18/2022. 1-node, 2x Pre-production Intel® Xeon® Scalable Processor codenamed Sapphire Rapids Plus HBM, 112 cores, HT On, Turbo On, Total Memory 128GB (HBM2e 3200 MT/s), EGSDCRB1.86B.0077.D11.2203281354, CentOS Stream 8, 5.16, ParSeNet (SplineNet), PyTorch 1.11.0, Torch-CCL 1.2.0, IPEX 1.10.0, MKL (2021.4-Product Build 20210904), oneDNN (v2.5.0)
Submit