AVX-512 performance benchmarks completed on AMD Genoa, Intel Sapphire Rapids, and Ice Lake CPUs
Last week, Intel launched its fourth-generation Xeon Scalable processors, also known as Sapphire Rapids, promising increased performance for its server processors. The chips introduce a new ISA extension, Advanced Matrix Extensions (AMX), along with new accelerators, to improve performance in artificial intelligence and machine learning.
However, for the AVX-512 extension set, which is also used in AI, HPC, and ML workloads, Intel provided little information at launch about the performance gains in the new Xeon Scalable processors. Michael Larabel, Linux analyst and editor of the Linux hardware website Phoronix, put the new processor through numerous benchmarks. He pitted it against its predecessor, Ice Lake, and AMD's new Genoa processors, and the results speak for themselves.
Larabel ran the tests using the Phoronix Test Suite, Phoromatic, and the OpenBenchmarking website, all projects on which he is the lead developer. The tests run on the three CPUs all measured AVX-512 performance in workloads such as:
- Neural Magic DeepSparse - A CPU runtime that exploits the sparsity found in neural networks to reduce the amount of computation required.
- LCzero - Also known as Leela Chess Zero, this chess engine implements the UCI protocol and requires a chess GUI such as Arena Chess GUI, BanksiaGUI, Cutechess, Nibbler, or Chessbase.
- Embree - Created by Intel, Embree is a set of ray tracing kernels to assist graphics application engineers in enhancing the performance of photorealistic rendering applications.
- OpenVKL - Also created by Intel, Open VKL (Open Volume Kernel Library) is an open-source library that understands data stored in the OpenVDB format and can access it without conversion.
- Open Image Denoise - Intel Open Image Denoise builds on the Intel oneAPI Deep Neural Network Library, also known as oneDNN. It automatically exploits modern instruction sets like Intel SSE4, AVX2, and AVX-512 to achieve high denoising performance.
- OSPRay (Studio) - Intel's OSPRay Studio is an open-source, interactive ray tracing and visualization program.
- oneDNN - The Intel oneAPI Deep Neural Network Library (or oneDNN) delivers optimized deep learning building block performance.
- Cpuminer-opt - Cpuminer-opt is CPU mining software that exists in two separate versions: Cpuminer-opt itself and the fork Cpuminer-gr, which is used for mining the Raptoreum cryptocurrency.
- OpenVINO - OpenVINO (Open Visual Inference and Neural network Optimization) is a free toolkit, created by Intel, that optimizes deep learning models from a framework and deploys them through an inference engine onto Intel hardware.
- miniBUDE - A mini-app that implements the core computation of the Bristol University Docking Engine (BUDE) in several HPC programming models.
- SMHasher - SMHasher is "a test suite designed to test the distribution, collision, and performance properties of non-cryptographic hash functions."
With the AVX-512 extensions enabled, most tests showed good gains on all CPUs. However, the Sapphire Rapids Xeon CPUs saw the biggest gain with AVX-512, up to 44%, whereas EPYC Genoa saw a gain of 21%.
Surprisingly, Intel delivered not only the bigger performance gain but also the best efficiency with AVX-512, which is notable considering that AMD marketed AVX-512 heavily for its EPYC Genoa chips whereas Intel said little about AVX-512 on its Sapphire Rapids chips. With AVX-512 enabled, the Intel Sapphire Rapids CPUs were able to match or outperform the Genoa parts running with AVX-512 disabled, and only with AVX-512 enabled were the EPYC chips able to pull ahead. Please note that the performance gain is a generation-versus-generation comparison rather than a direct comparison with AMD Genoa, since Genoa's predecessor, Milan, did not offer AVX-512 support.
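As a rough illustration of how an uplift percentage like the ones above is derived from two benchmark scores, here is a minimal Python sketch. The function name `uplift_percent` and the scores are hypothetical placeholders chosen for illustration; they are not Phoronix data, and the sketch assumes a higher-is-better metric.

```python
def uplift_percent(baseline: float, improved: float) -> float:
    """Percentage gain of `improved` over `baseline` (higher-is-better scores)."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (improved / baseline - 1.0) * 100.0


# Hypothetical scores for one benchmark on one CPU (not measured results).
score_avx512_off = 100.0
score_avx512_on = 144.0

print(f"AVX-512 uplift: {uplift_percent(score_avx512_off, score_avx512_on):.1f}%")
```

For lower-is-better metrics such as run time, the ratio would need to be inverted before applying the same formula.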
Here is what Phoronix had to say about its findings:
The geometric mean also shows how important AVX-512 is for the success of 4th Gen EPYC Genoa in being competitive against 4th Gen Xeon Scalable for HPC workloads. Had Zen 4 not added AVX-512, the EPYC 9654 2P AVX-512-disabled results came out just behind the Xeon Platinum 8490H 2P with AVX-512 enabled. A Zen 4 server processor without AVX-512 would have been a neck-and-neck race between Sapphire Rapids and Genoa in more workloads. But instead the EPYC 9654 2P with AVX-512 came out 19% faster than the Xeon Platinum 8490H processors in this set of benchmarks.
I'm left rather surprised that Intel hadn't more notably promoted their AVX-512 improvements with 4th Gen Xeon Scalable at launch, but in any case it's good seeing AVX-512 providing greater uplift while also not having the significant impact on power consumption that was seen with earlier generations of AVX-512 processors. This can immediately be of benefit to a lot of existing software out there compared to having to adapt to make use of AMX and the new accelerators. Hopefully this more efficient AVX-512 with Sapphire Rapids paired with AMD Zen 4 CPUs now having AVX-512 will lead to more software developers considering AVX-512 optimizations for their software.
Larabel notes that software already written for AVX-512 can benefit immediately, sparing development teams the effort of adapting to the newer AMX extension and the new accelerators, which require additional learning and work. He hopes the more efficient AVX-512 implementation will lead more developers to consider AVX-512 optimizations.
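The geometric mean cited in the Phoronix findings is a common way to summarize results across many benchmarks. The Python sketch below shows the general idea; the helper `geometric_mean` and the ratios are hypothetical and are not taken from the Phoronix results.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-benchmark ratios (system A score / system B score).
# A is twice as fast in one test, half as fast in another, equal in the third.
ratios = [2.0, 0.5, 1.0]

print(f"arithmetic mean of ratios: {sum(ratios) / len(ratios):.3f}")
print(f"geometric mean of ratios:  {geometric_mean(ratios):.3f}")
```

With ratios like these, the arithmetic mean would suggest system A is ahead overall, while the geometric mean treats a 2x win and a 2x loss as cancelling out, which is why it is often preferred for combining benchmark ratios.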