Intel Claims Xeon Cascade Lake-AP 56 Core CPU Up To 84% Faster Than AMD’s 64 Core EPYC Rome 7742 in Real-World HPC Benchmarks [Updated]
Update: Intel also sent over the following statement regarding the GROMACS results, and STH has also updated their story:
Intel is committed to always provide fair, transparent, and accurate performance results and would not intentionally mislead. We received feedback on our original blog and appreciate the community’s passion about performance and the accuracy of benchmarks. Taking the community’s feedback, we have updated this blog with data for the most recent GROMACS 2019.4 version and found no material difference to earlier data posted on 2019.3 version.
Intel has posted a series of new benchmarks claiming that their Xeon-class Cascade Lake-AP CPUs run much faster than AMD's 2nd Generation EPYC Rome CPUs. The benchmarks, which Intel claims are representative of 'real-world' performance in the HPC segment, compare Intel's 2S (dual-socket) Xeon Platinum 9282 against AMD's EPYC 7742 (also in a dual-socket configuration).
Intel Claims The Cascade Lake-AP Xeon Platinum 9282 56 Core CPU Is Up To 84% Faster Than AMD's EPYC 7742 64 Core CPU in Its 'Real-World' HPC - AI Performance Benchmark Suite
The performance metrics for both processors were posted on Medium, where Intel also recently published an article about core scaling and how much modern applications actually rely on the number of cores available. According to Intel, 8 cores with sustained frequencies would scale better than, say, a 12 core or 16 core chip. Intel may have provided a lot of data backing up its findings, but the same Intel was reluctant to move beyond 4 cores back in 2017, when AMD was offering Ryzen chips with up to 8 cores and 16 threads. It's interesting how 8 core processors have suddenly become the next big thing for Intel's mainstream lineup, and the same would happen with the 10th Gen family, which is expected to receive even more cores.
For the HPC market, Intel says that more processor cores add compute, but overall system or workload performance depends on other factors, including:
· The performance of each core
· Software optimizations leveraging specific instructions
· Memory bandwidth to ensure feeding of the cores
· Cluster-level scaling deployed
Anyway, coming back to the topic, Intel's latest benchmarks compare the Xeon Platinum 9282 against the EPYC 7742. The Xeon Platinum 9282 is one of the elusive Cascade Lake-AP processors, which feature two dies instead of a single monolithic one, stacking up to 56 cores and 112 threads. The chip has a base clock of 2.60 GHz and a boost clock of 3.80 GHz along with 77 MB of L3 cache and a TDP of 400W. The Intel Cascade Lake-AP chips feature 12 memory channels compared to AMD's 8 memory channels per chip.
The AMD EPYC 7742 is based on a 7nm process node (vs Intel's 14nm+++) and features 64 cores / 128 threads. The chip has a base clock of 2.25 GHz and a boost clock of 3.4 GHz with 256 MB of L3 cache, 128 PCIe Gen 4 lanes and a TDP of 225W. The pricing plays a huge role too and here we see the EPYC 7742 with a price of $6950 US while the Xeon Platinum 9282 is suggested to have a price between $25K - $50K (via Anandtech).
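The memory-channel counts above translate directly into peak theoretical bandwidth: channels × transfer rate × 8 bytes per transfer. A quick sketch, assuming DDR4-2933 for Cascade Lake-AP and DDR4-3200 for Rome (their officially supported memory speeds):

```python
def peak_bandwidth_gbs(channels: int, mts: int) -> float:
    """Peak theoretical DDR4 bandwidth in GB/s: channels x MT/s x 8 bytes per transfer."""
    return channels * mts * 8 / 1000

# 12 channels of DDR4-2933 (Cascade Lake-AP) vs 8 channels of DDR4-3200 (Rome)
xeon_ap = peak_bandwidth_gbs(12, 2933)   # ~281.6 GB/s per socket
epyc_rome = peak_bandwidth_gbs(8, 3200)  # ~204.8 GB/s per socket
print(f"Xeon-AP: {xeon_ap:.1f} GB/s, EPYC Rome: {epyc_rome:.1f} GB/s")
```

This is the paper math behind Intel's memory-bandwidth pitch; sustained bandwidth in real workloads is lower on both platforms.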
So right off the bat, we can note that this isn't a fair comparison: not only does Intel's chip have a much higher TDP, but it costs at least 3.5x more than the AMD processor. Yes, the EPYC 7742 is AMD's flagship 2nd Generation Rome processor for servers, but even then, this isn't an apples-to-apples comparison in any possible way.
Update: ServeTheHome's Patrick J Kennedy has found that the GROMACS version used by Intel is an outdated one that doesn't utilize the 256-bit wide AVX2 SIMD kernels supported on Zen 2. Intel used GROMACS 2019.3 in what it terms real-world benchmarks; however, the latest version available is 2019.4, which adds proper support for Zen 2 based EPYC Rome chips like the EPYC 7742 that Intel tested its Xeon Platinum 9282 against. It just goes to show that even Intel's 'Real-World' benchmarks aren't indicative of actual product performance and may lead to misleading statements against competitor products. This also isn't the first time Intel has used misleading benchmarks or statements to downplay the competition. The company has termed several important performance metrics used by tech reviewers as invalid and not indicative of actual product performance, while insisting that its own metrics surely are.
And a citation to the fix in 2019.4 is here https://t.co/UscE6PRSwB
Very shady to use the older version. We also have consistently, and clearly mentioned our older results are not Zen2 optimized when we published them because of this.
— Patrick J Kennedy (@Patrick1Kennedy) November 5, 2019
The benchmarks show the Xeon Platinum 9282 delivering an average performance increase of 31%, going as high as 84%. Several HPC-specific applications are shown which Intel claims are representative of real-world performance metrics in the server market. Dissecting each application reveals the breakdown of performance in each individual workload, and in the case of the Manufacturing application (ANSYS Fluent workloads), Intel shows a 13% average performance uplift over AMD's EPYC Rome chip. Intel also claims that having AVX-512 onboard the new Xeon chips gives them an edge in several applications such as VASP, NAMD, GROMACS, FSI & LAMMPS.
The HPC segment is broad with varying compute requirements by workload. 56 core Xeon Platinum 9282 ranges from 8% to 84% better performance (31% higher geomean) than AMD’s 64 core Rome-based system (7742) on leading real-world HPC workloads across manufacturing, life sciences, financial services and earth sciences(2).
Some of the application results shown above are a geomean of several specific workloads, all with different characteristics and sensitivities. Drilling into the details of these workloads provides further insight into performance. For example, Xeon Platinum 9282 leads AMD Rome 7742 by 13% on a geomean of 14 ANSYS Fluent workloads. Across those 14 different CFD simulations, Xeon's results range from 2% lower to 36% higher.
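The "geomean" figures Intel quotes are geometric means of per-workload speedup ratios, which is why a 13% average can hide results ranging from a 2% loss to a 36% win. A minimal sketch of that calculation, using hypothetical ratios rather than Intel's full per-workload data (which was not published):

```python
import math

def geomean(ratios):
    """Geometric mean of a list of performance ratios (the standard way
    to average benchmark speedups)."""
    return math.prod(ratios) ** (1 / len(ratios))

# Hypothetical per-workload speedups of Xeon vs EPYC: 0.98 means 2% slower,
# 1.36 means 36% faster -- spanning the range Intel quotes for ANSYS Fluent.
ratios = [0.98, 1.04, 1.08, 1.12, 1.15, 1.20, 1.36]
print(f"geomean: {geomean(ratios):.3f}")
```

The geomean always lands somewhere between the worst and best individual result, so a single headline percentage says little about any one workload.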
Intel further claims that Xeon Platinum 9200 series processors offer a lower TCO (Total Cost of Ownership). Since the Xeon Platinum 9200 series delivers higher performance per node, you'd require fewer nodes, which should drive down node acquisition, fabric, switching, and cabling costs. It is also mentioned that while the Xeon-AP has a higher TDP and power requirement than AMD's EPYC Rome (400W vs 225W), this should be offset by the lower number of nodes required to reach the same performance.
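Intel's TCO argument boils down to simple arithmetic: if each node delivers more performance, fewer nodes are needed to hit a cluster-level target, which can offset a higher per-node TDP. A minimal sketch with hypothetical numbers (the performance target and per-node figures below are illustrative assumptions, not published Intel or AMD data):

```python
import math

def nodes_needed(target_perf: float, perf_per_node: float) -> int:
    """Nodes required to reach a cluster-level performance target."""
    return math.ceil(target_perf / perf_per_node)

# Hypothetical scenario: each Xeon node is 31% faster (Intel's quoted geomean)
# but each dual-socket node draws 2 x 400W of CPU power vs 2 x 225W for EPYC.
target = 100.0  # arbitrary cluster performance target, in EPYC-node units
epyc_nodes = nodes_needed(target, 1.00)   # 100 nodes
xeon_nodes = nodes_needed(target, 1.31)   # 77 nodes

epyc_cpu_watts = epyc_nodes * 2 * 225     # CPU power only, ignores the rest of the node
xeon_cpu_watts = xeon_nodes * 2 * 400
print(epyc_nodes, xeon_nodes, epyc_cpu_watts, xeon_cpu_watts)
```

With these particular numbers the 31% uplift does not close the CPU power gap on its own; Intel's TCO claim also folds in the fabric, switching, and cabling savings that come from running fewer nodes.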
Aside from raw compute power, memory bandwidth is also highlighted as a major performance factor, and notably, major industry players are already evaluating replacing their existing Intel-based systems with EPYC processors. Just a day before Intel published its report, it was revealed that Netflix may soon switch to AMD's EPYC based platform, as the TCO is similar but the EPYC solution may actually offer higher bandwidth than an Intel Xeon based system.
There will be a lot more action next year in the server department as AMD launches its energy-efficient 7nm+ EPYC Milan CPUs to tackle both Intel's 14nm Cooper Lake and 10nm Ice Lake lineups simultaneously.