Intel Finally Replies to AMD Naples – Points Out HPC Achilles Heel, Lack of Ecosystem, Latency Issues, Calls It ‘4-Glued Together Desktop Dies’ and More
Well well well, it looks like Intel has finally replied in full force to AMD’s Zen comeback and the newly launched EPYC processors (codenamed: Naples). In their Press Workshop 2017, the company went on the offensive, giving a very detailed reply to AMD Naples (read: EPYC) and pointing out all the downsides (Intel’s views, not mine) of Naples. This is a post dedicated to Intel’s reply to AMD Naples.
Intel responds to AMD’s EPYC server chips
Before we begin, I would like to state for the record that this is a rather interesting move by Intel. The company has historically been mostly silent about competitors and has chosen to ignore their existence for the most part. While most of the information presented in the slides is factual, it still comes from Intel’s press material and may contain exaggerations or material omissions (and we have done our best to point them out).
Intel starts by pointing out that it has historically rolled out new architectures consistently for the market while Zen is AMD’s first new architecture in 6 years. It also points out that this architecture will stay in place for 4 years (using us as the source). This is something that is not entirely true, and considering we are the source for this information – I am at liberty to elaborate on it. While the Zen architecture will more or less stay in place for 4 years, AMD later confirmed that Zen 2 and Zen 3 will be landing before 2020 on the 7nm node. While they will in all likelihood be a node shrink of the original Zen, there is a small possibility that they will contain optimized parts. In any case, node shrinks do technically count as a new architecture (leaf out of your book, Intel) and will probably land before the 4 year limit is up.
The next slide shows Intel listing the Intel advantage as compared to the AMD disadvantage. We will elaborate a bit more on the “4 Glued-together Desktop Die” part later on so I will leave this here for now.
This is the first slide that lists actual facts as opposed to vague statements. It shows that Zen is the first uArch in 6 years and that the Zeppelin die, with its 8 cores, is basically the building block of Naples (EPYC), with Naples itself having 4 Zeppelin dies on board. All of this is pretty much accurate.
Intel further elaborates on their Mesh architecture, which does allow lower and more consistent latencies through most applications. They also call the Zeppelin die a “desktop die”. Without getting into too much detail here, I would like to state that it’s a known fact that Zen isn’t good for HPC, only for the server and desktop market. So while Naples (EPYC) probably is indeed made of re-purposed desktop dies, that does not make it inherently bad. Mission critical clients will stick to Intel, of that there is very little doubt, but AMD can carve out quite a business for itself in the untapped and lower-end parts of the server market, which just need the raw horsepower and can’t afford to pay the Intel premium.
Next up, we have a comparison of Intel’s latest Skylake-SP architecture vs AMD Naples (which we know is basically Zen with a new stepping). The slide points out that Intel’s 28 cores are true 28 cores while Naples is basically 4×8 cores joined by Infinity Fabric. We have 28 MB of L2 cache vs 16 MB of L2 cache. L3 cache on the Xeon side is up to 37.5 MB, whereas we have a ‘disjointed’ L3 cache on AMD’s side, split into 8 parts of 8 MB (64 MB in total). To be fair to AMD, there is no evidence that their system of localized L3 clusters doesn’t work. Intel’s slide credits Skylake-SP with 768 GB/s of bandwidth, whereas Naples gets 200 GB/s. Intel has 6 memory channels per die, whereas AMD has 2 channels per die. Here’s the fun part though: Intel mentions their PCIe lanes and, while pointing out the higher PCIe lane count of AMD Naples (EPYC), states that 64 lanes will be reserved for the 2 socket connectivity, implying that only 64 lanes per processor will actually be available in 2S configurations.
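For anyone who wants to sanity check the arithmetic behind this slide, here is a minimal sketch in Python. It only uses the numbers quoted above, with one exception: the 128 PCIe lanes per EPYC socket is not stated in Intel's slide and is treated as an assumption here (it is AMD's published figure). The variable names are mine, for illustration only.

```python
# Per-package totals derived from the per-die figures quoted in the article.
DIES_PER_PACKAGE = 4      # Zeppelin dies in one Naples package
CORES_PER_DIE = 8
L3_SLICES = 8             # 'disjointed' L3: 8 parts...
L3_SLICE_MB = 8           # ...of 8 MB each
L2_TOTAL_MB = 16
CHANNELS_PER_DIE = 2
LANES_PER_SOCKET = 128    # ASSUMPTION: AMD's published figure, not from the slide
LANES_RESERVED_2S = 64    # lanes Intel says are used for socket-to-socket links

naples_cores = DIES_PER_PACKAGE * CORES_PER_DIE
naples = {
    "cores": naples_cores,
    "l3_total_mb": L3_SLICES * L3_SLICE_MB,
    "l2_per_core_mb": L2_TOTAL_MB / naples_cores,
    "memory_channels": DIES_PER_PACKAGE * CHANNELS_PER_DIE,
    "pcie_lanes_free_per_socket_2s": LANES_PER_SOCKET - LANES_RESERVED_2S,
}

# Skylake-SP is a single die, so the per-die figures are the package figures.
skylake_sp = {
    "cores": 28,
    "l2_per_core_mb": 28 / 28,
    "memory_channels": 6,
}

for key, value in naples.items():
    print(f"Naples {key}: {value}")
for key, value in skylake_sp.items():
    print(f"Skylake-SP {key}: {value}")
```

Note that the per-die memory channel comparison looks different at the package level: four dies with 2 channels each add up to 8 channels per Naples package against 6 for the Xeon.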
Intel also didn’t forget to mention the clear HPC weakness of Zen. The Zen architecture has a 2x 128-bit FMA implementation, which puts it on par with the Sandy Bridge uArch in per-clock throughput. Haswell had a 2x 256-bit FMA implementation and the slides suggest that Intel Skylake-SP has a 2x 512-bit FMA implementation. This results in 8 DP FLOPs/clock for Sandy Bridge/Zen, 16 DP FLOPs/clock for Haswell/Broadwell and 32 DP FLOPs/clock for Skylake-SP with AVX-512. That is 4x the peak per-clock throughput of Zen and makes Skylake-SP clearly the superior choice for HPC tasks. Keep in mind however that 1) even in HPC, it can become a question of perf/$, where AMD might still be able to make a niche for itself and 2) while Zen is weak in HPC for this reason, this performance is still good for the desktop and server markets.
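The FLOPs/clock figures follow directly from the FMA widths. Here is a minimal sketch of the arithmetic, assuming the usual way of counting peak throughput (each FMA counts as two floating point operations, a multiply and an add, and a double is 64 bits wide). The function name is my own. Sandy Bridge is left out because the slide only quotes its 8 FLOPs/clock total.

```python
def dp_flops_per_clock(fma_units: int, vector_bits: int) -> int:
    """Peak double-precision FLOPs per core per clock for FMA pipelines."""
    doubles_per_vector = vector_bits // 64  # 64-bit doubles per vector
    flops_per_fma = 2                       # one multiply plus one add
    return fma_units * doubles_per_vector * flops_per_fma

# (number of FMA units, vector width in bits) as quoted in the article
configs = {
    "Zen": (2, 128),
    "Haswell/Broadwell": (2, 256),
    "Skylake-SP (AVX-512)": (2, 512),
}

peak = {name: dp_flops_per_clock(*cfg) for name, cfg in configs.items()}

for name, flops in peak.items():
    print(f"{name}: {flops} DP FLOPs/clock")

ratio = peak["Skylake-SP (AVX-512)"] / peak["Zen"]
print(f"Skylake-SP vs Zen, per core per clock: {ratio:.0f}x")
```

These are theoretical per-core, per-clock peaks. Clock speeds, core counts and memory bandwidth all decide how much of that peak a real HPC workload will see.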
Intel once again mentions the bandwidth and memory channel issues with Zen.
Intel also points out that a 2S Intel configuration will have just 2 CPUs and 2 NUMA domains, whereas a 2S AMD Naples (EPYC) configuration will have 8 different NUMA domains. This will allegedly make it difficult to get optimal application performance.
Intel also states that, according to their estimates, Naples will get a base frequency of roughly 2.2 GHz with a total chip TDP of 180W.
Intel also talks about the famous Zen CCX. The 4-core CPU Complex (fondly referred to as the CCX) is the heart of the Zen building blocks. They point out that two Zen CCXs are connected together on the die via an intra-die interface and that accessing L3 cache across this interface will further increase latency.
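To see where the 2 vs 8 figure comes from, here is a small sketch. The domain count is just sockets times dies per socket. The second function is a rough illustration of why more domains can hurt: it assumes, purely for the sake of argument, that an application's data is spread evenly across every domain, which real NUMA-aware software tries hard to avoid. Both function names are mine.

```python
def numa_domains(sockets: int, dies_per_socket: int) -> int:
    """Each die with its own memory controllers shows up as one NUMA domain."""
    return sockets * dies_per_socket

def local_memory_share(domains: int) -> float:
    """Share of accesses that stay local.

    ASSUMPTION: data is spread evenly over all domains (worst-case style
    illustration, not a measurement).
    """
    return 1 / domains

intel_2s = numa_domains(sockets=2, dies_per_socket=1)  # monolithic die
amd_2s = numa_domains(sockets=2, dies_per_socket=4)    # 4 Zeppelin dies

for name, domains in (("Intel 2S", intel_2s), ("AMD Naples 2S", amd_2s)):
    share = local_memory_share(domains)
    print(f"{name}: {domains} NUMA domains, {share:.1%} local under even spread")
```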
To support this claim, Intel has actually crunched the numbers for comparison purposes. According to their calculations, Intel has a 73% lower latency while accessing the L3 cache and up to a 31% lower latency while accessing DDR RAM.
Similarly, the memory bandwidth with the latest generation Xeon processors is 128 GB/s with PCIe bandwidth being 96 GB/s, whereas it’s roughly 50 GB/s for AMD Naples with PCIe bandwidth being 64 GB/s per die. Since each Zeppelin die has less memory bandwidth than PCIe bandwidth, Intel argues that the Zen architecture cannot support the full PCIe bandwidth.
We also have a slide which will be pretty useful for cloud hosters that are in the business of giving dedicated threads to clients. In this SMT On/Off comparison, we see that Intel has some pretty decent gains with SMT enabled across the board, while AMD’s EPYC (Naples) gains are inconsistent and, in the case of server side Java, can actually be negative. Both processors had 8 cores and were clocked at 2.2 GHz to make the comparison valid.
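Intel's argument here boils down to comparing two numbers per chip, so here is a minimal sketch of that comparison using the figures from the slide. The class and method names are hypothetical, and the check itself (memory bandwidth at least equal to PCIe bandwidth) is my reading of the slide's logic rather than anything Intel spells out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DieBandwidth:
    name: str
    memory_gbs: float  # memory bandwidth in GB/s, as quoted by Intel
    pcie_gbs: float    # PCIe bandwidth in GB/s, as quoted by Intel

    def memory_to_pcie_ratio(self) -> float:
        return self.memory_gbs / self.pcie_gbs

    def can_feed_full_pcie(self) -> bool:
        # ASSUMPTION: "supporting" PCIe means local memory can keep up with it
        return self.memory_gbs >= self.pcie_gbs

xeon = DieBandwidth("Skylake-SP Xeon", memory_gbs=128, pcie_gbs=96)
naples_die = DieBandwidth("Naples (per die)", memory_gbs=50, pcie_gbs=64)

for chip in (xeon, naples_die):
    print(
        f"{chip.name}: memory/PCIe = {chip.memory_to_pcie_ratio():.2f}, "
        f"feeds full PCIe: {chip.can_feed_full_pcie()}"
    )
```

Keep in mind these are Intel's numbers for a single die. A device attached to one Zeppelin die can also reach memory on the other dies over Infinity Fabric, so the per-die view is the least flattering one for AMD.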
Intel also states that there is no VM interoperability between AMD and Intel architectures. While that in itself is not a good thing for either Intel or AMD, the thing is that Intel has older generation architectures which are very competitive and clients can migrate live VMs onto those, whereas AMD has no such previous gen cushion.
These next few slides illustrate the scaling of Virtual Machines in both architectures. In Intel’s case, clients will get smooth and linear scaling up to 10 cores, whereas you will get consistent VM scaling for only 4 cores in AMD Naples. Anything above that will lead to high latencies (7x as much in some cases) as more and more Zeppelin dies are involved. This is something that once again can be an issue for cloud hosting providers which provide VMs as a service to customers.
Finally, we have something that is the bread and butter of the Intel advantage: a very well developed ecosystem. This is something that becomes true for any player that has been operating consistently in the market for a long time and is certainly true for Intel. The company does have a very robust ecosystem, while Naples (EPYC), being the newcomer (or rather a comeback), does not currently have anything comparable.