Netflix Evaluating Replacing Intel With AMD EPYC Processors, Single EPYC Compared To Dual-Socket Xeon
Netflix, a household name in the online streaming industry, appears to be evaluating AMD EPYC based solutions for its data centers as a possible replacement for Intel Xeon based solutions. In a presentation at EuroBSDCon (thanks for the link @DavidSchor), Drew Gallatin described how the company plans to reach an ambitious 200 Gbps goal for its streaming hardware, and which Intel and AMD based solutions it is considering to get there. It is worth adding that this is not a confirmation of Netflix shifting to AMD. It is an early evaluation by the company, but one that could possibly lead to a shift.
Netflix assesses an AMD EPYC based solution to achieve 200 Gbps
The story begins like this. Netflix has server setups right now that can serve 100 Gbps quite easily, but with expansion plans in mind, the company is considering what it has to do to reach 200 Gbps per commodity server. The current setup is a single-socket Intel Xeon solution. To double the performance, the company can either add a second Xeon socket or go with a single EPYC part. Since the EPYC and Xeon parts under consideration have similar TCOs (Total Cost of Ownership), this is essentially an exercise in meticulous technical evaluation.
The setup that Netflix is using right now is a mix of Broadwell and Skylake/Cascade Lake Xeons. The Broadwell-based Xeons have 60 GB/s of memory bandwidth and 40 PCIe Gen3 lanes (32 GB/s of IO bandwidth), while the Skylake/Cascade Lake Xeons have 90 GB/s of memory bandwidth and 48 PCIe Gen3 lanes (38 GB/s of IO bandwidth). Neither is enough for Netflix's 200 Gbps ambition, so these are the two choices the company has going forward (note that AMD was not part of the equation the first time around):
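Because the network goal is quoted in gigabits per second while the platform figures are in gigabytes per second, a quick unit conversion helps when reading the numbers above. The following is a minimal sketch using only the figures quoted in this article; the function and variable names are illustrative and are not taken from the Netflix presentation.

```python
def gbps_to_gigabytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8.0


# Figures as quoted in the article: (PCIe Gen3 lanes, IO bandwidth in GB/s).
quoted_io = {
    "Broadwell Xeon": (40, 32.0),
    "Skylake/Cascade Lake Xeon": (48, 38.0),
}

# Network targets expressed in the same unit as the platform figures.
current_target = gbps_to_gigabytes_per_s(100)
new_target = gbps_to_gigabytes_per_s(200)

# Per-lane IO bandwidth implied by the quoted totals (a derived value,
# not a number stated in the article).
per_lane = {name: io / lanes for name, (lanes, io) in quoted_io.items()}

if __name__ == "__main__":
    print(f"100 Gbps = {current_target} GB/s, 200 Gbps = {new_target} GB/s")
    for name, value in per_lane.items():
        print(f"{name}: ~{value:.2f} GB/s per PCIe Gen3 lane")
```

The converted target is only the payload leaving the network interface. The article does not say how many times each byte crosses memory or PCIe on its way out, so the converted figure is best read as a lower bound on what the server has to move.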
On the Intel side, they can go with a dual-Xeon configuration with 2x Intel Xeon Silver 4116/4216 processors. These would have a total of 180 GB/s memory bandwidth and 96 PCIe Gen3 lanes (for a total of 75 GB/s IO bandwidth). The dual Xeons would be connected by 2 UPI links.
On the other hand, they can go with an AMD EPYC Naples/Rome solution consisting of either the 7551 or (more likely) the 7502P. Infinity Fabric would connect the four chiplets inside the EPYC part, and the company would have access to 120-150 GB/s of memory bandwidth. This AMD setup would have access to 128 lanes of PCIe Gen3 (Gen4 for the 7502P, with the added advantage of memory bandwidth doubling to 200 GB/s).
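The two candidates can be lined up side by side with a few lines of arithmetic. This is a rough sketch built only from the figures quoted above; the `Platform` class and its fields are hypothetical names chosen for illustration, and the per-socket Xeon values are the Skylake/Cascade Lake numbers given earlier in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    sockets: int
    mem_bw_low: float      # GB/s per socket, low end of the quoted range
    mem_bw_high: float     # GB/s per socket, high end of the quoted range
    lanes_per_socket: int  # PCIe lanes per socket

    def total_mem_bw(self):
        """Return the (low, high) memory bandwidth across all sockets in GB/s."""
        return (self.sockets * self.mem_bw_low, self.sockets * self.mem_bw_high)

    def total_lanes(self):
        """Return the PCIe lane count across all sockets."""
        return self.sockets * self.lanes_per_socket


dual_xeon = Platform("2x Xeon Silver 4116/4216", 2, 90.0, 90.0, 48)
single_epyc = Platform("1x EPYC 7551/7502P", 1, 120.0, 150.0, 128)

# 200 Gbps of network payload expressed in GB/s.
TARGET_GB_PER_S = 200 / 8

if __name__ == "__main__":
    for p in (dual_xeon, single_epyc):
        low, high = p.total_mem_bw()
        print(f"{p.name}: {low:.0f}-{high:.0f} GB/s memory, "
              f"{p.total_lanes()} PCIe lanes, "
              f"{low / TARGET_GB_PER_S:.1f}x-{high / TARGET_GB_PER_S:.1f}x "
              f"the network payload rate")
```

On these numbers the dual-Xeon option leads on aggregate memory bandwidth while the single EPYC leads on PCIe lanes. The comparison ignores the cost of crossing UPI links or Infinity Fabric, which the article identifies as the focus of the NUMA tuning in the presentation.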
The presentation then goes into detail on how to optimize the NUMA configuration so that less data passes through the high-latency fabric. That is frankly beyond the scope of this article (any enthusiasts are welcome to read through the entire thing at the link given at the start of the article), but the end results are as follows:
The Xeon-based solution can achieve a maximum throughput of 191 Gbps, while the EPYC configuration can reach a maximum throughput of 194 Gbps. Considering both options have similar TCO (note that this is true even though a dual-socket configuration is being compared to a single-socket one), this makes it clear that Netflix can use either Intel or AMD for its upgrade going forward, with a slight performance advantage going to AMD.
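To put the size of that advantage in perspective, the two results can be compared against each other and against the 200 Gbps goal. This is a small sketch using only the throughput figures quoted above; the helper names are illustrative.

```python
TARGET_GBPS = 200.0

# Maximum throughput figures as quoted in the article, in Gbps.
results_gbps = {"dual Xeon": 191.0, "single EPYC": 194.0}


def fraction_of_target(measured: float, target: float = TARGET_GBPS) -> float:
    """Return measured throughput as a fraction of the target."""
    return measured / target


def relative_advantage(a: float, b: float) -> float:
    """Return how much larger a is than b, as a fraction of b."""
    return (a - b) / b


if __name__ == "__main__":
    for name, gbps in results_gbps.items():
        print(f"{name}: {fraction_of_target(gbps):.1%} of the 200 Gbps goal")
    gap = relative_advantage(results_gbps["single EPYC"], results_gbps["dual Xeon"])
    print(f"EPYC advantage over dual Xeon: {gap:.2%}")
```

Both configurations land within about 5% of the goal, and the gap between them is under 2%, which is why the result reads as a near tie rather than a decisive win.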
What it will all boil down to, however, is how the switching costs come into play. Since AMD is currently offering slightly better performance for the same TCO, the only thing Netflix has to worry about is the cost associated with shifting from an Intel ecosystem to an AMD one. Considering AMD's latest generation EPYC parts are shipping with PCIe Gen4, we feel that the company has a distinct upper hand when it comes to tech, and we wouldn't be surprised if Netflix decides to make the shift eventually. Needless to say, Intel should start paying attention to Netflix's future plans post-haste if it wants to retain this big client.