AMD Zen Based Naples CPU For High-Performance Servers Detailed – 32 Cores, 64 Threads and 128 PCIe Gen 3.0 Lanes on 1U and 2U Racks
We have some new details on AMD's Zen based Naples server platform, which will be available in 1H 2017. The new platform will be feature rich for the HPC market, with parts that aim to maximize compute performance.
AMD Zen Based Naples Platform For High-Performance Computing Detailed - 128 PCI Express 3.0 Lanes
The latest details show that AMD will offer an optimized GPU server platform built around the Naples CPU. The Naples platform will feature up to 128 PCI Express Gen 3.0 lanes, enough to attach a large number of devices. According to the detailed block diagram, a 1U rack will be able to support up to 32 NVMe devices and four discrete GPUs. The 1U rack will also feature two InfiniBand EDR interconnects for data communication between storage and server systems.
The 1U rack is labeled "Maximize Compute Density", so we are probably looking at the densest Naples configuration here, the one with 32 cores and 64 threads. The 2U rack is labeled "Maximize Performance/Node", which could mean either the same Naples configuration or a different one. Either way, the 2U rack allows for more GPUs: it can support up to 8 discrete graphics cards, 26 NVMe devices and a single IB EDR link.
AMD has listed some advantages of both the 1U and 2U racks. These include 4-6 directly attached discrete GPUs. RTG is actively working on delivering Vega based products around the same time as Naples, so we may see some of them paired with Naples processors. The interconnect provides P2P communication between the GPUs. No PCIe 3.0 switch is required either, which reduces cost and latency while delivering full bandwidth between GPU and CPU.
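To put "full bandwidth between GPU and CPU" into numbers, here is a minimal Python sketch of the theoretical PCIe 3.0 link rate. The 8 GT/s signalling rate and 128b/130b encoding are standard PCIe 3.0 figures; the x16 link width per GPU is our assumption, since the slides do not state how lanes are split between devices.

```python
# Sketch: theoretical PCIe 3.0 bandwidth in one direction.
PCIE3_GT_PER_S = 8.0        # PCIe 3.0 signalling rate per lane (GT/s)
PCIE3_ENCODING = 128 / 130  # 128b/130b line-encoding efficiency


def pcie3_bandwidth_gbs(lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe 3.0 link."""
    return lanes * PCIE3_GT_PER_S * PCIE3_ENCODING / 8  # bits -> bytes


GPU_LINK_WIDTH = 16   # assumption: each GPU is attached with a full x16 link
PLATFORM_LANES = 128  # lane count quoted for the Naples platform

per_gpu = pcie3_bandwidth_gbs(GPU_LINK_WIDTH)
platform_total = pcie3_bandwidth_gbs(PLATFORM_LANES)

print(f"x16 GPU link: {per_gpu:.2f} GB/s per direction")
print(f"128 lanes:    {platform_total:.2f} GB/s per direction")
```

Under these assumptions a directly attached x16 GPU tops out at roughly 15.75 GB/s in each direction. These are theoretical ceilings, and real-world throughput will be lower due to protocol overhead.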
AMD Naples Server Platform Uses 8 Channel Memory System
The Zen based Naples server platform will feature 8 channels for main system memory, which allows for higher memory capacities on the racks. These racks will be optimized for several HPC workloads such as molecular dynamics, rendering, graphics, data analytics, deep neural networks (DNN) and oil and gas (O&G).
In another slide, AMD has listed several validated platforms which they are working with OEMs to bring to market when Naples launches. These range from 1U (2 CPU / 4 GPU) and 2U (2 CPU / 8 GPU) to Sled: 2P (Multi CPU / 1 or 2 GPU) and Blade: 2P (Multi CPU / 1 to 3 GPU) systems. AMD will also be offering server cards in various form factors. The add-in cards will be available in the standard PCI Express form factor, while OEM-only options will include MCM and MXM modules.
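Channel count also affects bandwidth, not just capacity. The sketch below estimates theoretical peak DDR4 bandwidth per socket from the channel count. The DDR4-2400 speed grade is purely our assumption, as AMD's slides do not mention memory speeds; the six-channel figure is included only for comparison with the competing Xeon platform listed in the comparison table.

```python
# Sketch: theoretical peak DDR4 bandwidth for a given channel count.
BYTES_PER_TRANSFER = 8  # a DDR4 channel has a 64-bit data bus (ECC bits excluded)


def ddr4_peak_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s across all channels."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


ASSUMED_SPEED = 2400  # assumption: DDR4-2400, not confirmed by the slides

eight_channel = ddr4_peak_gbs(8, ASSUMED_SPEED)
six_channel = ddr4_peak_gbs(6, ASSUMED_SPEED)

print(f"8-channel: {eight_channel:.1f} GB/s")
print(f"6-channel: {six_channel:.1f} GB/s")
```

At the same memory speed, eight channels offer one third more peak bandwidth than six, whatever the final speed grade turns out to be.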
Upcoming Intel and AMD Server Platform Comparison:
| | Intel Xeon E5 Bronze / Silver | Intel Xeon E7 Gold / Platinum | AMD Naples Platform (2P) |
|---|---|---|---|
| Family Branding | Skylake-SP | Skylake-SP | AMD EPYC |
| PCH | Lewisburg PCH | Lewisburg PCH | SOC |
| Socket | Socket P (LGA 3647) | Socket P (LGA 3647) | SP3 LGA socket |
| Max Core Count | Up To 26 | Up To 32 | Up To 32 |
| Max Thread Count | Up To 52 | Up To 64 | Up To 64 |
| Max L3 Cache | 35.75 MB L3 | 38.5 MB L3 | 64 MB L3 |
| DDR4 Memory Support | 6-Channel DDR4 | 6-Channel DDR4 | 8-Channel DDR4 |
Naples 32 Core, 64 Threaded Behemoth That Will Battle Intel's Xeon in The HPC Department
AMD is building Naples to challenge Intel in a space Intel has long dominated with its Xeon lineup. The battle will play out in the server market, with AMD's Naples on one side and Intel's Xeon on the other. AMD has several Zen based Naples SKUs ranging from 16 to 32 cores. Each of them runs two threads per core, which should help throughput in heavily threaded server workloads. Along with performance, the 14nm FinFET process should also help with power efficiency.
| WCCFtech | AMD Naples | AMD Summit Ridge |
|---|---|---|
| L1 Instruction Cache | 64 KB x 32 | 64 KB x 8 |
| L1 Data Cache | 32 KB x 32 | 32 KB x 8 |
| L2 Cache | 512 KB x 32 | 512 KB x 8 |
| L3 Cache | 64 MB | 16 MB |
| Base Clock | 1.4 GHz | Up To 3.6 GHz |
| Turbo Clock | 2.8 GHz | Up To 4.0 GHz |
Zen cores are grouped into CPU complexes (CCX). Each CPU complex features four cores connected to 8 MB of shared L3 cache, so Zen based server chips with 32 cores can feature up to 64 MB of L3 cache, which is quite disruptive. This, along with 64 threads available for multi-threaded workloads, would make for a very competitive product in server platforms. For full details on the Zen microarchitecture, you can check out a detailed article here.
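The cache and thread totals follow directly from the CPU complex layout, as the short Python sketch below shows. The four cores and 8 MB of L3 per complex come from the figures above; the function and variable names are ours and purely illustrative.

```python
# Sketch: derive complex count, L3 size and thread count from the core count.
CORES_PER_COMPLEX = 4   # four cores per CPU complex
L3_PER_COMPLEX_MB = 8   # 8 MB of L3 per CPU complex
THREADS_PER_CORE = 2    # two threads per core (32 cores -> 64 threads)


def chip_totals(cores: int) -> dict:
    """Return complex count, total L3 (MB) and thread count for a Zen chip."""
    if cores % CORES_PER_COMPLEX:
        raise ValueError("core count must be a multiple of 4")
    complexes = cores // CORES_PER_COMPLEX
    return {
        "complexes": complexes,
        "l3_mb": complexes * L3_PER_COMPLEX_MB,
        "threads": cores * THREADS_PER_CORE,
    }


naples = chip_totals(32)
summit_ridge = chip_totals(8)

print("Naples:", naples)
print("Summit Ridge:", summit_ridge)
```

For 32 cores this gives eight complexes, 64 MB of L3 and 64 threads. For the 8 core Summit Ridge it gives two complexes and 16 MB of L3, matching the table.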
AMD Next Generation Vega 10, 11, 20 and Dual GPU Graphics Card Rumored Lineup:
| WCCFTech | Polaris 10 | Vega 11 | Vega 10 | Vega 10 Dual GPU | Vega 20 |
|---|---|---|---|---|---|
| Process | 14nm FinFET | 14nm FinFET | 14nm FinFET | 14nm FinFET | 7nm FinFET |
| Transistors In Billions | 5.7 | TBA | TBA | TBA | TBA |
| Stream Processors | 2304 | 2304+ (est.) | 4096 | 8192 | 4096 |
| Clock Speed | 1266 MHz | TBA | 1526 MHz | 1100 MHz+ (est.) | 1800 MHz+ (est.) |
| Performance | 5.8 TFLOPS | TBA | 12.5 TFLOPS | 19 TFLOPS - 24 TFLOPS (est.) | 15 TFLOPS+ |
| Memory | 8GB GDDR5 | TBA | 8GB/16GB HBM2 | 16-32GB HBM2 | 16-32GB HBM2 |
| Memory Bus | 256-bit | TBA | 2048-bit (2 Stacks) | 4096-bit (2048-bit x2) | 4096-bit (4 Stacks) |
| Bandwidth | 256 GB/s | TBA | 512 GB/s | 1 TB/s | 1 TB/s |
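The performance and bandwidth rows can be sanity-checked with two simple formulas, sketched in Python below. Peak FP32 throughput counts one fused multiply-add (2 FLOPs) per stream processor per clock. The per-pin data rates (8 Gbps for GDDR5, 2 Gbps for HBM2) are our assumptions, chosen because they reproduce the table's bandwidth figures; they are not stated in the table itself.

```python
# Sketch: reproduce the table's TFLOPS and bandwidth figures.


def fp32_tflops(stream_processors: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPS, counting one FMA (2 FLOPs) per SP per clock."""
    return 2 * stream_processors * clock_mhz * 1e6 / 1e12


def bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s for a given bus width and pin rate."""
    return bus_width_bits * gbps_per_pin / 8  # bits -> bytes


polaris_10 = fp32_tflops(2304, 1266)
vega_10 = fp32_tflops(4096, 1526)

polaris_bw = bandwidth_gbs(256, 8.0)  # assumption: 8 Gbps GDDR5
vega_bw = bandwidth_gbs(2048, 2.0)    # assumption: 2 Gbps HBM2

print(f"Polaris 10: {polaris_10:.1f} TFLOPS, {polaris_bw:.0f} GB/s")
print(f"Vega 10:    {vega_10:.1f} TFLOPS, {vega_bw:.0f} GB/s")
```

Both results line up with the table: about 5.8 TFLOPS and 256 GB/s for Polaris 10, and about 12.5 TFLOPS and 512 GB/s for Vega 10. The rumored and estimated entries can be checked the same way once final clocks are known.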
A 2 socket solution should mean a total of 64 cores and 128 threads, along with memory capacities that make Opteron sound like a kid in the park. We could also see next generation HPC server chips that combine 32 high-performance cores with massive Vega GPUs used to crunch FP64 calculations. That has been in the plans for quite some time, and with Naples rolling out in Q2 2017, we might see it sometime around 2018, which is around the same time Vega 20 with FP64 compute arrives on the market.