Intel Skylake-X and Skylake-SP Mesh Architecture For XCC “Extreme Core Count” CPUs Detailed – Features Higher Efficiency, Higher Bandwidth and Lower Latency


Intel has shared new details on its upcoming Skylake-SP and Skylake-X series of processors, which will be available in the coming months. The details reveal that the HEDT Skylake-X and the Xeon-class Skylake-SP chips, which share the same architecture, will feature the biggest CPU design change in more than 9 years.

Intel Skylake-X and Skylake-SP Feature Massive Architecture Upgrade - Mesh Topology Replaces Ring Bus For Higher Bandwidth, Low Power and Lower Latency

When Nehalem launched back in 2008, Intel introduced its then-new Ring Bus design for processors. The Ring Bus was intended for Xeon processors with up to 8 cores. The Ring Bus interconnect worked as a bi-directional, sequential path that passed through the most important elements inside the processor. These included cores, integrated memory controllers, caches, PCI Express, I/O controllers, etc.

Ring Bus was a good and simple interconnect until Intel started producing processors with really high core counts. The Intel Xeon E5-2699 V4, the flagship Broadwell-EP Xeon (E5), has a total of 22 cores. This is a big upgrade from just 8 cores on Nehalem. The problem is that a single ring bus cannot accommodate all the cores, so two rings are deployed inside the die. The two rings also need to communicate with each other to determine whether data has to be passed from the second ring to the first. This introduced a pair of rings that offer bi-directional data transfer between the two sets of cores.

While the two sets of cores are close together and interconnected via the ring bus topology, the structure is very dense. Whenever data needs to travel from one part of the processor to the other, it has to hop from one ring to the other. This increases the cycle count, which in turn adds latency and leaves the chip bandwidth-starved.

So it looks like Ring Bus, while great for low core count chips, isn't as useful for the higher core count processors that are the future. There is also demand for more I/O and PCI Express capabilities on next-generation Xeon and HEDT class CPUs, and Ring Bus just couldn't accommodate them without ending up with higher latencies and lower efficiency.

Introducing The Mesh Architecture - Intel's Answer To AMD's Infinity Fabric Interconnect

The introduction of Intel's Mesh architecture for Skylake-SP and Skylake-X processors can be seen as a direct response to AMD's Infinity Fabric. But we need to get some things straight. AMD first announced Infinity Fabric a few months before it launched Ryzen processors, in early 2017. Architectural design changes aren't made in a matter of months. They take a good amount of time, and in the case of Mesh we are talking about 2-3 years. Intel is replacing its Ring Bus interconnect, launched back in 2008, after 9 years, and that means a lot of engineering work was required. Intel itself saw the need to update an outdated technology that was not going to work in the long run.

Skylake has already been an architectural lift on the mainstream front, and Mesh is the upgrade Intel is adding on the HEDT and server front, where chips carry high core counts. The Mesh is built for XCC dies, or Extreme Core Count processors, which will start shipping later this month for HEDT platforms and at a later date with the Xeon lineup.

The Mesh is made up of several vertical and horizontal connections that link the cores, cache, memory controllers and PCI Express. There's also the Inter-Socket Link, which replaces QPI (QuickPath Interconnect). With Mesh, Intel has multiplied the number of on-die communication channels, which not only increases bandwidth but also lowers latency, and the design is less complex than Ring Bus. Data can traverse the die over multiple paths and with fewer hops / cycles than on Ring Bus. This allows the chip to use lower clock rates and lower voltage while delivering lower latency and increased bandwidth.
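To make the hop-count argument concrete, here is a minimal Python sketch that compares the average number of hops between two stops on a single bi-directional ring and on a 2D mesh with the same number of stops. It is a toy model only: the function names, the 24-stop count and the 4 x 6 grid are assumptions chosen for illustration, not Intel's actual die layout, and real latency also depends on routing, queuing and clock speeds.

```python
from itertools import product


def ring_avg_hops(stops):
    """Average shortest-path hops between distinct stops on a bi-directional ring."""
    total = 0
    pairs = 0
    for a in range(stops):
        for b in range(stops):
            if a != b:
                d = abs(a - b)
                # Traffic can go either way around the ring; take the shorter side.
                total += min(d, stops - d)
                pairs += 1
    return total / pairs


def mesh_avg_hops(rows, cols):
    """Average hops between distinct stops on a rows x cols mesh (Manhattan distance)."""
    nodes = list(product(range(rows), range(cols)))
    total = 0
    pairs = 0
    for (r1, c1) in nodes:
        for (r2, c2) in nodes:
            if (r1, c1) != (r2, c2):
                # One hop per row moved plus one hop per column moved.
                total += abs(r1 - r2) + abs(c1 - c2)
                pairs += 1
    return total / pairs


if __name__ == "__main__":
    # Hypothetical 24-stop layout: one ring versus a 4 x 6 grid.
    print("ring:", round(ring_avg_hops(24), 2))
    print("mesh:", round(mesh_avg_hops(4, 6), 2))
```

In this simplified model the ring's average distance grows linearly with the number of stops, while the mesh's grows with the grid's side lengths, which is why the gap widens as core counts go up.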

The Intel XCC "Extreme Core Count" die uses the mesh architecture to support up to 18 cores. Note that there are 20 cores on the die; 2 cores have been disabled.

This also lowers cost and makes the chip more efficient. It should also help improve latency and bandwidth on the densest core design (up to 28 cores on Skylake-SP) without breaking a sweat. You can see in the die shot above that Intel has IMCs (Integrated Memory Controllers) on the left and right sides of the die. There are three on each side, which confirms that this is a cut-down die, as Xeon parts are designed to support 6-channel memory. Skylake-X will only be able to use quad-channel memory.
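The difference between six and four memory channels can be put in rough numbers with a back-of-the-envelope calculation. The sketch below assumes DDR4-2666 modules and a 64-bit (8-byte) data bus per channel; both figures are assumptions for illustration and are not taken from Intel's disclosure, and the helper name is hypothetical. It gives theoretical peak bandwidth only, not what real workloads will see.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s.

    channels: number of populated memory channels
    mega_transfers_per_s: memory speed in MT/s (2666 for the assumed DDR4-2666)
    bus_bytes: bytes moved per transfer per channel (8 for a 64-bit channel)
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


if __name__ == "__main__":
    # Assumed DDR4-2666 on both platforms; only the channel count differs.
    print("6-channel:", round(peak_bandwidth_gbs(6, 2666), 1), "GB/s")
    print("4-channel:", round(peak_bandwidth_gbs(4, 2666), 1), "GB/s")
```

At the same memory speed, the six-channel configuration has 1.5 times the theoretical peak bandwidth of the quad-channel one.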

Furthermore, Mesh has two PCI Express stops compared to just one on Ring. This means that Mesh will offer more bandwidth and lower latency in tasks that put a high load on the PCI Express lanes, such as multiple discrete graphics cards, fast NVMe SSDs or 100 Gbps networking.

Overall, the Mesh architecture looks like a big step forward for Intel in increasing the scalability of its architecture. It remains to be seen how much of a performance impact Mesh has compared to Ring Bus, but we will find out soon enough, as Skylake-X processors are going to hit store shelves in a couple of weeks and the NDA lifts shortly.