Intel Xeon Phi ‘Knights Hill’ Will Debut on the 10nm Process – 2nd Generation Omni-Path and HMC


So something really interesting came up: Intel has started talking in earnest about its 10nm node, and one of its more solid roadmaps is already out. Xeon Phi is team blue's racehorse in the HPC sector, where it competes with strong GPGPU rivals such as Nvidia for a share of the number-crunching pie. Intel's co-processors can do everything a professional GPU can (and more) with one critical advantage: they don't need a host CPU to drive them.

Xeon Phi Knights Landing. Slide credit: Anandtech.com

Intel's 10nm, 3rd-generation Knights Hill co-processors to update the Xeon Phi lineup in 2017+

Nvidia recently unveiled the Tesla K80, which features the GK210 GPU (something we told you about a very long time ago), and I believe this announcement is what compelled Intel to reveal its HPC plans. Knights Hill constitutes the third iteration of the Xeon Phi family, preceded by Knights Corner and Knights Landing. Interestingly, Knights Corner came only in the GPU (PCI-E card) form factor, which Intel adopted for its convenient design but which carried a big footprint. Knights Landing added the familiar socketed (CPU) packaging, giving businesses the option to choose whichever design they prefer.

Early reports suggested Intel was planning on abandoning the GPU form factor completely, but I am no longer sure how accurate that is. From what the slide implies, I expect both form factors to stick around. Keep in mind, however, that Knights Landing isn't available yet (to my knowledge). Most of the lineup is expected to land sometime in 2015, but as is typical in the HPC sector, concrete roadmaps are provided well in advance so investors and businesses can make critical decisions. Also, unlike in the mainstream sector, any delay in a roadmap could cost Intel a lot of money, so I expect this one to hold true to the letter.

Let's talk a bit about Knights Landing. It is built on the 14nm process and uses a modified Silvermont core (x86, of course). It is also one of the first mass-produced components to feature stacked DRAM. While the GPU form factor is inevitably bottlenecked by the PCI-E bus, the CPU form factor provides very low latency and almost no bottlenecks. The on-package stacked DRAM will come in at a whopping 16 GB, with connections for additional DDR4-2400MHz memory. That is a massive amount of memory to have on board. However, there is a technical catch (note the keyword: technical). The HMC will not actually be placed or stacked upon the die.
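To put the PCI-E bottleneck in numbers, here is a rough back-of-the-envelope comparison. The PCI-E 3.0 signaling parameters (8 GT/s per lane, 128b/130b encoding) are standard; the ~500 GB/s on-package figure is the one quoted for Knights Landing's HMC elsewhere in this article, and the comparison itself is purely illustrative:

```python
# Back-of-the-envelope: PCI-E 3.0 x16 vs. on-package memory bandwidth.
# PCI-E 3.0 signals at 8 GT/s per lane with 128b/130b line coding.
PCIE3_GTS_PER_LANE = 8.0          # gigatransfers/s per lane
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b coding overhead
LANES = 16

# Usable bandwidth of a PCI-E 3.0 x16 slot, one direction, in GB/s.
pcie3_x16_gbps = PCIE3_GTS_PER_LANE * ENCODING_EFFICIENCY * LANES / 8

ON_PACKAGE_GBPS = 500.0  # the ~500 GB/s HMC figure quoted in this article

print(f"PCI-E 3.0 x16: {pcie3_x16_gbps:.1f} GB/s")          # ~15.8 GB/s
print(f"On-package advantage: {ON_PACKAGE_GBPS / pcie3_x16_gbps:.0f}x")
```

In other words, a GPU-form-factor card talking to host memory over PCI-E 3.0 x16 has roughly 16 GB/s to work with, versus hundreds of GB/s on-package, which is the gap the socketed form factor sidesteps.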

Intel will actually surround the Xeon Phi die with the HMC, using a custom Micron-Intel, super-high-bandwidth, parallel-path interface that makes the memory appear as if it were on the die. In fact, it will act more or less like a 16GB L3 cache, and you can expect this size to increase with Knights Hill. Of course, if you strip away the marketing material, you will realize that a true L3 cache is faster than the currently known speeds of HMC (depending on which processor you have). The Hybrid Memory Cube used in the Knights Landing Xeon Phi package will feature up to 2000 TSVs and an ASIC at the base of the stack to manage the DRAM package. It promises more than 5 times the bandwidth of DDR4 RAM and more than 15 times the bandwidth of DDR3 RAM. Because Intel is using a customized 16GB Micron HMC solution (Micron already offers 2GB and 4GB variants) and a customized interface, the bandwidth will be 500GB/s. The Omni-Path interconnect, meanwhile, can provide line speeds of up to 100Gbps.
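Those bandwidth multiples can be sanity-checked with simple arithmetic. Peak DDR bandwidth is transfer rate times the 8-byte channel width times the channel count; the channel configurations below are my own assumptions for illustration, not anything Intel has specified:

```python
def ddr_bandwidth_gbps(transfers_mts, channels, bus_bytes=8):
    """Peak DDR bandwidth in GB/s: transfer rate (MT/s) x 64-bit bus x channels."""
    return transfers_mts * bus_bytes * channels / 1000

hmc_gbps = 500.0  # quoted figure for the customized 16GB HMC

# Assumed host configurations (illustrative channel counts):
ddr4_quad = ddr_bandwidth_gbps(2400, channels=4)   # 76.8 GB/s
ddr3_dual = ddr_bandwidth_gbps(1600, channels=2)   # 25.6 GB/s

print(f"HMC vs quad-channel DDR4-2400: {hmc_gbps / ddr4_quad:.1f}x")  # >5x
print(f"HMC vs dual-channel DDR3-1600: {hmc_gbps / ddr3_dual:.1f}x")  # >15x

# Omni-Path's 100 Gbps line rate works out to 12.5 GB/s raw.
omni_path_gbs = 100 / 8
```

Under those assumptions the "more than 5x DDR4" and "more than 15x DDR3" claims check out, and they also show why Omni-Path's 100 Gbps (12.5 GB/s) is a cluster interconnect figure, not a memory bandwidth figure.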

The Xeon Phi platform is still pretty much in its infancy, and its scaling between generations is very GPU-like. Based on the projections so far, I would expect the new 10nm Knights Hill co-processors to deliver performance easily upwards of 5 TFLOPS. I expect rival GPUs in 2017 to beat this number comfortably, but their inherent PCI-E bottleneck and compulsory CPU dependency make Xeon Phi a very viable alternative for a supercomputing cluster.
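The "upwards of 5 TFLOPS" projection can be sketched as a naive generation-over-generation extrapolation. The input numbers here are public ballparks rather than Intel disclosures (Knights Corner tops out around ~1.2 double-precision TFLOPS; Knights Landing is widely expected to land around ~3 TFLOPS), so treat this as an illustration of the reasoning, not a forecast:

```python
# Naive generational extrapolation from assumed public ballpark figures.
knc_tflops = 1.2   # Knights Corner, double-precision (assumption)
knl_tflops = 3.0   # Knights Landing, double-precision (assumption)

scaling = knl_tflops / knc_tflops        # gen-over-gen factor: 2.5x
knh_projection = knl_tflops * scaling    # apply the same factor once more

print(f"Gen-over-gen scaling: {scaling:.1f}x")
print(f"Naive Knights Hill projection: {knh_projection:.1f} TFLOPS")
```

Carrying the ~2.5x factor forward one generation lands at roughly 7.5 TFLOPS, comfortably above the 5 TFLOPS floor suggested above, though real scaling rarely follows a single constant factor.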