Intel’s 10nm Knights Hill Powered Aurora Supercomputer To Feature Up To 180 PetaFlops Computational Power – 2018 Launch Scheduled

Hassan Mujtaba
Posted Apr 13, 2015

Intel is working in cooperation with Cray to deliver two supercomputers, Aurora and Theta, for Argonne National Laboratory. Both systems push well past the PetaFlops barrier using several thousand nodes. Built specifically for scientific workloads, they will draw power in the megawatt range, which gives a sense of the scale of these machines.

Intel Powering 180 PFlops Aurora Supercomputer With Next Generation Xeon Phi Coprocessors

The flagship supercomputer, Aurora, is scheduled for launch in 2018 and will use over 50,000 nodes with a power consumption of 13 MW. The two main objectives of the supercomputers built in collaboration with Intel and Cray are to accelerate discovery and innovation. Compared to Argonne’s current supercomputer, Mira, Aurora would deliver 18 times the performance (180 PFLOPS vs 10 PFLOPS), would be more than 6 times more power efficient and would consume just 2.7 times more power (13 MW vs 3.9 MW). All of this horsepower would come from Intel’s next generation Xeon Phi accelerators codenamed “Knights Hill”.
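The ratios quoted above are easy to recompute from the raw figures. The following is a minimal Python sketch; the dictionary and function names are ours and purely illustrative. With 3.9 MW taken for Mira, the power ratio comes out at about 3.3 times and the efficiency gain at about 5.4 times, so the quoted 2.7 times and "more than 6 times" presumably rest on a higher power figure for Mira (13 MW divided by 2.7 is roughly 4.8 MW).

```python
# Figures as quoted in the article: peak performance in PFLOPS, power in MW.
MIRA = {"pflops": 10.0, "mw": 3.9}
AURORA = {"pflops": 180.0, "mw": 13.0}


def ratios(new, old):
    """Return (performance ratio, power ratio, efficiency ratio) of new vs old."""
    perf = new["pflops"] / old["pflops"]
    power = new["mw"] / old["mw"]
    # Efficiency is performance per unit of power, so its ratio is perf / power.
    return perf, power, perf / power


perf, power, eff = ratios(AURORA, MIRA)
print(f"performance: {perf:.1f}x, power: {power:.2f}x, efficiency: {eff:.1f}x")
```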

The Knights Hill family of Xeon Phi accelerators, or co-processors as they are termed, would be based on the 10nm node and feature the second generation Intel Omni-Path architecture. These HPC accelerators will have around 90-100 10nm cores repurposed for compute environments, as Intel did with its Silvermont cores on the Knights Landing accelerators. Being built on 10nm, Intel would keep the stock clocks in the 1.2 to 1.5 GHz range and would focus on better integer performance. It is highly likely that Xeon Phi Knights Hill will deliver 4-5 TFlops of double precision floating point performance. The latest Omni-Path fabric is going to run at around 200 GB/s, as The Platform sums things up. This $200 million deal will end up delivering 180 PetaFlops of compute performance, with the ability to upgrade up to 450 PetaFlops, which is an insane amount of performance.
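A quick cross-check of these numbers is to relate the system total to the per-node throughput. The sketch below is illustrative only: it assumes exactly 50,000 nodes (the article says "over 50,000") and one accelerator per node, neither of which is confirmed, and the function names are ours.

```python
def required_tflops_per_node(total_pflops, nodes):
    """Average peak TFLOPS each node must supply to reach the system total."""
    return total_pflops * 1000.0 / nodes


def system_pflops(nodes, tflops_per_node):
    """Peak system PFLOPS if every node delivers tflops_per_node."""
    return nodes * tflops_per_node / 1000.0


NODES = 50_000  # assumption: the article only says "over 50,000"

print(required_tflops_per_node(180, NODES))  # 3.6 TFLOPS per node for 180 PFLOPS
print(required_tflops_per_node(450, NODES))  # 9.0 TFLOPS per node for 450 PFLOPS
print(system_pflops(NODES, 4), system_pflops(NODES, 5))  # 200.0 250.0
```

Under these assumptions, a 4-5 TFlops part per node is enough to cover the 180 PetaFlops target, while the 450 PetaFlops upgrade would need more nodes or more throughput per node.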

The Theta Supercomputer – 8.5 PFlops With Knights Landing in 2016

The second supercomputer is Theta, which arrives earlier, in 2016, and is also based on an Intel Xeon Phi part, Knights Landing. We have been talking about Knights Landing a lot recently as Intel has revealed more details. The 14nm Silvermont-based accelerator delivers up to 3 TFlops of compute performance from its array of 72 cores and 288 threads. The chip is divided into tiles, each a partition of two cores; every core has 32 KB + 32 KB of L1 cache (instruction/data) and a pair of custom 512-bit AVX vector units that adopt the same instruction set as featured on Xeon chips. With 72 cores, this puts the total number of AVX units at 144 on the top-end Xeon Phi accelerator. Unlike the regular Silvermont core, the new Knights cores are reworked to deliver x86 performance on par with a proper core. Each tile shares a 1 MB L2 cache, which adds up to 36 MB of L2 cache across 36 tiles. The chip further has two independent DDR4 memory controllers providing 6-channel memory support (3 channels per controller), which allows up to 384 GB of RAM on the complete platform, plus a separate memory controller for on-package memory, which is detailed below.
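The chip-wide totals follow directly from the core count and the per-tile layout. A minimal Python sketch, assuming two cores per tile, two vector units per core, 1 MB of L2 per tile and four threads per core (288 threads over 72 cores); the function name is hypothetical.

```python
def knights_landing_totals(cores, cores_per_tile=2, vpus_per_core=2,
                           l2_mb_per_tile=1, threads_per_core=4):
    """Derive chip-wide totals from the core count and per-tile layout."""
    tiles = cores // cores_per_tile
    return {
        "tiles": tiles,
        "threads": cores * threads_per_core,
        "vector_units": cores * vpus_per_core,
        "l2_mb": tiles * l2_mb_per_tile,
    }


print(knights_landing_totals(72))
# {'tiles': 36, 'threads': 288, 'vector_units': 144, 'l2_mb': 36}
```

Passing 60 cores instead gives 30 tiles, 120 vector units and 30 MB of L2, which is what a lower-core-count configuration would look like.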

The on-package features of Knights Landing are quite interesting. Alongside the integrated Omni-Path, which provides a fast interconnect, and an I/O controller that provides up to 36 PCI-E 3.0 lanes, Intel has managed to put 8 high-bandwidth memory banks on the package, which is the reason for its massive size. The goal is to deliver fast memory access close to the die itself rather than relying on system memory. This high-performance memory should not be confused with either HBM (High-Bandwidth Memory) or HMC (Hybrid Memory Cube). Instead, it was created by Intel in collaboration with memory maker Micron Technology and is known as MCDRAM, a variant of the Hybrid Memory Cube design. The top Xeon Phi SKU will feature up to 16 GB of this very fast memory, delivering up to 400 GB/s of memory bandwidth in addition to the 90 GB/s provided by the DDR4 system RAM.

This supercomputer will feature 8.5 PFlops of compute performance and consume 1.7 MW of power. It is a much smaller system than Aurora and will use far fewer nodes to achieve the desired performance.

Moving To Exascale Computing, Slowly and Steadily

Supercomputers look set to pass the 500 PFlops mark by 2018-2020, and that is when we should start expecting the first EFlops (ExaFlops) systems. Several such machines are lined up as Intel, AMD, NVIDIA and IBM push for higher-performance designs with their next generation HPC accelerators. NVIDIA is also working with IBM on at least two supercomputers, for Oak Ridge National Laboratory and Lawrence Livermore National Laboratory. Arriving in 2017-2018, these supercomputers are codenamed Summit and Sierra. These next generation supercomputers will also feature 150-300 PetaFlops of compute performance, powered by IBM Power9 CPUs, NVIDIA Volta GPUs and NVIDIA’s latest NVLINK interconnect, which will link 3,400 nodes, each capable of delivering over 40 TFlops of performance.
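The node count and per-node figure quoted for the NVIDIA/IBM design can be checked the same way. A small Python sketch, treating 40 TFlops as a lower bound per node and 3,400 as the exact node count; the variable names are illustrative.

```python
SUMMIT_NODES = 3_400
MIN_TFLOPS_PER_NODE = 40  # the article says "over 40 TFLops" per node

# Lower bound on system performance implied by the per-node figure.
floor_pflops = SUMMIT_NODES * MIN_TFLOPS_PER_NODE / 1000
print(floor_pflops)  # 136.0

# Per-node throughput needed to hit each end of the 150-300 PFLOPS range.
for target_pflops in (150, 300):
    per_node = target_pflops * 1000 / SUMMIT_NODES
    print(target_pflops, round(per_node, 1))
```

At exactly 40 TFlops per node the system would reach 136 PetaFlops, so the 150 PetaFlops end of the range needs about 44 TFlops per node and the 300 PetaFlops end about 88 TFlops.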

2016-2018 Supercomputers Comparison:

Specification        Aurora Supercomputer           Theta Supercomputer               Summit Supercomputer  Sierra Supercomputer
CPU Architecture     Intel Xeon                     Intel Xeon                        IBM Power9            IBM Power9
HPC Accelerators     Intel Xeon Phi “Knights Hill”  Intel Xeon Phi “Knights Landing”  NVIDIA Volta “Tesla”  NVIDIA Volta “Tesla”
Nodes                50,000                         TBA                               3,400                 TBA
Power Consumption    13 MW                          1.7 MW                            ~10 MW                TBA
Performance          180-450 PFLOPS                 8.5 PFLOPS                        150-300 PFLOPS        <100 PFLOPS
Vendor               Intel / Cray                   Intel / Cray                      NVIDIA / IBM          NVIDIA / IBM
Interconnect         Intel 2nd Gen Omni-Path        Intel 1st Gen Omni-Path           NVIDIA NVLINK         NVIDIA NVLINK
Laboratory           Argonne                        Argonne                           Oak Ridge             Lawrence Livermore
Delivery Year        2018                           2016                              2017                  2017
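
For the systems with both a performance and a power figure, the comparison can be reduced to performance per watt. The following Python sketch uses the low end of each performance range and treats the approximate 10 MW for Summit as exact; the names are ours and the results are only as good as the quoted figures.

```python
# (peak PFLOPS, power in MW), taking the low end of each quoted range.
SYSTEMS = {
    "Aurora": (180.0, 13.0),
    "Theta": (8.5, 1.7),
    "Summit": (150.0, 10.0),
}


def gflops_per_watt(pflops, megawatts):
    """Convert PFLOPS and MW into GFLOPS per watt."""
    gflops = pflops * 1e6   # 1 PFLOPS = 1,000,000 GFLOPS
    watts = megawatts * 1e6  # 1 MW = 1,000,000 W
    return gflops / watts


for name, (pflops, megawatts) in SYSTEMS.items():
    print(f"{name}: {gflops_per_watt(pflops, megawatts):.1f} GFLOPS/W")
```

This prints roughly 13.8 GFLOPS/W for Aurora, 5.0 for Theta and 15.0 for Summit.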