Nvidia has confirmed that its “Pascal” GPU architecture will launch in 2016 and that its “Volta” GPU architecture will succeed it in 2018, while supercomputers based on these architectures are expected to be operational in 2017. Volta represents Nvidia’s sixth generation of general-purpose GPU architectures since the introduction of the company’s first unified shader graphics architecture, code-named Tesla, which debuted with the company’s highly successful GeForce 8 (8000) series back in 2006.
Volta was originally intended to succeed Nvidia’s 900 series Maxwell GPU architecture in 2016 and to be the company’s first generation to feature stacked memory. However, Volta was designed with HMC, the Hybrid Memory Cube, in mind, and HMC hadn’t matured as quickly as Nvidia had hoped. So a replacement was put in place that makes use of the other major stacked memory standard, High Bandwidth Memory, or HBM for short. And thus Pascal was born.
What we know so far about Nvidia’s flagship Pascal GP100 GPU:
- Pascal graphics architecture.
- Estimated 2x improvement in performance per watt over Maxwell.
- To launch in 2016, purportedly the second half of the year.
- DirectX 12 feature level 12_1 or higher.
- Successor to the GM200 GPU found in the GTX Titan X and GTX 980 Ti.
- Built on the 16nm FinFET manufacturing process from TSMC.
- Allegedly has a total of 17 billion transistors, more than twice that of GM200.
- Will feature four 4-Hi HBM2 stacks for a total of 16GB of VRAM, with 8-Hi stacks for up to 32GB on the professional compute SKUs.
- Features a 4096-bit memory bus interface, the same as AMD’s Fiji GPU powering the Fury series.
- Features NVLink (only compatible with next-generation IBM POWER server processors).
- Supports half precision FP16 compute at twice the rate of full precision FP32.
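For a rough sense of where the 16GB, 32GB and 4096-bit figures come from, the stack arithmetic can be sketched in a few lines of Python. The 1GB-per-die capacity, the 1024-bit interface per stack and the 2 Gb/s per-pin rate are assumptions based on HBM2's headline numbers, not details Nvidia has confirmed for GP100, and `hbm2_config` is only an illustrative helper name.

```python
# Back-of-the-envelope HBM2 arithmetic for the rumored GP100 memory setup.
# All three constants are ASSUMPTIONS, not figures from the article.
GB_PER_DIE = 1              # assumed: one 8 Gb DRAM die holds 1 GB
BUS_BITS_PER_STACK = 1024   # assumed: interface width of one HBM2 stack
GBIT_PER_PIN = 2.0          # assumed: per-pin data rate in Gb/s


def hbm2_config(stacks: int, dies_per_stack: int) -> dict:
    """Return capacity, bus width and peak bandwidth for a stack layout."""
    capacity_gb = stacks * dies_per_stack * GB_PER_DIE
    bus_bits = stacks * BUS_BITS_PER_STACK
    # bits per second across the whole bus, divided by 8 to get bytes
    bandwidth_gbs = bus_bits * GBIT_PER_PIN / 8
    return {
        "capacity_gb": capacity_gb,
        "bus_bits": bus_bits,
        "bandwidth_gbs": bandwidth_gbs,
    }


consumer = hbm2_config(stacks=4, dies_per_stack=4)      # four 4-Hi stacks
professional = hbm2_config(stacks=4, dies_per_stack=8)  # four 8-Hi stacks
print(consumer)
print(professional)
```

Under these assumptions, four 4-Hi stacks give 16GB on a 4096-bit bus at roughly 1 TB/s, and moving to 8-Hi stacks doubles capacity to 32GB without changing the bus width or bandwidth.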
| GPU Architecture | NVIDIA Fermi | NVIDIA Kepler | NVIDIA Maxwell | NVIDIA Pascal |
| --- | --- | --- | --- | --- |
| GPU Process | 40nm | 28nm | 28nm | 16nm (TSMC FinFET) |
| GPU Design | SM (Streaming Multiprocessor) | SMX (Streaming Multiprocessor) | SMM (Streaming Multiprocessor Maxwell) | SMP (Streaming Multiprocessor Pascal) |
| Maximum Transistors | 3.00 Billion | 7.08 Billion | 8.00 Billion | 15.3 Billion |
| Maximum Die Size | 520 mm² | 561 mm² | 601 mm² | 610 mm² |
| Stream Processors Per Compute Unit | 32 SPs | 192 SPs | 128 SPs | 64 SPs |
| Maximum CUDA Cores | 512 CCs (16 CUs) | 2880 CCs (15 CUs) | 3072 CCs (24 CUs) | 3840 CCs (60 CUs) |
| FP32 Compute | 1.33 TFLOPS (Tesla) | 5.10 TFLOPS (Tesla) | 6.10 TFLOPS (Tesla) | ~12 TFLOPS (Tesla) |
| FP64 Compute | 0.66 TFLOPS (Tesla) | 1.43 TFLOPS (Tesla) | 0.20 TFLOPS (Tesla) | 5.5 TFLOPS (Tesla) |
| Maximum VRAM | 1.5 GB GDDR5 | 6 GB GDDR5 | 12 GB GDDR5 | 16 / 32 GB HBM2 |
| Maximum Bandwidth | 192 GB/s | 336 GB/s | 336 GB/s | 1 TB/s |
| Launch Year | 2010 (GTX 580) | 2014 (GTX Titan Black) | 2015 (GTX Titan X) | 2016 |
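The Pascal column of the table can be sanity-checked with some simple arithmetic. The sketch below assumes the usual convention that each CUDA core retires one fused multiply-add (counted as 2 FLOPs) per clock. The clock speed it prints is only what the table's numbers imply, not a figure Nvidia has announced, and the function names are illustrative.

```python
# Sanity check of the Pascal column: core count, implied clock, FP16 rate.
# ASSUMPTION: one fused multiply-add per core per clock = 2 FLOPs.
FLOPS_PER_CORE_PER_CLOCK = 2


def total_cores(compute_units: int, cores_per_unit: int) -> int:
    """CUDA cores = compute units x stream processors per unit."""
    return compute_units * cores_per_unit


def peak_tflops(cores: int, clock_ghz: float) -> float:
    """Peak FP32 throughput in TFLOPS for a given core count and clock."""
    return cores * clock_ghz * FLOPS_PER_CORE_PER_CLOCK / 1000.0


def implied_clock_ghz(cores: int, tflops: float) -> float:
    """Clock in GHz that a given peak TFLOPS figure would require."""
    return tflops * 1000.0 / (cores * FLOPS_PER_CORE_PER_CLOCK)


cores = total_cores(compute_units=60, cores_per_unit=64)  # table: 3840
clock = implied_clock_ghz(cores, tflops=12.0)             # table: ~12 TFLOPS
fp32 = peak_tflops(cores, clock)
fp16 = 2 * fp32  # article: FP16 runs at twice the FP32 rate

print(f"cores={cores}, implied clock={clock:.4f} GHz")
print(f"FP32={fp32:.1f} TFLOPS, FP16={fp16:.1f} TFLOPS")
```

With 60 units of 64 cores each the total matches the table's 3840 CUDA cores, and the ~12 TFLOPS FP32 figure would put half-precision throughput at roughly 24 TFLOPS.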
Nvidia Confirms Pascal Launching In 2016, Volta In 2018
Admittedly, HMC has progressed much more slowly than HBM, which is already used in AMD’s latest GPU, code-named Fiji. Still, HMC offers some substantial benefits for the server and HPC market, and that’s where Volta is set to shine.
Nvidia plans to introduce Volta in a range of consumer graphics cards by 2018 and to use Volta GPUs to power some highly power-efficient next-generation supercomputers.
In 2017, we can look forward to two new supercomputers: Summit from Oak Ridge National Laboratory and Sierra from Lawrence Livermore National Laboratory. The two have one thing in common: both will feature several next-generation IBM POWER9 CPUs and several NVIDIA Volta GPUs.
Summit is rated at a peak performance of 150-300 PFLOPS, delivered through more than 3400 compute nodes, each powered by several next-generation IBM POWER9 CPUs and NVIDIA Volta-based Tesla accelerators. Each node will deliver around 40 teraflops of compute and is touted as a more performant solution than an entire rack of flagship Haswell-based server chips.
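The node count and per-node figure can be multiplied out to see how they relate to the quoted system peak. This is a minimal sketch using only the article's rounded numbers ("more than 3400" nodes, "around 40" teraflops each), so the result is a floor estimate and not an official specification. The function names are illustrative.

```python
import math

# Rounded figures from the article; both are approximate.
NODES = 3400            # "more than 3400 compute nodes"
TFLOPS_PER_NODE = 40.0  # "around 40 teraflops" per node


def aggregate_pflops(nodes: int, tflops_per_node: float) -> float:
    """System peak in PFLOPS (1 PFLOPS = 1000 TFLOPS)."""
    return nodes * tflops_per_node / 1000.0


def nodes_needed(target_pflops: float, tflops_per_node: float) -> int:
    """Smallest node count that reaches the target system peak."""
    return math.ceil(target_pflops * 1000.0 / tflops_per_node)


floor_estimate = aggregate_pflops(NODES, TFLOPS_PER_NODE)
print(f"{NODES} nodes x {TFLOPS_PER_NODE} TFLOPS = {floor_estimate} PFLOPS")
print("nodes for 150 PFLOPS:", nodes_needed(150, TFLOPS_PER_NODE))
print("nodes for 300 PFLOPS:", nodes_needed(300, TFLOPS_PER_NODE))
```

Taken at face value, 3400 nodes at 40 teraflops land at 136 PFLOPS, a little under the bottom of the 150-300 PFLOPS range. That suggests the final machine would need more nodes, more compute per node, or both to reach the upper end.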
One technology will be pivotal to delivering on the promise of Volta GPGPUs in servers and supercomputers: NVLink. It is aimed at GPU-accelerated servers and supercomputers, where inter-chip communication is extremely bandwidth-limited and a major system bottleneck. Nvidia states that NVLink will be 5 to 12 times faster than traditional PCIe 3.0, making it a major step forward in platform atomics. Earlier this year Nvidia announced that IBM will be integrating this new interconnect into its upcoming POWER server CPUs.
NVLink is an energy-efficient, high-bandwidth communications channel that uses up to three times less energy to move data on the node at speeds 5-12 times conventional PCIe Gen3 x16. First available in the NVIDIA Pascal GPU architecture, NVLink enables fast communication between the CPU and the GPU, or between multiple GPUs.
Figure 3: NVLink is a key building block in the compute node of Summit and Sierra supercomputers.
[Slide: Volta GPU featuring NVLink and stacked memory. NVLink GPU high-speed interconnect: 80-200 GB/s. 3D stacked memory: 4x higher bandwidth (~1 TB/s), 3x larger capacity, 4x more energy efficient per bit.]
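The "5-12 times PCIe Gen3 x16" claim and the 80-200 GB/s figure quoted above can be cross-checked against each other. The sketch below uses the standard PCIe 3.0 parameters (8 GT/s per lane, 128b/130b encoding) and assumes the comparison is against one direction of an x16 link. Nvidia has not spelled out the baseline it used, so treat that as an assumption.

```python
# Cross-check: does 5-12x PCIe 3.0 x16 line up with 80-200 GB/s?
PCIE3_GT_PER_LANE = 8.0     # giga-transfers per second per lane
PCIE3_ENCODING = 128 / 130  # 128b/130b line-code efficiency
PCIE3_LANES = 16


def pcie3_x16_gbs() -> float:
    """Usable PCIe 3.0 x16 bandwidth in GB/s, one direction (assumed)."""
    gbit_per_s = PCIE3_GT_PER_LANE * PCIE3_ENCODING * PCIE3_LANES
    return gbit_per_s / 8  # bits to bytes


def nvlink_range_gbs(low_mult: float = 5, high_mult: float = 12) -> tuple:
    """Bandwidth range implied by the quoted speedup multipliers."""
    base = pcie3_x16_gbs()
    return low_mult * base, high_mult * base


base = pcie3_x16_gbs()
low, high = nvlink_range_gbs()
print(f"PCIe 3.0 x16: {base:.2f} GB/s")
print(f"5x to 12x:    {low:.1f} to {high:.1f} GB/s")
```

With a baseline just under 16 GB/s, the 5x and 12x multipliers work out to roughly 79 and 189 GB/s, which is in line with the 80-200 GB/s range on Nvidia's slide.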
NVLink is a key technology in Summit’s and Sierra’s server node architecture, enabling IBM POWER CPUs and NVIDIA GPUs to access each other’s memory fast and seamlessly. From a programmer’s perspective, NVLink erases the visible distinctions of data separately attached to the CPU and the GPU by “merging” the memory systems of the CPU and the GPU with a high-speed interconnect. Because both CPU and GPU have their own memory controllers, the underlying memory systems can be optimized differently (the GPU’s for bandwidth, the CPU’s for latency) while still presenting as a unified memory system to both processors. NVLink offers two distinct benefits for HPC customers. First, it delivers improved application performance, simply by virtue of greatly increased bandwidth between elements of the node. Second, NVLink with Unified Memory technology allows developers to write code much more seamlessly and still achieve high performance.
via NVIDIA News
NVLink will debut with Nvidia’s Pascal in 2016 before it makes its way to Volta in 2018. Unlike with Maxwell, Nvidia has put a major focus on compute and GPGPU acceleration with Pascal. The slew of features and new technologies debuting with Pascal emphasizes this focus, including next-generation stacked High Bandwidth Memory, the high-speed NVLink GPU interconnect, and mixed-precision support to accelerate mobile applications and push mobile perf/watt. We expect that Volta will carry all of these forward.
Back to the Summit supercomputer: perhaps the most impressive thing about it is that it will consume 10% more power than the Titan supercomputer and in exchange deliver up to 10 times the computational performance. While Titan is rated at 25-30 PFLOPS, Sierra will deliver more than 100 PFLOPS of compute and Summit an even more impressive 150-300 PFLOPS.
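To put the power and performance claims side by side, here is a small sketch of the implied efficiency gain. It uses only the ratios and ranges quoted above; no absolute power figures are assumed, and the helper names are illustrative.

```python
# Implied efficiency gain of Summit over Titan from the quoted ratios.

def efficiency_gain(perf_ratio: float, power_ratio: float) -> float:
    """Performance-per-watt improvement given perf and power ratios."""
    return perf_ratio / power_ratio


def speedup_range(new_range: tuple, old_range: tuple) -> tuple:
    """Most conservative and most optimistic speedup between two ranges."""
    new_low, new_high = new_range
    old_low, old_high = old_range
    return new_low / old_high, new_high / old_low


# "10% more power" and "up to 10 times the computational performance"
gain = efficiency_gain(perf_ratio=10.0, power_ratio=1.10)

# Summit at 150-300 PFLOPS versus Titan at 25-30 PFLOPS
worst, best = speedup_range((150.0, 300.0), (25.0, 30.0))

print(f"perf/watt gain: about {gain:.1f}x")
print(f"speedup from the quoted ranges: {worst:.0f}x to {best:.0f}x")
```

At the headline 10x figure, the 10% power increase translates to roughly a 9x gain in performance per watt, while the quoted PFLOPS ranges span a 5x to 12x speedup depending on which ends of each range are compared.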