NVIDIA NVLink and the Future of HPC-Oriented GPUs
The Pascal GPU will also introduce NVLink, the next-generation Unified Virtual Memory link with Gen 2.0 cache coherency features and 5 to 12 times the bandwidth of a regular PCIe connection. This will solve many of the bandwidth issues that high-performance GPUs currently face. One of the latest things we learned about NVLink is that it will allow several GPUs to be connected in parallel in HPC-focused platforms, which will feature several nodes fitted with Pascal GPUs for compute-oriented workloads. The NVLink interconnect will give the multiple processors inside these HPC blocks a faster path than traditional PCIe Gen3 lanes, at speeds of up to 200 GB/s. Pascal GPUs will also feature unified memory support, allowing the CPU and GPU to share the same memory pool, and finally we have mixed-precision support. While NVLink isn't planned for commercial integration right now, it will be featured in PCs using ARM64 chips and in some x86-powered HPC servers that utilize solutions from OpenPower, Tyan and Quantum.
Outpacing PCI Express
Today a typical system has one or more GPUs connected to a CPU using PCI Express. Even at the fastest PCIe 3.0 speeds (8 Giga-transfers per second per lane) and with the widest supported links (16 lanes) the bandwidth provided over this link pales in comparison to the bandwidth available between the CPU and its system memory. In a multi-GPU system, the problem is compounded if a PCIe switch is used. With a switch, the limited PCIe bandwidth to the CPU memory is shared between the GPUs. The resource contention gets even worse when peer-to-peer GPU traffic is factored in.

NVLink addresses this problem by providing a more energy-efficient, high-bandwidth path between the GPU and the CPU at data rates 5 to 12 times that of the current PCIe Gen3. NVLink will provide between 80 and 200 GB/s of bandwidth, allowing the GPU full-bandwidth access to the CPU’s memory system.
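To see where the "5 to 12 times" figure comes from, the numbers quoted above can be run through some quick arithmetic. The snippet below is only a rough sketch: the 8 GT/s rate and 16 lanes come from the text, while the 128b/130b line encoding is our own assumption about PCIe 3.0 that the article does not state. It also treats the 80 and 200 GB/s NVLink figures as directly comparable to one direction of a PCIe x16 link, which may not match how NVIDIA counts them.

```python
# Figures taken from the text: PCIe 3.0 runs at 8 GT/s per lane, up to 16 lanes.
PCIE3_GT_PER_S = 8.0
PCIE3_LANES = 16
# Assumption (not stated in the article): PCIe 3.0 uses 128b/130b encoding,
# so 128 of every 130 transferred bits carry payload.
PCIE3_ENCODING = 128 / 130


def pcie3_bandwidth_gb_s(lanes=PCIE3_LANES):
    """Peak one-direction payload bandwidth of a PCIe 3.0 link in GB/s."""
    bits_per_second = PCIE3_GT_PER_S * PCIE3_ENCODING * lanes  # Gbit/s
    return bits_per_second / 8  # bits -> bytes


def speedup_over_pcie(link_gb_s, lanes=PCIE3_LANES):
    """How many times faster a link of link_gb_s is than PCIe 3.0."""
    return link_gb_s / pcie3_bandwidth_gb_s(lanes)


if __name__ == "__main__":
    print(f"PCIe 3.0 x16: {pcie3_bandwidth_gb_s():.2f} GB/s")
    for nvlink_gb_s in (80, 200):  # range quoted in the text
        ratio = speedup_over_pcie(nvlink_gb_s)
        print(f"NVLink at {nvlink_gb_s} GB/s: {ratio:.1f}x PCIe 3.0 x16")
```

Under these assumptions a PCIe 3.0 x16 link tops out just under 16 GB/s, so the 80 to 200 GB/s range lands at roughly 5 to a little under 13 times that, in line with the quoted claim.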
A Flexible and Energy-Efficient Interconnect
The basic building block for NVLink is a high-speed, 8-lane, differential, dual simplex bidirectional link. Our Pascal GPUs will support a number of these links, providing configuration flexibility. The links can be ganged together to form a single GPU↔CPU connection or used individually to create a network of GPU↔CPU and GPU↔GPU connections allowing for fast, efficient data sharing between the compute elements.

When connected to a CPU that does not support NVLink, the interconnect can be wholly devoted to peer GPU-to-GPU connections enabling previously unavailable opportunities for GPU clustering.
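The ganging idea is easier to picture with a toy model. The sketch below is purely illustrative and is not NVIDIA's configuration API: `plan_links`, `LinkPlan`, the link count of 4 and the per-link bandwidth of 20 GB/s are all hypothetical placeholders we picked for the example, since the text only says that Pascal supports "a number of these links". It shows links being split between a CPU connection and GPU peers, and all links going to peers when the CPU does not support NVLink.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkPlan:
    """Hypothetical description of how one GPU's links are assigned."""
    cpu_links: int
    peer_links: dict  # peer GPU name -> number of ganged links


def plan_links(total_links, peers, cpu_supports_nvlink, cpu_links=0):
    """Split a GPU's links between the CPU and its peer GPUs.

    If the CPU has no NVLink support, every link is devoted to peers.
    Links left after the CPU's share are spread round-robin over the peers.
    """
    if not cpu_supports_nvlink:
        cpu_links = 0
    if cpu_links < 0 or cpu_links > total_links:
        raise ValueError("cpu_links must be between 0 and total_links")

    peer_links = {peer: 0 for peer in peers}
    if peers:
        for i in range(total_links - cpu_links):
            peer_links[peers[i % len(peers)]] += 1
    return LinkPlan(cpu_links=cpu_links, peer_links=peer_links)


def ganged_bandwidth_gb_s(links, per_link_gb_s):
    """Ganged links simply add up in this simplified model."""
    return links * per_link_gb_s


if __name__ == "__main__":
    PER_LINK_GB_S = 20.0  # placeholder value, not from the article
    plan = plan_links(4, ["gpu1", "gpu2"], cpu_supports_nvlink=True, cpu_links=2)
    print(plan, ganged_bandwidth_gb_s(plan.cpu_links, PER_LINK_GB_S))
    print(plan_links(4, ["gpu1", "gpu2"], cpu_supports_nvlink=False, cpu_links=2))
```

In this model, ganging two links to the CPU doubles that connection's bandwidth at the cost of peer connectivity, which is the trade-off the "configuration flexibility" wording points at.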

Moving data takes energy, which is why we are focusing on making NVLink a very energy efficient interconnect. NVLink is more than twice as efficient as a PCIe 3.0 connection, balancing connectivity and energy efficiency.
To preserve the value of the current ecosystem, an NVLink-enabled system still directs CPU-initiated transactions such as control and configuration over a PCIe connection, while any GPU-initiated transactions use NVLink. This preserves the PCIe programming model while offering a huge upside in connection bandwidth.
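The split between the two paths boils down to a simple rule based on who starts the transaction. The snippet below is a minimal sketch of that rule, not actual driver logic: the `Initiator` and `Path` enums and the `route` function are names we made up, and the fallback to PCIe when no NVLink connection exists is our assumption rather than something the text spells out.

```python
from enum import Enum


class Initiator(Enum):
    CPU = "cpu"
    GPU = "gpu"


class Path(Enum):
    PCIE = "PCIe"
    NVLINK = "NVLink"


def route(initiator, nvlink_available=True):
    """Pick the interconnect for a transaction based on who initiated it.

    CPU-initiated traffic (control, configuration) stays on PCIe.
    GPU-initiated traffic uses NVLink when it is present.
    Assumption: without NVLink, GPU traffic falls back to PCIe.
    """
    if initiator is Initiator.GPU and nvlink_available:
        return Path.NVLINK
    return Path.PCIE


if __name__ == "__main__":
    for who in Initiator:
        print(f"{who.name}-initiated -> {route(who).value}")
```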

The NVIDIA Pascal GPU will be a major update, as it will probably turn out to be the first family of GPUs to use HBM2 and the latest 16nm FinFET process. Next year, AMD plans to launch its Arctic Islands family too, with an insane transistor count that's rumored to be around 17-18 billion, using the same HBM2 memory and a new process node. The NVIDIA Pascal GPU will be featured inside top-of-the-line servers and workstations, while Volta, the GPU after it, will be featured inside two next-generation supercomputers codenamed Sierra and Summit, reaching over 300 petaflops of compute performance. If you thought the Radeon R9 Fury X and the GeForce GTX 980 Ti were beastly cards, then you should be prepared to see the monstrous amount of performance that next-generation GPUs are going to offer.
NVIDIA Pascal GPU Prototype Board:
| GPU Family | AMD Vega | AMD Navi | NVIDIA Pascal | NVIDIA Volta |
|---|---|---|---|---|
| Flagship GPU | Vega 10 | Navi 10 | NVIDIA GP100 | NVIDIA GV100 |
| GPU Process | 14nm FinFET | 7nm FinFET | TSMC 16nm FinFET | TSMC 12nm FinFET |
| GPU Transistors | 15-18 Billion | TBC | 15.3 Billion | 21.1 Billion |
| GPU Cores (Max) | 4096 SPs | TBC | 3840 CUDA Cores | 5376 CUDA Cores |
| Peak FP32 Compute | 13.0 TFLOPs | TBC | 12.0 TFLOPs | >15.0 TFLOPs (Full Die) |
| Peak FP16 Compute | 25.0 TFLOPs | TBC | 24.0 TFLOPs | 120 Tensor TFLOPs |
| VRAM | 16 GB HBM2 | TBC | 16 GB HBM2 | 16 GB HBM2 |
| Memory (Consumer Cards) | HBM2 | HBM3 | GDDR5X | GDDR6 |
| Memory (Dual-Chip Professional/HPC) | HBM2 | HBM3 | HBM2 | HBM2 |
| HBM2 Bandwidth | 484 GB/s (Frontier Edition) | >1 TB/s? | 732 GB/s (Peak) | 900 GB/s |
| Graphics Architecture | Next Compute Unit (Vega) | Next Compute Unit (Navi) | 5th Gen Pascal CUDA | 6th Gen Volta CUDA |
| Successor of (GPU) | Radeon RX 500 Series | Radeon RX 600 Series | GM200 (Maxwell) | GP100 (Pascal) |
| Launch | 2017 | 2019 | 2016 | 2017 |