NVIDIA Updates Pascal GPU Board – Four HBM2 Stacks and Massive Die Previewed Ahead of Launch in 2016, 200 GB/s NVLINK Interconnect

Hassan Mujtaba • Oct 4, 2015 at 02:58am EDT

NVIDIA NVLINK and Future of HPC Oriented GPUs

The Pascal GPU will also introduce NVLINK, the next-generation Unified Virtual Memory link with Gen 2.0 cache coherency features and 5 to 12 times the bandwidth of a regular PCIe connection. This should relieve many of the bandwidth bottlenecks that high-performance GPUs currently face. One of the latest things we learned about NVLINK is that it will allow several GPUs to be connected in parallel in HPC-focused platforms, which will feature several nodes fitted with Pascal GPUs for compute-oriented workloads. The NVLINK interconnect will give the multi-processors inside HPC blocks a faster path than traditional PCIe Gen3 lanes, at speeds of up to 200 GB/s. Pascal GPUs will also feature Unified Memory support, allowing the CPU and GPU to share the same memory pool, as well as mixed-precision support. While NVLINK isn't planned for commercial integration right now, it will be featured in PCs using ARM64 chips and in some x86-powered HPC servers that utilize solutions from OpenPower, Tyan and Quantum.

Outpacing PCI Express

Today a typical system has one or more GPUs connected to a CPU using PCI Express. Even at the fastest PCIe 3.0 speeds (8 Giga-transfers per second per lane) and with the widest supported links (16 lanes) the bandwidth provided over this link pales in comparison to the bandwidth available between the CPU and its system memory. In a multi-GPU system, the problem is compounded if a PCIe switch is used. With a switch, the limited PCIe bandwidth to the CPU memory is shared between the GPUs. The resource contention gets even worse when peer-to-peer GPU traffic is factored in.

NVLink addresses this problem by providing a more energy-efficient, high-bandwidth path between the GPU and the CPU at data rates 5 to 12 times that of the current PCIe Gen3. NVLink will provide between 80 and 200 GB/s of bandwidth, allowing the GPU full-bandwidth access to the CPU’s memory system.
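As a rough check on the "5 to 12 times" figure, the arithmetic can be sketched as follows. The sketch takes the PCIe 3.0 numbers quoted above (8 GT/s per lane, 16 lanes) and assumes the 128b/130b line encoding used by PCIe 3.0, counting one direction only. The function and variable names are illustrative and do not come from NVIDIA.

```python
# Back-of-the-envelope comparison of PCIe 3.0 x16 and the quoted NVLink range.
# Assumption: PCIe 3.0 uses 128b/130b encoding; bandwidth is per direction.

def pcie3_bandwidth_gbps(lanes: int = 16, gigatransfers_per_s: float = 8.0) -> float:
    """Return usable one-direction PCIe 3.0 bandwidth in GB/s."""
    encoding_efficiency = 128 / 130   # 128 payload bits per 130 bits on the wire
    bits_per_byte = 8
    return lanes * gigatransfers_per_s * encoding_efficiency / bits_per_byte


pcie_x16 = pcie3_bandwidth_gbps()
nvlink_low, nvlink_high = 80.0, 200.0   # GB/s, the range quoted in the article

ratio_low = nvlink_low / pcie_x16
ratio_high = nvlink_high / pcie_x16

print(f"PCIe 3.0 x16: {pcie_x16:.2f} GB/s per direction")
print(f"NVLink at  80 GB/s: {ratio_low:.1f}x")
print(f"NVLink at 200 GB/s: {ratio_high:.1f}x")
```

Under these assumptions a x16 link works out to a little under 16 GB/s per direction, which places 80 to 200 GB/s at roughly 5 to 12.7 times PCIe Gen3. The text does not say whether the NVLink figures are per direction or aggregate, so the ratio is indicative only.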

A Flexible and Energy-Efficient Interconnect

The basic building block for NVLink is a high-speed, 8-lane, differential, dual simplex bidirectional link. Our Pascal GPUs will support a number of these links, providing configuration flexibility. The links can be ganged together to form a single GPU↔CPU connection or used individually to create a network of GPU↔CPU and GPU↔GPU connections allowing for fast, efficient data sharing between the compute elements.

When connected to a CPU that does not support NVLink, the interconnect can be wholly devoted to peer GPU-to-GPU connections enabling previously unavailable opportunities for GPU clustering.
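One way to picture the configurations described above is as a simple assignment of links to endpoints. The sketch below is purely illustrative: the link count of 4, the `allocate_links` helper, the round-robin policy, and the endpoint names are all hypothetical, since the text only says that Pascal GPUs will support "a number of" links.

```python
from collections import Counter
from itertools import cycle


def allocate_links(num_links: int, peers: list[str], cpu_supports_nvlink: bool,
                   gang_to_cpu: bool = False) -> Counter:
    """Assign each of a GPU's links to an endpoint (hypothetical model)."""
    if gang_to_cpu:
        # All links ganged into a single GPU<->CPU connection.
        if not cpu_supports_nvlink:
            raise ValueError("cannot gang links to a CPU without NVLink support")
        return Counter({"CPU": num_links})

    # Otherwise spread links round-robin over the CPU (if it supports
    # NVLink) and the peer GPUs. Round-robin is an assumption of this sketch.
    endpoints = (["CPU"] if cpu_supports_nvlink else []) + list(peers)
    assignment = Counter()
    if not endpoints:
        return assignment
    for _, endpoint in zip(range(num_links), cycle(endpoints)):
        assignment[endpoint] += 1
    return assignment


# Hypothetical GPU with 4 links in three different systems.
ganged = allocate_links(4, peers=[], cpu_supports_nvlink=True, gang_to_cpu=True)
mixed = allocate_links(4, peers=["GPU1"], cpu_supports_nvlink=True)
peer_only = allocate_links(4, peers=["GPU1", "GPU2"], cpu_supports_nvlink=False)

print(ganged, mixed, peer_only, sep="\n")
```

In the `peer_only` case the CPU receives no links, matching the scenario in which the interconnect is wholly devoted to GPU-to-GPU connections.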

Moving data takes energy, which is why we are focusing on making NVLink a very energy efficient interconnect. NVLink is more than twice as efficient as a PCIe 3.0 connection, balancing connectivity and energy efficiency.

Because we understand the value of the current ecosystem, CPU-initiated transactions such as control and configuration are still directed over a PCIe connection in an NVLink-enabled system, while any GPU-initiated transactions use NVLink. This allows us to preserve the PCIe programming model while presenting a huge upside in connection bandwidth.
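The split described above amounts to a routing rule keyed on who initiates a transaction. A minimal sketch, assuming a simplified two-initiator model: the `Initiator` enum and `route` function are hypothetical names for illustration and are not part of any NVIDIA API.

```python
from enum import Enum


class Initiator(Enum):
    CPU = "cpu"
    GPU = "gpu"


def route(initiator: Initiator, nvlink_available: bool = True) -> str:
    """Pick the interconnect for a transaction based on its initiator."""
    # CPU-initiated traffic (control, configuration) stays on PCIe,
    # which preserves the existing PCIe programming model.
    if initiator is Initiator.CPU:
        return "PCIe"
    # GPU-initiated traffic uses NVLink when the system has it.
    # Falling back to PCIe otherwise is an assumption of this sketch.
    return "NVLink" if nvlink_available else "PCIe"


print(route(Initiator.CPU))
print(route(Initiator.GPU))
print(route(Initiator.GPU, nvlink_available=False))
```

Because software on the CPU side keeps talking to the GPU over PCIe in this model, existing drivers and configuration paths would not need to change to benefit from the faster GPU-initiated path.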

The NVIDIA Pascal GPU will be a major update, as it will probably turn out to be the first family of GPUs to utilize HBM2 and the latest 16nm FinFET process. Next year, AMD plans to launch its Arctic Islands family too, with an insane transistor count that's rumored to be around 17-18 billion, utilizing the same HBM2 memory and a new process node. The NVIDIA Pascal GPU will be featured inside top-of-the-line servers and workstations, while Volta, the GPU after it, will be featured inside two next-generation supercomputers codenamed Sierra and Summit, reaching over 300 petaflops of compute performance. If you thought the Radeon R9 Fury X and the GeForce GTX 980 Ti were beastly cards, then you should be prepared for the monstrous amount of performance that next-generation GPUs are going to offer.

NVIDIA Pascal GPU Prototype Board:

GPU Family	AMD Vega	AMD Navi	NVIDIA Pascal	NVIDIA Volta
Flagship GPU	Vega 10	Navi 10	NVIDIA GP100	NVIDIA GV100
GPU Process	14nm FinFET	7nm FinFET	TSMC 16nm FinFET	TSMC 12nm FinFET
GPU Transistors	15-18 Billion	TBC	15.3 Billion	21.1 Billion
GPU Cores (Max)	4096 SPs	TBC	3840 CUDA Cores	5376 CUDA Cores
Peak FP32 Compute	13.0 TFLOPs	TBC	12.0 TFLOPs	>15.0 TFLOPs (Full Die)
Peak FP16 Compute	25.0 TFLOPs	TBC	24.0 TFLOPs	120 Tensor TFLOPs
VRAM	16 GB HBM2	TBC	16 GB HBM2	16 GB HBM2
Memory (Consumer Cards)	HBM2	HBM3	GDDR5X	GDDR6
Memory (Dual-Chip Professional/ HPC)	HBM2	HBM3	HBM2	HBM2
HBM2 Bandwidth	484 GB/s (Frontier Edition)	>1 TB/s?	732 GB/s (Peak)	900 GB/s
Graphics Architecture	Next Compute Unit (Vega)	Next Compute Unit (Navi)	5th Gen Pascal CUDA	6th Gen Volta CUDA
Successor of (GPU)	Radeon RX 500 Series	Radeon RX 600 Series	GM200 (Maxwell)	GP100 (Pascal)
Launch	2017	2019	2016	2017

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
