
NVIDIA’s Flagship Pascal GP100 GPU Die Shot Revealed at Hot Chips – The Biggest FinFET GPU To Date, Featuring HBM2 and NVLINK

Aug 23, 2016

NVIDIA has just released the first die shot of its biggest chip to date, the Pascal-based GP100 GPU. Unveiled at GTC 2016, GP100 powers NVIDIA’s fastest hyperscale accelerator, the Tesla P100. The chip is the first to use the HBM2 memory and NVLink interconnect interfaces, which deliver higher memory bandwidth and faster interconnect speeds.

NVIDIA GP100 GPU Die Shot Released at Hot Chips – The Biggest FinFET Product To Date Featuring HBM2

GP100 is the big daddy of the Pascal GPU lineup. In fact, it is the only Pascal chip that hasn’t made its way to consumers yet, because it is dedicated solely to the HPC market. NVIDIA has been shipping GP100-based Tesla P100 units to the data center market since June 2016. The GPU is designed specifically for HPC workloads and comes with a range of features that aren’t available on consumer parts.


Before we get into the technical details of Pascal GP100, let’s take a look at the fabulous die shot posted by Anandtech. The design is visibly dense: the die houses a total of 15.3 billion transistors. The NVLink interface sits on the right side of the die, while the four 1024-bit HBM2 memory interfaces run along the top and bottom edges. The die itself measures 610 mm², not counting the HBM2 stacks that sit beside it on the same package.

The NVIDIA GP100 GPU Technical Details – A Recap of The Big Green Pascal Chip

Like previous Tesla GPUs, GP100 is composed of an array of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers. The chip achieves its colossal throughput by providing six GPCs, up to 60 SMs, and eight 512-bit memory controllers (4096 bits total).

The Pascal architecture’s computational prowess is more than just brute force: it increases performance not only by adding more SMs than previous GPUs, but also by making each SM more efficient. Each SM has 64 CUDA cores and four texture units, for a total of 3840 CUDA cores and 240 texture units. The SMs are grouped into 30 TPCs (Texture Processing Clusters) of two SMs each.
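The chip-wide totals above follow directly from multiplying out the unit hierarchy. The Python sketch below does exactly that for the full, uncut die; the constant names are illustrative, and the split of five TPCs per GPC is our assumption (30 TPCs spread evenly across six GPCs).

```python
# Illustrative constants for the full (uncut) GP100 die.
GPCS = 6
TPCS_PER_GPC = 5          # assumption: 30 TPCs split evenly across 6 GPCs
SMS_PER_TPC = 2
CUDA_CORES_PER_SM = 64
TEXTURE_UNITS_PER_SM = 4


def gp100_totals() -> dict:
    """Multiply the GPC -> TPC -> SM hierarchy out to chip-wide totals."""
    tpcs = GPCS * TPCS_PER_GPC
    sms = tpcs * SMS_PER_TPC
    return {
        "tpcs": tpcs,
        "sms": sms,
        "cuda_cores": sms * CUDA_CORES_PER_SM,
        "texture_units": sms * TEXTURE_UNITS_PER_SM,
    }


totals = gp100_totals()
print(totals)  # {'tpcs': 30, 'sms': 60, 'cuda_cores': 3840, 'texture_units': 240}
```

Note that shipping products may have some SMs disabled, so these are ceilings for the full die rather than the figures of any particular card.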


The block diagram of NVIDIA’s GP100 supercomputing chip.

Because of the importance of high-precision computation for technical computing and HPC codes, a key design goal for Tesla P100 is high double-precision performance. Each GP100 SM has 32 FP64 units, providing a 2:1 ratio of single- to double-precision throughput. Compared to the 3:1 ratio in Kepler GK110 GPUs, this allows Tesla P100 to process FP64 workloads more efficiently.
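Because the 2:1 and 3:1 figures are per-SM unit ratios, they hold regardless of clock speed. A minimal Python sketch, assuming every FP32 or FP64 unit retires one fused multiply-add (two FLOPs) per clock. The GK110 numbers (192 FP32 cores and 64 FP64 units per SMX) are the commonly cited Kepler configuration, and the 1.5 GHz clock used for the peak estimate is a hypothetical round number, not a published Tesla P100 specification.

```python
def fp32_to_fp64_ratio(fp32_units_per_sm: int, fp64_units_per_sm: int) -> float:
    """Single- to double-precision throughput ratio of one SM.

    Assumes every unit retires one FMA per clock, so clock speed and
    SM count cancel out and only the per-SM unit counts matter.
    """
    return fp32_units_per_sm / fp64_units_per_sm


def peak_tflops(sms: int, units_per_sm: int, clock_ghz: float) -> float:
    """Peak throughput, counting one FMA as 2 FLOPs per unit per clock."""
    gflops = sms * units_per_sm * 2 * clock_ghz
    return gflops / 1000


pascal_ratio = fp32_to_fp64_ratio(64, 32)   # GP100 SM: 64 FP32 cores, 32 FP64 units
kepler_ratio = fp32_to_fp64_ratio(192, 64)  # GK110 SMX: 192 FP32 cores, 64 FP64 units
print(pascal_ratio, kepler_ratio)           # 2.0 3.0

ASSUMED_CLOCK_GHZ = 1.5  # hypothetical round number, not an official spec
fp32 = peak_tflops(60, 64, ASSUMED_CLOCK_GHZ)
fp64 = peak_tflops(60, 32, ASSUMED_CLOCK_GHZ)
print(f"FP32 ~{fp32:.2f} TFLOPs, FP64 ~{fp64:.2f} TFLOPs")
```

With all 60 SMs at the assumed clock this works out to roughly 11.5 TFLOPs FP32 and 5.8 TFLOPs FP64, in the same range as the ~12 and ~6 TFLOPs listed for Pascal in the comparison table.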

A close-up shot of the NVIDIA Pascal chip alongside its HBM2 memory. (Image Credits: Anandtech)

The GPU also packs four stacks of HBM2 memory. Total VRAM is 16 GB, which will be upgraded to 32 GB once HBM2 reaches volume production in 2017. The memory subsystem delivers 720 GB/s of bandwidth. For more information and details on the GP100 GPU, you can read our article here. You can also find performance benchmarks of the GP100 GPU here.
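Peak memory bandwidth is simply bus width times per-pin data rate. The Python sketch below first checks that eight 512-bit controllers and four 1024-bit HBM2 stacks describe the same 4096-bit bus, then converts to GB/s. The 1.4 Gb/s per-pin rate is our assumption for the Tesla P100’s HBM2, not a figure from this article.

```python
MEMORY_CONTROLLERS = 8
BITS_PER_CONTROLLER = 512
HBM2_STACKS = 4
BITS_PER_STACK = 1024


def bus_width_bits() -> int:
    """Total memory bus width; controllers and HBM2 stacks should agree."""
    by_controller = MEMORY_CONTROLLERS * BITS_PER_CONTROLLER
    by_stack = HBM2_STACKS * BITS_PER_STACK
    if by_controller != by_stack:
        raise ValueError("controller and stack bus widths disagree")
    return by_controller


def peak_bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: total Gb/s across all pins, divided by 8 bits per byte."""
    return bus_bits * gbps_per_pin / 8


ASSUMED_GBPS_PER_PIN = 1.4  # assumption, not taken from the article
width = bus_width_bits()
bandwidth = peak_bandwidth_gb_s(width, ASSUMED_GBPS_PER_PIN)
print(width, round(bandwidth, 1))  # 4096 716.8
```

The result of about 717 GB/s lines up with the quoted 720 GB/s. Plugging in 2.0 Gb/s per pin instead gives 1024 GB/s, which matches the 1 TB/s upper figure in the comparison table.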

| GPU Architecture | NVIDIA Fermi | NVIDIA Kepler | NVIDIA Maxwell | NVIDIA Pascal |
|---|---|---|---|---|
| GPU Process | 40nm | 28nm | 28nm | 16nm (TSMC FinFET) |
| Flagship Chip | GF110 | GK210 | GM200 | GP100 |
| GPU Design | SM (Streaming Multiprocessor) | SMX (Streaming Multiprocessor) | SMM (Streaming Multiprocessor Maxwell) | SMP (Streaming Multiprocessor Pascal) |
| Maximum Transistors | 3.00 Billion | 7.08 Billion | 8.00 Billion | 15.3 Billion |
| Maximum Die Size | 520 mm² | 561 mm² | 601 mm² | 610 mm² |
| Stream Processors Per Compute Unit | 32 SPs | 192 SPs | 128 SPs | 64 SPs |
| Maximum CUDA Cores | 512 CCs (16 CUs) | 2880 CCs (15 CUs) | 3072 CCs (24 CUs) | 3840 CCs (60 CUs) |
| FP32 Compute | 1.33 TFLOPs (Tesla) | 5.10 TFLOPs (Tesla) | 6.10 TFLOPs (Tesla) | ~12 TFLOPs (Tesla) |
| FP64 Compute | 0.66 TFLOPs (Tesla) | 1.43 TFLOPs (Tesla) | 0.20 TFLOPs (Tesla) | ~6 TFLOPs (Tesla) |
| Maximum VRAM | 1.5 GB GDDR5 | 6 GB GDDR5 | 12 GB GDDR5 | 16 / 32 GB HBM2 |
| Maximum Bandwidth | 192 GB/s | 336 GB/s | 336 GB/s | 720 GB/s to 1 TB/s |
| Maximum TDP | 244W | 250W | 250W | 300W |
| Launch Year | 2010 (GTX 580) | 2014 (GTX Titan Black) | 2015 (GTX Titan X) | 2016 |
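Two quick sanity checks on the table: each CUDA core count should equal compute units times stream processors per unit, and transistors divided by die area gives a rough average density. A minimal Python sketch using only the table’s own numbers (density here is a whole-die average, which ignores how much of each die is logic versus I/O or cache):

```python
# Figures copied from the comparison table:
# (chip, transistors in billions, die size in mm^2, CUs, SPs per CU, listed CUDA cores)
FLAGSHIPS = [
    ("GF110", 3.00, 520, 16, 32, 512),
    ("GK210", 7.08, 561, 15, 192, 2880),
    ("GM200", 8.00, 601, 24, 128, 3072),
    ("GP100", 15.3, 610, 60, 64, 3840),
]


def density_millions_per_mm2(transistors_billion: float, die_mm2: float) -> float:
    """Average transistor density in millions of transistors per mm^2."""
    return transistors_billion * 1000 / die_mm2


densities = {}
for chip, transistors, die, cus, sps_per_cu, cuda_cores in FLAGSHIPS:
    # The listed core count should equal CUs x SPs per CU.
    assert cus * sps_per_cu == cuda_cores, chip
    densities[chip] = density_millions_per_mm2(transistors, die)
    print(f"{chip}: {densities[chip]:.1f} M transistors/mm^2")
```

By this measure GP100 averages roughly 25 million transistors per mm², close to twice GM200’s ~13 million, which is in line with the jump from 28nm planar to 16nm FinFET.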