NVIDIA’s Flagship Pascal GP100 GPU Die Shot Revealed at Hot Chips – The Biggest FinFET GPU To Date, Featuring HBM2 and NVLINK
NVIDIA has just released the first die shot of its biggest chip to date, the Pascal-based GP100 GPU. Unveiled back at GTC 2016, the GP100 GPU powers NVIDIA’s fastest hyperscale accelerator, the Tesla P100, and is the first chip to utilize the HBM2 and NVLINK interfaces, which deliver increased memory bandwidth and interconnect speeds.
NVIDIA GP100 GPU Die Shot Released at Hot Chips – The Biggest FinFET Product To Date Featuring HBM2
The GP100 is the big daddy of the Pascal GPU lineup and, in fact, the only Pascal chip that hasn’t made its way to consumers, since it is dedicated solely to the HPC market. NVIDIA has been shipping GP100-based Tesla P100 units to the data center market since June 2016. The GPU is specifically designed to handle HPC workloads and comes with a range of features that aren’t available on consumer parts.
Before we get into the technical details of Pascal GP100, let’s take a look at the fabulous die shot posted by AnandTech. The design is strikingly dense: the die houses a total of 15.3 billion transistors. The NVLINK interface sits on the right side of the die, while the four 1024-bit HBM2 memory interfaces run along the top and bottom. The die as a whole measures 610mm2, and that’s without the HBM2 memory stacks that sit alongside it on the same package.
The NVIDIA GP100 GPU Technical Details – A Recap of The Big Green Pascal Chip
Like previous Tesla GPUs, GP100 is composed of an array of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers. The chip achieves its colossal throughput by providing six GPCs, up to 60 SMs, and eight 512-bit memory controllers (4096 bits total).
The Pascal architecture’s computational prowess is more than just brute force: it increases performance not only by adding more SMs than previous GPUs, but by making each SM more efficient. Each SM has 64 CUDA cores and four texture units, for a total of 3840 CUDA cores and 240 texture units. These SMs are arranged into 30 TPCs, each comprising two SMs.
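Those totals follow directly from the per-SM figures. A quick arithmetic sanity check (unit counts are from NVIDIA’s published full-chip GP100 configuration):

```python
# Full GP100 configuration; the shipping Tesla P100 enables 56 of the 60 SMs.
sms = 60
cuda_cores_per_sm = 64
texture_units_per_sm = 4
sms_per_tpc = 2

total_cuda_cores = sms * cuda_cores_per_sm        # 60 * 64 = 3840
total_texture_units = sms * texture_units_per_sm  # 60 * 4  = 240
tpcs = sms // sms_per_tpc                         # 60 / 2  = 30

print(total_cuda_cores, total_texture_units, tpcs)  # 3840 240 30
```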
Because of the importance of high-precision computation for technical computing and HPC codes, a key design goal for Tesla P100 is high double-precision performance. Each GP100 SM has 32 FP64 units, providing a 2:1 ratio of single- to double-precision throughput. Compared to the 3:1 ratio in Kepler GK110 GPUs, this allows Tesla P100 to process FP64 workloads more efficiently.
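Peak throughput falls out of those unit counts: each core can retire one fused multiply-add (two floating-point operations) per cycle. A back-of-the-envelope sketch, using the shipping Tesla P100’s figures (56 SMs enabled, 1480 MHz boost clock) rather than the full 60-SM chip:

```python
# Peak FLOPS = cores * 2 ops/cycle (fused multiply-add) * clock.
# Shipping Tesla P100 figures: 56 SMs enabled, 1480 MHz boost clock.
sms_enabled = 56
fp32_cores_per_sm = 64
fp64_cores_per_sm = 32        # the 2:1 FP32:FP64 ratio described above
boost_clock_ghz = 1.480

fp32_tflops = sms_enabled * fp32_cores_per_sm * 2 * boost_clock_ghz / 1000
fp64_tflops = sms_enabled * fp64_cores_per_sm * 2 * boost_clock_ghz / 1000
print(round(fp32_tflops, 1), round(fp64_tflops, 1))  # 10.6 5.3
```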
The GPU also packs four stacks of HBM2 memory, for a total of 16 GB of VRAM, which will grow to 32 GB once HBM2 hits volume production in 2017. The chip delivers 720 GB/s of memory bandwidth. For more information and details on the GP100 GPU, you can read our article here. You can also find performance benchmarks of the GP100 GPU here.
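That 720 GB/s figure can be reconstructed from the memory layout above (a rough sketch; the ~1.4 Gb/s per-pin data rate is the P100’s HBM2 operating speed, and the exact rate is slightly higher, which closes the small gap to the quoted figure):

```python
# Memory bandwidth = total bus width * per-pin data rate.
# GP100: four HBM2 stacks, each on a 1024-bit interface,
# running at roughly 1.4 Gb/s per pin.
stacks = 4
bus_bits_per_stack = 1024
pin_rate_gbps = 1.4

total_bus_bits = stacks * bus_bits_per_stack        # 4096-bit bus
bandwidth_gbs = total_bus_bits * pin_rate_gbps / 8  # bits -> bytes
print(total_bus_bits, bandwidth_gbs)  # 4096 716.8 (close to the quoted 720 GB/s)
```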
| GPU Architecture | NVIDIA Fermi | NVIDIA Kepler | NVIDIA Maxwell | NVIDIA Pascal |
| --- | --- | --- | --- | --- |
| GPU Process | 40nm | 28nm | 28nm | 16nm (TSMC FinFET) |
| GPU Design | SM (Streaming Multiprocessor) | SMX (Streaming Multiprocessor) | SMM (Streaming Multiprocessor Maxwell) | SMP (Streaming Multiprocessor Pascal) |
| Maximum Transistors | 3.00 Billion | 7.08 Billion | 8.00 Billion | 15.3 Billion |
| Maximum Die Size | 520mm2 | 561mm2 | 601mm2 | 610mm2 |
| Stream Processors Per Compute Unit | 32 SPs | 192 SPs | 128 SPs | 64 SPs |
| Maximum CUDA Cores | 512 CCs (16 CUs) | 2880 CCs (15 CUs) | 3072 CCs (24 CUs) | 3840 CCs (60 CUs) |
| FP32 Compute | 1.33 TFLOPs (Tesla) | 5.10 TFLOPs (Tesla) | 6.10 TFLOPs (Tesla) | ~12 TFLOPs (Tesla) |
| FP64 Compute | 0.66 TFLOPs (Tesla) | 1.43 TFLOPs (Tesla) | 0.20 TFLOPs (Tesla) | 5.5 TFLOPs (Tesla) |
| Maximum VRAM | 1.5 GB GDDR5 | 6 GB GDDR5 | 12 GB GDDR5 | 16 / 32 GB HBM2 |
| Maximum Bandwidth | 192 GB/s | 336 GB/s | 336 GB/s | 1 TB/s |
| Launch Year | 2010 (GTX 580) | 2014 (GTX Titan Black) | 2015 (GTX Titan X) | 2016 |