NVIDIA Shipping Working GP100 Based Tesla P100 Boards in June – HPC / Supercomputing First, OEM Availability in Q1 2017

Apr 6, 2016

NVIDIA server partners have revealed the first actual Tesla P100 boards that will be shipping this year. The Tesla P100 GPU was announced yesterday by Jen-Hsun Huang (CEO of NVIDIA) at the opening keynote of GTC 2016. The Tesla P100 was already stated to be in volume production; it will ship first to deep learning / cloud computing servers and supercomputers in the US in June 2016, with OEM availability following in Q1 2017.

The Tesla P100 is the world’s fastest hyperscale datacenter and HPC graphics card.

NVIDIA Already Has Working Tesla P100 Boards – 16nm Pascal in Full Volume Production

NVIDIA is clearly prepping the GP100 for its first-hand customers at the moment, which are the cloud and HPC markets. Back in 2012, NVIDIA started shipping its high-end Kepler GK110 chips to supercomputers before they were available to either OEMs or consumers. Supercomputers and cloud servers are NVIDIA’s first priority, which is further shown by the fact that the NVIDIA DGX-1 supercomputing platform will be shipping in June 2016, while other server partners will be able to sell it through their channels in Q1 2017, which is when NVIDIA plans to ship GP100 to OEMs.

There’s no doubt that NVIDIA considers the HPC market the priority for its latest GP100 GPU. We saw two hours’ worth of talk by NVIDIA’s CEO on HPC / Deep Learning and AI acceleration yesterday. The announcements were insane when seen from a server computing perspective, but that doesn’t mean NVIDIA is leaving the consumer market behind.

The NVIDIA DGX-1 is a complete supercomputing solution that houses several GP100 GPUs!

The majority of NVIDIA’s revenue still comes from gaming; it is the market NVIDIA is best known for and caters to the most. The gaming community should not be disappointed by the event, as reports suggest we could see consumer variants based on Pascal GPUs as early as Computex 2016.

Coming back to the topic, Computerbase managed to find the first batch of Tesla P100 GPUs, which are shipping in the QuantaPlex T21W-3U rack from Quanta. We talked about this a day earlier and the specs of this unit are very high-end. We are looking at the latest Xeon E5-2600 V4 processors with up to 1 TB of DRAM clocked at 2666 MHz, a hybrid storage solution that can support up to 12 SATA/SFF and 8 U.2 (x4) drives, and a fully equipped NVLINK interconnect that uses mezzanine slots to support up to 8 Tesla P100 GPUs. Cooling is handled by an air-cooled system.

NVIDIA’s GP100 GPU Is Real – Here’s What it Looks Like:

Image Credits/Source: Computerbase

You can note in the pictures above that the Tesla P100 boards shown in the units come in two versions: a week 41 unit with HBM1 and a week 43 unit with HBM2. This indicates that NVIDIA figured out the interposer design for the Pascal GPUs back in 2015. NVIDIA showed the same designs at GTC 2015 held in Japan and waited for the supply of HBM2 memory, which would later be fitted to the GP100 chip.

Although these units were shown by Quanta QT rather than by NVIDIA itself, they indicate that NVIDIA has been sampling GP100 GPUs with both HBM1 (2015) and HBM2 (2016) memory for quite some time. Quanta also showcased a live demo unit of their rack at GTC 2016 with working NVLINK. This is really interesting and, with Samsung already mass producing HBM2 memory, the Tesla P100 is on its way to being housed in the next HPC platforms starting June 2016.

NVIDIA Pascal Tesla P100 To Power The Piz Daint Super Computer in 2016 – 7.8 PFLOPs Compute Performance

The NVIDIA Tesla P100 will be used to upgrade the Piz Daint supercomputer at the CSCS (Swiss National Supercomputing Centre). NVIDIA will be shipping over 4500 Pascal-based GPUs later this year, which will be used to deliver up to 7.8 PetaFLOPs of compute performance, or 7.8 quadrillion mathematical calculations per second.

The Five Miracles That Ushered In The Development of the Pascal GP100 GPU.

Pascal is the most advanced GPU architecture ever built, delivering unmatched performance and efficiency to power the most computationally demanding applications. Pascal-based Tesla GPUs will allow researchers to solve larger, more complex problems that are currently out of reach in cosmology, materials science, seismology, climatology and a host of other fields. via NVIDIA

“CSCS scientists are using Piz Daint to tackle some of the most important computational challenges of our day, like modeling the human brain and uncovering new insights into the origins of the universe,” said Ian Buck, vice president of Accelerated Computing at NVIDIA. “Tesla GPUs deliver a massive leap in application performance, allowing CSCS to push the limits of scientific discovery.”

Tesla P100 With GP100 – The Fastest Supercomputing Chip in The World

NVIDIA’s Tesla P100 is the fastest supercomputing chip in the world. It is based on an entirely new, 5th Generation CUDA architecture codenamed Pascal. The GP100 GPU, which utilizes the Pascal architecture, is at the heart of the Tesla P100 accelerator. NVIDIA has spent the last several years developing the new GPU and it will finally be shipping to supercomputers in June 2016.

The Tesla P100 comes with beefy specs. Starting off, we have a 16nm Pascal chip that measures in at 610 mm², features 15.3 Billion transistors and comes with 3584 CUDA cores. The full Pascal GP100 chip features up to 3840 CUDA cores. NVIDIA has redesigned their SM (Streaming Multiprocessor) units and rearranged them to support 64 CUDA cores per SM block. The Tesla P100 has 56 of these blocks enabled while the full GP100 has 60 blocks in total. The chip also comes with a dedicated set of FP64 CUDA cores. There are 32 FP64 cores per block, which gives the Tesla P100 1792 dedicated FP64 cores in total.
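The core counts quoted above follow directly from the per-SM figures. As a quick sanity check, here is a minimal Python sketch of that arithmetic; the function and variable names are made up for illustration, and the only inputs are the numbers stated in the paragraph above (64 CUDA cores and 32 FP64 cores per SM, 56 SMs enabled on the Tesla P100, 60 on the full GP100).

```python
# Per-SM figures as quoted in the article.
CUDA_CORES_PER_SM = 64
FP64_CORES_PER_SM = 32


def core_counts(sm_count):
    """Return total FP32 (CUDA) and FP64 core counts for a given number of SMs."""
    return {
        "fp32_cores": sm_count * CUDA_CORES_PER_SM,
        "fp64_cores": sm_count * FP64_CORES_PER_SM,
    }


# Tesla P100 ships with 56 SMs enabled; the full GP100 die has 60.
tesla_p100 = core_counts(56)
full_gp100 = core_counts(60)

print("Tesla P100:", tesla_p100)
print("Full GP100:", full_gp100)
```

The Tesla P100 result matches the 3584 CUDA cores and 1792 FP64 cores quoted above, and the full-die result matches the 3840 CUDA core figure.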

The 16nm FinFET process allows for higher clock rates and throughput. In the case of Tesla P100, we are looking at a 1328 MHz base and 1480 MHz boost clock, which allow the chip to crunch up to 21.2 TFLOPs FP16, 10.6 TFLOPs FP32 and 5.3 TFLOPs FP64 compute performance. The Tesla P100 features next-generation HBM2 memory with 16 GB of VRAM and comes with a 300W TDP. More details on the architecture can be found here.
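The compute figures can be reproduced from the core counts and the boost clock. The sketch below is a rough check rather than NVIDIA's own method, and it rests on two assumptions not spelled out in the article: each core retires one fused multiply-add per clock, counted as 2 floating point operations, and FP16 runs at twice the FP32 rate. The function name is hypothetical.

```python
# Figures quoted in the article for the Tesla P100.
FP32_CORES = 3584
FP64_CORES = 1792
BOOST_CLOCK_MHZ = 1480

# Assumption: one fused multiply-add per core per clock = 2 FLOPs.
FLOPS_PER_CORE_PER_CLOCK = 2


def peak_tflops(cores, clock_mhz):
    """Theoretical peak throughput in TFLOPs for a core count and clock."""
    flops = cores * FLOPS_PER_CORE_PER_CLOCK * clock_mhz * 1e6
    return flops / 1e12


fp32 = peak_tflops(FP32_CORES, BOOST_CLOCK_MHZ)
fp64 = peak_tflops(FP64_CORES, BOOST_CLOCK_MHZ)
# Assumption: FP16 is processed at twice the FP32 rate.
fp16 = 2 * fp32

print(f"FP16: {fp16:.1f} TFLOPs")
print(f"FP32: {fp32:.1f} TFLOPs")
print(f"FP64: {fp64:.1f} TFLOPs")
```

Under those assumptions the results round to the 21.2 / 10.6 / 5.3 TFLOPs quoted above, which suggests the official numbers are peak figures at boost clock.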

NVIDIA Volta Tesla V100 Specs:

| NVIDIA Tesla Graphics Card | Tesla K40 | Tesla M40 | Tesla P100 | Tesla P100 | Tesla P100 (SXM2) | Tesla V100 (PCI-Express) | Tesla V100 (SXM2) |
|---|---|---|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) | GP100 (Pascal) | GV100 (Volta) | GV100 (Volta) |
| Process Node | 28nm | 28nm | 16nm | 16nm | 16nm | 12nm | 12nm |
| Transistors | 7.1 Billion | 8 Billion | 15.3 Billion | 15.3 Billion | 15.3 Billion | 21.1 Billion | 21.1 Billion |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 610 mm² | 610 mm² | 815 mm² | 815 mm² |
| SMs | 15 | 24 | 56 | 56 | 56 | 80 | 80 |
| TPCs | 15 | 24 | 28 | 28 | 28 | 40 | 40 |
| CUDA Cores Per SM | 192 | 128 | 64 | 64 | 64 | 64 | 64 |
| CUDA Cores (Total) | 2880 | 3072 | 3584 | 3584 | 3584 | 5120 | 5120 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 | 32 | 32 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1792 | 1792 | 2560 | 2560 |
| Base Clock | 745 MHz | 948 MHz | TBD | TBD | 1328 MHz | TBD | 1370 MHz |
| Boost Clock | 875 MHz | 1114 MHz | 1300 MHz | 1300 MHz | 1480 MHz | 1370 MHz | 1455 MHz |
| FP16 Compute | N/A | N/A | 18.7 TFLOPs | 18.7 TFLOPs | 21.2 TFLOPs | 28.0 TFLOPs | 30.0 TFLOPs |
| FP32 Compute | 5.04 TFLOPs | 6.8 TFLOPs | 10.0 TFLOPs | 10.0 TFLOPs | 10.6 TFLOPs | 14.0 TFLOPs | 15.0 TFLOPs |
| FP64 Compute | 1.68 TFLOPs | 0.2 TFLOPs | 4.7 TFLOPs | 4.7 TFLOPs | 5.30 TFLOPs | 7.0 TFLOPs | 7.50 TFLOPs |
| Texture Units | 240 | 192 | 224 | 224 | 224 | 320 | 320 |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size | 12 GB GDDR5 @ 288 GB/s | 24 GB GDDR5 @ 288 GB/s | 12 GB HBM2 @ 549 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 900 GB/s |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB | 4096 KB | 6144 KB | 6144 KB |
| TDP | 235W | 250W | 250W | 250W | 300W | 250W | 300W |