NVIDIA Shipping Working GP100 Based Tesla P100 Boards in June – HPC / Supercomputing First, OEM Availability in Q1 2017
NVIDIA's server partners have revealed the first actual Tesla P100 graphics cards that will ship this year. The Tesla P100 GPU was announced yesterday by Jen-Hsun Huang, CEO of NVIDIA, during the opening keynote of GTC 2016. NVIDIA stated that the Tesla P100 is already in volume production. It will ship first to deep learning / cloud computing servers and supercomputers in the US in June 2016, with OEM availability following in Q1 2017.
The Tesla P100 is the world's fastest hyperscale datacenter and HPC graphics card.
NVIDIA Already Has Working Tesla P100 Boards - 16nm Pascal in Full Volume Production
NVIDIA is clearly prepping GP100 for its first customers, the cloud and HPC markets. Back in 2012, NVIDIA shipped its high-end Kepler GK110 chips to supercomputers before they were available to either OEMs or consumers. The DGX-1 schedule shows the same priority on supercomputers and cloud servers: the NVIDIA DGX-1 supercomputing platform ships in June 2016, while other server partners will be able to sell Tesla P100 systems through their own channels in Q1 2017, which is when NVIDIA plans to ship GP100 to OEMs.
There's no doubt that NVIDIA considers the HPC market the priority for its latest GP100 GPU. Yesterday, NVIDIA's CEO spent two hours talking about HPC, deep learning and AI acceleration. From a server computing perspective the announcements were huge, but that doesn't mean NVIDIA is leaving the consumer market behind.
The NVIDIA DGX-1 is a complete supercomputing solution that houses eight GP100-based Tesla P100 GPUs!
The majority of NVIDIA's revenue still comes from gaming, the market NVIDIA is best known for and caters to the most. Gamers should not be disappointed by the event, as reports suggest that we could see consumer Pascal GPUs as early as Computex 2016.
Coming back to the topic, Computerbase found the first batch of Tesla P100 GPUs, which are shipping in Quanta's QuantaPlex T21W-3U rack. We covered this unit a day earlier, and its specs are very high-end: the latest Xeon E5-2600 V4 processors with up to 1 TB of DRAM clocked at 2666 MHz, a hybrid storage solution that supports up to 12 SATA/SFF and 8 U.2 (x4) drives, a fully equipped NVLink interconnect that uses mezzanine slots to support up to 8 Tesla P100 GPUs, and an air-cooled system.
NVIDIA's GP100 GPU Is Real - Here's What it Looks Like:
Image Credits/Source: Computerbase
As the pictures above show, the Tesla P100 boards in these units come in two versions: a week 41 unit with HBM1 and a week 43 unit with HBM2. This indicates that NVIDIA had worked out the interposer design for its Pascal GPUs back in 2015. NVIDIA showed the same designs at GTC 2015 in Japan and then waited for a supply of HBM2 memory, which was later fitted to the GP100 chip.
Although these units were shown by Quanta QCT rather than by NVIDIA itself, they prove that NVIDIA has been sampling GP100 GPUs with both HBM1 (2015) and HBM2 (2016) memory for quite some time. Quanta also showcased a live demo of the rack at GTC 2016 with working NVLink. With Samsung already mass-producing HBM2 memory, Tesla P100 is on track to ship in the next HPC platforms starting in June 2016.
NVIDIA Pascal Tesla P100 To Power The Piz Daint Supercomputer in 2016 - 7.8 PFLOPs Compute Performance
The NVIDIA Tesla P100 will be used to upgrade the Piz Daint supercomputer at CSCS (Swiss National Supercomputing Centre). NVIDIA will ship more than 4,500 Pascal-based GPUs later this year, which will deliver up to 7.8 petaFLOPs of compute performance, or 7.8 quadrillion floating-point operations per second.
The Five Miracles That Ushered In The Development of the Pascal GP100 GPU.
Pascal is the most advanced GPU architecture ever built, delivering unmatched performance and efficiency to power the most computationally demanding applications. Pascal-based Tesla GPUs will allow researchers to solve larger, more complex problems that are currently out of reach in cosmology, materials science, seismology, climatology and a host of other fields. via NVIDIA
"CSCS scientists are using Piz Daint to tackle some of the most important computational challenges of our day, like modeling the human brain and uncovering new insights into the origins of the universe," said Ian Buck, vice president of Accelerated Computing at NVIDIA. "Tesla GPUs deliver a massive leap in application performance, allowing CSCS to push the limits of scientific discovery."
Tesla P100 With GP100 - The Fastest Supercomputing Chip in The World
NVIDIA's Tesla P100 is the fastest supercomputing chip in the world. It is based on Pascal, an entirely new, fifth-generation CUDA architecture. The GP100 GPU, which uses the Pascal architecture, is at the heart of the Tesla P100 accelerator. NVIDIA has spent the last several years developing the new GPU, and it will finally ship to supercomputers in June 2016.
The Tesla P100 comes with beefy specs. Starting off, we have a 16nm Pascal chip that measures 610 mm², packs 15.3 billion transistors and comes with 3584 CUDA cores. The full Pascal GP100 chip features up to 3840 CUDA cores. NVIDIA has redesigned its SM (Streaming Multiprocessor) units to hold 64 CUDA cores per SM block. The Tesla P100 has 56 of these blocks enabled, while the full GP100 has 60 blocks in total. The chip also comes with a dedicated set of FP64 CUDA cores: there are 32 FP64 cores per block, so the Tesla P100 has 1792 dedicated FP64 cores enabled.
The 16nm FinFET process allows both high throughput and high clock rates. In the case of Tesla P100, we are looking at a 1328 MHz base and 1480 MHz boost clock, which allow NVIDIA to crunch up to 21.2 TFLOPs FP16, 10.6 TFLOPs FP32 and 5.3 TFLOPs FP64 compute performance. The Tesla P100 features next-generation HBM2 memory with 16 GB of VRAM and has a 300 W TDP.
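The core counts above follow directly from the SM layout. Here is a minimal sketch, assuming only what the text states (64 FP32 CUDA cores and 32 FP64 cores per Pascal SM); `core_counts` is a hypothetical helper for illustration. The full-die FP64 figure is simply the same multiplication applied to all 60 SMs.

```python
# Hypothetical helper: GP100 core counts derived from the per-SM layout in the text.
FP32_CORES_PER_SM = 64  # CUDA (FP32) cores per Pascal SM
FP64_CORES_PER_SM = 32  # dedicated FP64 cores per Pascal SM


def core_counts(num_sms: int) -> tuple:
    """Return (FP32 CUDA cores, FP64 cores) for a GP100 with num_sms SMs enabled."""
    return num_sms * FP32_CORES_PER_SM, num_sms * FP64_CORES_PER_SM


tesla_p100 = core_counts(56)  # Tesla P100: 56 of 60 SMs enabled
full_gp100 = core_counts(60)  # full GP100 die: all 60 SMs
print(tesla_p100)  # (3584, 1792)
print(full_gp100)  # (3840, 1920)
```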
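The TFLOPs figures can be reproduced from the core counts and the boost clock. This is a sketch under two assumptions that are not stated in the article: peak throughput counts each fused multiply-add as two floating-point operations, and GP100 runs FP16 at twice the FP32 rate by issuing paired half-precision operations. `peak_tflops` is a hypothetical helper.

```python
# Hypothetical helper: peak throughput = cores * FLOPs per core per clock * clock.
# Assumption: one fused multiply-add per core per clock counts as 2 FLOPs.
def peak_tflops(cores: int, clock_mhz: float, flops_per_core_per_clock: int = 2) -> float:
    return cores * flops_per_core_per_clock * clock_mhz * 1e6 / 1e12


BOOST_MHZ = 1480  # Tesla P100 boost clock from the text

fp32 = peak_tflops(3584, BOOST_MHZ)  # 3584 FP32 CUDA cores
fp64 = peak_tflops(1792, BOOST_MHZ)  # 1792 dedicated FP64 cores
fp16 = 2 * fp32  # assumption: paired half-precision ops give 2x the FP32 rate

print(f"FP16 {fp16:.1f} / FP32 {fp32:.1f} / FP64 {fp64:.1f} TFLOPs")
# prints: FP16 21.2 / FP32 10.6 / FP64 5.3 TFLOPs
```

The same formula explains why FP64 lands at exactly half of FP32: the chip has half as many FP64 cores running at the same clock.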
NVIDIA Volta Tesla V100S Specs:
| NVIDIA Tesla Graphics Card | Tesla K40 | Tesla M40 | Tesla P100 (PCI-Express) | Tesla P100 (SXM2) | Tesla V100 (PCI-Express) | Tesla V100 (SXM2) | Tesla V100S (PCIe) |
|---|---|---|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) | GV100 (Volta) | GV100 (Volta) | GV100 (Volta) |
| Transistors | 7.1 Billion | 8 Billion | 15.3 Billion | 15.3 Billion | 21.1 Billion | 21.1 Billion | 21.1 Billion |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 610 mm² | 815 mm² | 815 mm² | 815 mm² |
| CUDA Cores Per SM | 192 | 128 | 64 | 64 | 64 | 64 | 64 |
| CUDA Cores (Total) | 2880 | 3072 | 3584 | 3584 | 5120 | 5120 | 5120 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 | 32 | 32 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1792 | 2560 | 2560 | 2560 |
| Base Clock | 745 MHz | 948 MHz | 1190 MHz | 1328 MHz | 1230 MHz | 1297 MHz | TBD |
| Boost Clock | 875 MHz | 1114 MHz | 1329 MHz | 1480 MHz | 1380 MHz | 1530 MHz | 1601 MHz |
| FP16 Compute | N/A | N/A | 18.7 TFLOPs | 21.2 TFLOPs | 28.0 TFLOPs | 30.4 TFLOPs | 32.8 TFLOPs |
| FP32 Compute | 5.04 TFLOPs | 6.8 TFLOPs | 10.0 TFLOPs | 10.6 TFLOPs | 14.0 TFLOPs | 15.7 TFLOPs | 16.4 TFLOPs |
| FP64 Compute | 1.68 TFLOPs | 0.2 TFLOPs | 4.7 TFLOPs | 5.30 TFLOPs | 7.0 TFLOPs | 7.80 TFLOPs | 8.2 TFLOPs |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size | 12 GB GDDR5 @ 288 GB/s | 24 GB GDDR5 @ 288 GB/s | 16 GB HBM2 @ 732 GB/s, 12 GB HBM2 @ 549 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 1134 GB/s |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB | 6144 KB | 6144 KB | 6144 KB |
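The bandwidth figures in the table tie back to the bus widths through a simple relation: peak bandwidth in GB/s equals the bus width in bits divided by 8, times the per-pin data rate in Gbps. The table does not list per-pin data rates, so the sketch below works backwards from the listed widths and bandwidths to the rates they imply. The helper names are hypothetical, and the results are derived values, not official memory specs.

```python
# Hypothetical helpers relating bus width, per-pin data rate and peak bandwidth.
# peak bandwidth (GB/s) = bus width (bits) / 8 * per-pin data rate (Gbps)


def peak_bandwidth_gb_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    return bus_width_bits / 8 * pin_rate_gbps


def implied_pin_rate_gbps(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Invert the relation: per-pin data rate implied by a bandwidth and a bus width."""
    return bandwidth_gb_s * 8 / bus_width_bits


# (card, bus width in bits, peak bandwidth in GB/s) as listed in the table
CARDS = [
    ("Tesla K40", 384, 288),
    ("Tesla P100 (SXM2)", 4096, 732),
    ("Tesla V100 (SXM2)", 4096, 900),
    ("Tesla V100S (PCIe)", 4096, 1134),
]

for name, width, bw in CARDS:
    rate = implied_pin_rate_gbps(bw, width)
    print(f"{name:20s} {width:5d}-bit {bw:5d} GB/s -> {rate:.2f} Gbps per pin")
```

The output makes the trade-off clear: GDDR5 on the K40 pushes about 6 Gbps per pin over a narrow 384-bit bus, while HBM2 reaches several times the bandwidth at much lower per-pin rates (roughly 1.4 to 2.2 Gbps) by using a 4096-bit interface.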