
NVIDIA Tesla P100 Accelerator For PCI Express Based Platforms Announced – Comes in 16 GB and 12 GB HBM2 Variants, 250W TDP

Hassan Mujtaba • Jun 20, 2016 at 08:39am EDT

NVIDIA has just announced that it will launch a PCI Express based version of its Tesla P100 GPU accelerator, which is designed for hyperscale computing. The Tesla P100, which utilizes the GP100 GPU, was initially announced at GTC 2016 as the first graphics board to utilize the HBM2 standard and NVIDIA's NVLINK interconnect. Today, NVIDIA is introducing two new products to its Tesla P100 family.

The NVIDIA Tesla P100 is the most advanced hyperscale graphics accelerator built to date.

NVIDIA Tesla P100 To Be Available in PCI-Express Form Factor - 12 GB and 16 GB HBM2 Variants Announced

Based on the GP100 GPU, the Tesla P100 is NVIDIA's most advanced and most powerful GPU ever designed for HPC and datacenter platforms. NVIDIA claims these GPUs can speed up HPC applications by more than 30X compared to current-generation solutions. The new PCI-Express cards are aimed at the datacenter and HPC market and are compatible with current GPU-accelerated servers, whereas the original Tesla P100 used a mezzanine connector that required new servers. Both cards are optimized to power the most computationally intensive AI and HPC data center applications.

The NVIDIA Tesla P100 GPU is now available in a PCI-Express form factor with up to 4.7 TFLOPs of double precision compute.

NVIDIA Tesla P100 (GP100 GPU) Benchmarks

"Accelerated computing is the only path forward to keep up with researchers' insatiable demand for HPC and AI supercomputing," said Ian Buck, vice president of accelerated computing at NVIDIA. "Deploying CPU-only systems to meet this demand would require large numbers of commodity compute nodes, leading to substantially increased costs without proportional performance gains. Dramatically scaling performance with fewer, more powerful Tesla P100-powered nodes puts more dollars into computing instead of vast infrastructure overhead." via NVIDIA

NVIDIA Tesla P100 Specifications in detail - PCI-E and NVLINK Variants in Comparison

NVIDIA bills the Tesla P100 as the fastest supercomputing chip in the world. It is based on an entirely new, 5th generation CUDA architecture codenamed Pascal. The GP100 GPU, which utilizes the Pascal architecture, is at the heart of the Tesla P100 accelerator. NVIDIA has spent the last several years developing the new GPU, and it will finally begin shipping to supercomputers in June 2016.

The Tesla P100 comes with beefy specs. Starting off, we have a 16nm Pascal chip that measures 610 mm2, features 15.3 billion transistors and comes with 3584 CUDA cores. The full Pascal GP100 chip features up to 3840 CUDA cores. NVIDIA has redesigned its SMs (Streaming Multiprocessors) and rearranged them to hold 64 CUDA cores per SM block. The Tesla P100 has 56 of these blocks enabled while the full GP100 has 60 blocks in total. The chip also comes with a dedicated set of FP64 CUDA cores: there are 32 FP64 cores per block, giving the Tesla P100 1792 dedicated FP64 cores.
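
The per-SM arithmetic above is easy to check. Below is a small Python sketch that uses only the figures quoted in the paragraph (64 FP32 and 32 FP64 cores per SM, with 56 or 60 SMs enabled); the constant and function names are ours, purely for illustration.

```python
# Core-count arithmetic for GP100, using the per-SM figures quoted above.
FP32_CORES_PER_SM = 64
FP64_CORES_PER_SM = 32
FULL_GP100_SMS = 60
TESLA_P100_SMS = 56


def core_counts(sms: int) -> tuple[int, int]:
    """Return (FP32 CUDA cores, FP64 cores) for a given number of enabled SMs."""
    return sms * FP32_CORES_PER_SM, sms * FP64_CORES_PER_SM


full_fp32, full_fp64 = core_counts(FULL_GP100_SMS)
p100_fp32, p100_fp64 = core_counts(TESLA_P100_SMS)

print(f"Full GP100: {full_fp32} FP32 cores, {full_fp64} FP64 cores")
print(f"Tesla P100: {p100_fp32} FP32 cores, {p100_fp64} FP64 cores")
print(f"FP64:FP32 core ratio: {p100_fp64 / p100_fp32}")
```

Running it prints 3584 and 1792 cores for the Tesla P100 and 3840 and 1920 for a full GP100, i.e. a fixed 1:2 ratio of FP64 to FP32 cores.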

The 16nm FinFET process allows for higher clock rates and greater throughput. In the case of the Tesla P100 variant optimized for NVLINK-capable servers, we are looking at 5.3 TFLOPs of double precision, 10.6 TFLOPs of single precision and 21.2 TFLOPs of half precision compute performance. The NVLINK variant comes with 16 GB of HBM2 VRAM that delivers up to 720 GB/s of bandwidth, while the NVLINK interconnect adds 160 GB/s of bandwidth on top of the 32 GB/s from the PCI-Express interconnect.
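
Those peak figures follow from a simple formula: cores × 2 FLOPs per clock (a fused multiply-add counts as two operations) × clock speed. The sketch below assumes the 1480 MHz boost clock listed for the SXM2 (NVLINK) card in the spec table, and that GP100 executes FP16 at twice its FP32 rate; `peak_tflops` is a hypothetical helper for illustration, not an NVIDIA API.

```python
def peak_tflops(cores: int, clock_mhz: float, flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak in TFLOPs; one fused multiply-add counts as 2 FLOPs."""
    return cores * flops_per_core_per_clock * clock_mhz * 1e6 / 1e12


BOOST_MHZ = 1480    # Tesla P100 (SXM2/NVLINK) boost clock, taken from the spec table
FP32_CORES = 3584
FP64_CORES = 1792

fp64 = peak_tflops(FP64_CORES, BOOST_MHZ)
fp32 = peak_tflops(FP32_CORES, BOOST_MHZ)
fp16 = 2 * fp32     # assumption: paired FP16 math runs at twice the FP32 rate on GP100

print(f"FP64: {fp64:.1f} TFLOPs, FP32: {fp32:.1f} TFLOPs, FP16: {fp16:.1f} TFLOPs")
```

It prints 5.3, 10.6 and 21.2 TFLOPs, matching the quoted numbers. These are theoretical peaks at boost clock; sustained throughput in real workloads will be lower.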

The PCI-Express variants are tuned for lower clocks. These cards have a 250W TDP, so we are looking at slightly lower clock speeds than the NVLINK variant. Both cards deliver 4.7 TFLOPs double, 9.3 TFLOPs single and 18.7 TFLOPs half precision compute performance. The 16 GB variant offers a total bandwidth of 720 GB/s, while the 12 GB HBM2 variant comes with 540 GB/s. The cards rely on the PCI-Express interconnect (32 GB/s) for communication between multiple GPUs.
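
The bandwidth gap between the two PCI-Express cards lines up with the number of active HBM2 stacks. The sketch below assumes that each HBM2 stack provides a 1024-bit interface and 4 GB of capacity, that the 12 GB card runs three of the four stacks, and a per-pin data rate of roughly 1.43 Gbit/s. That rate is our assumption, chosen because it reproduces the 732 GB/s and 549 GB/s values in the spec table (NVIDIA's announcement rounds them to 720 and 540 GB/s).

```python
BITS_PER_STACK = 1024   # interface width of one HBM2 stack
GB_PER_STACK = 4        # assumed capacity per stack on Tesla P100
PIN_RATE_GBPS = 1.43    # assumed per-pin data rate in Gbit/s


def hbm2_config(stacks: int) -> tuple[int, int, float]:
    """Return (capacity in GB, bus width in bits, bandwidth in GB/s)."""
    bus_bits = stacks * BITS_PER_STACK
    bandwidth = bus_bits * PIN_RATE_GBPS / 8  # Gbit/s -> GB/s
    return stacks * GB_PER_STACK, bus_bits, bandwidth


for stacks in (4, 3):
    capacity, bus_bits, bw = hbm2_config(stacks)
    print(f"{capacity} GB, {bus_bits}-bit: ~{bw:.0f} GB/s")
```

Dropping one stack removes a quarter of both capacity and bus width, which is why the 12 GB model ends up with exactly 75% of the 16 GB model's bandwidth.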

The Tesla P100 comes in three variants: two optimized for PCI-Express and one optimized for NVLINK.

"Tesla P100 accelerators deliver new levels of performance and efficiency to address some of the most important computational challenges of our time," said Thomas Schulthess, professor of computational physics at ETH Zurich and director of the Swiss National Supercomputing Center. "The upgrade of 4,500 GPU-accelerated nodes on Piz Daint to Tesla P100 GPUs will more than double the system's performance, enabling researchers to achieve breakthroughs in a range of fields, including cosmology, materials science, seismology and climatology." via NVIDIA

NVIDIA Volta Tesla V100S Specs:

NVIDIA Tesla Graphics Card	Tesla K40 (PCI-Express)	Tesla M40 (PCI-Express)	Tesla P100 (PCI-Express)	Tesla P100 (SXM2)	Tesla V100 (PCI-Express)	Tesla V100 (SXM2)	Tesla V100S (PCIe)
GPU	GK110 (Kepler)	GM200 (Maxwell)	GP100 (Pascal)	GP100 (Pascal)	GV100 (Volta)	GV100 (Volta)	GV100 (Volta)
Process Node	28nm	28nm	16nm	16nm	12nm	12nm	12nm
Transistors	7.1 Billion	8 Billion	15.3 Billion	15.3 Billion	21.1 Billion	21.1 Billion	21.1 Billion
GPU Die Size	551 mm2	601 mm2	610 mm2	610 mm2	815 mm2	815 mm2	815 mm2
SMs	15	24	56	56	80	80	80
TPCs	15	24	28	28	40	40	40
CUDA Cores Per SM	192	128	64	64	64	64	64
CUDA Cores (Total)	2880	3072	3584	3584	5120	5120	5120
Texture Units	240	192	224	224	320	320	320
FP64 CUDA Cores / SM	64	4	32	32	32	32	32
FP64 CUDA Cores / GPU	960	96	1792	1792	2560	2560	2560
Base Clock	745 MHz	948 MHz	1190 MHz	1328 MHz	1230 MHz	1297 MHz	TBD
Boost Clock	875 MHz	1114 MHz	1329 MHz	1480 MHz	1380 MHz	1530 MHz	1601 MHz
FP16 Compute	N/A	N/A	18.7 TFLOPs	21.2 TFLOPs	28.0 TFLOPs	30.4 TFLOPs	32.8 TFLOPs
FP32 Compute	5.04 TFLOPs	6.8 TFLOPs	9.3 TFLOPs	10.6 TFLOPs	14.0 TFLOPs	15.7 TFLOPs	16.4 TFLOPs
FP64 Compute	1.68 TFLOPs	0.2 TFLOPs	4.7 TFLOPs	5.30 TFLOPs	7.0 TFLOPs	7.80 TFLOPs	8.2 TFLOPs
Memory Interface	384-bit GDDR5	384-bit GDDR5	4096-bit HBM2	4096-bit HBM2	4096-bit HBM2	4096-bit HBM2	4096-bit HBM2
Memory Size	12 GB GDDR5 @ 288 GB/s	24 GB GDDR5 @ 288 GB/s	16 GB HBM2 @ 732 GB/s / 12 GB HBM2 @ 549 GB/s	16 GB HBM2 @ 732 GB/s	16 GB HBM2 @ 900 GB/s	16 GB HBM2 @ 900 GB/s	32 GB HBM2 @ 1134 GB/s
L2 Cache Size	1536 KB	3072 KB	4096 KB	4096 KB	6144 KB	6144 KB	6144 KB
TDP	235W	250W	250W	300W	250W	300W	250W
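
One way to read the table across generations is peak FP64 throughput per watt of TDP. The sketch below simply divides the FP64 Compute row by the TDP row; it uses rated peak numbers rather than measured power or performance, so treat it as a rough comparison only.

```python
# (FP64 TFLOPs, TDP in watts), copied from the FP64 Compute and TDP rows above
cards = {
    "Tesla K40": (1.68, 235),
    "Tesla M40": (0.2, 250),
    "Tesla P100 (PCI-Express)": (4.7, 250),
    "Tesla P100 (SXM2)": (5.3, 300),
    "Tesla V100 (PCI-Express)": (7.0, 250),
    "Tesla V100 (SXM2)": (7.8, 300),
    "Tesla V100S (PCIe)": (8.2, 250),
}


def gflops_per_watt(tflops: float, tdp_w: int) -> float:
    """Peak FP64 GFLOPs divided by rated TDP."""
    return tflops * 1000 / tdp_w


for name, (fp64, tdp) in cards.items():
    print(f"{name:26s} {gflops_per_watt(fp64, tdp):5.1f} FP64 GFLOPs/W")
```

By this crude measure the P100 PCI-Express card (18.8 GFLOPs/W) edges out the 300W SXM2 board (about 17.7 GFLOPs/W), because it gives up roughly 11% of peak FP64 throughput in exchange for a 17% lower TDP.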

NVIDIA Tesla P100 PCI-Express Features:

Unmatched application performance for mixed-HPC workloads -- Delivering 4.7 teraflops and 9.3 teraflops of double-precision and single-precision peak performance, respectively, a single Pascal-based Tesla P100 node provides the equivalent performance of more than 32 commodity CPU-only servers.
CoWoS with HBM2 for unprecedented efficiency -- The Tesla P100 unifies processor and data into a single package to deliver unprecedented compute efficiency. An innovative approach to memory design -- chip on wafer on substrate (CoWoS) with HBM2 -- provides a 3x boost in memory bandwidth performance, or 720GB/sec, compared to the NVIDIA Maxwell™ architecture.
Page Migration Engine for simplified parallel programming -- Frees developers to focus on tuning for higher performance and less on managing data movement, and allows applications to scale beyond the GPU physical memory size with support for virtual memory paging. Unified memory technology dramatically improves productivity by enabling developers to see a single memory space for the entire node.
Unmatched application support -- With 410 GPU-accelerated applications, including nine of the top 10 HPC applications, the Tesla platform is the world's leading HPC computing platform.

Tesla P100 for PCIe Specifications:

4.7 teraflops double-precision performance, 9.3 teraflops single-precision performance and 18.7 teraflops half-precision performance with NVIDIA GPU BOOST™ technology
Support for PCIe Gen 3 interconnect (32GB/sec bi-directional bandwidth)
Enhanced programmability with Page Migration Engine and unified memory
ECC protection for increased reliability
Server-optimized for highest data center throughput and reliability
Available in two configurations:
- 16GB of CoWoS HBM2 stacked memory, delivering 720GB/sec of memory bandwidth
- 12GB of CoWoS HBM2 stacked memory, delivering 540GB/sec of memory bandwidth
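
The 32GB/sec figure is the aggregate of both directions on a PCIe 3.0 link. The sketch below assumes an x16 slot (the lane count is not stated above) and uses the PCIe 3.0 signaling rate of 8 GT/s per lane with 128b/130b encoding.

```python
GT_PER_S = 8.0        # PCIe 3.0 signaling rate per lane (GT/s)
LANES = 16            # assumption: the card sits in an x16 slot
ENCODING = 128 / 130  # 128b/130b line coding overhead

per_direction_gbs = GT_PER_S * LANES * ENCODING / 8  # Gbit/s -> GB/s
bidirectional_gbs = 2 * per_direction_gbs

print(f"per direction: {per_direction_gbs:.2f} GB/s")
print(f"bidirectional: {bidirectional_gbs:.2f} GB/s")
```

That works out to about 15.75 GB/s each way, or roughly 31.5 GB/s combined, which NVIDIA rounds to 32 GB/s; still far below the HBM2 bandwidth available on the card itself.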

NVIDIA's GP100-based Tesla P100 board is already shipping to the latest supercomputers that utilize NVLINK technology. The board will also be available in NVIDIA's DGX-1 supercomputer later in June. The PCI-Express based products are expected to be available in Q4 2016 from NVIDIA partners and server makers including Cray, Dell, Hewlett Packard Enterprise, IBM and SGI. The NVLINK board will be available through NVIDIA partners in Q1 2017.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
