NVIDIA Tesla P100 Accelerator For PCI Express Based Platforms Announced – Comes in 16 GB and 12 GB HBM2 Variants, 250W TDP

Jun 20, 2016

NVIDIA has just announced a PCI Express based version of its Tesla P100 GPU accelerator, which is designed for hyperscale computing. The Tesla P100, which utilizes the GP100 GPU, was initially announced back at GTC 2016 as the first graphics board to use the HBM2 memory standard and NVIDIA's NVLINK interconnect. Today, NVIDIA is introducing two new products to the Tesla P100 family.

The NVIDIA Tesla P100 is the most advanced hyperscale graphics accelerator built to date.

NVIDIA Tesla P100 To Be Available in PCI-Express Form Factor – 12 GB and 16 GB HBM2 Variants Announced

Based on the GP100 GPU, the Tesla P100 is NVIDIA’s most advanced and most powerful GPU ever designed for HPC and datacenter platforms. These GPUs are designed to supercharge HPC applications by more than 30X compared to current generation solutions. The new PCI-Express cards target the datacenter and HPC market and are compatible with current GPU accelerated servers, whereas the previously announced Tesla P100 used a mezzanine connector that required new servers. Both cards are optimized to power the most computationally intensive AI and HPC data center applications.

The NVIDIA Tesla P100 GPU is now available in PCI-Express form factor with multiple TFLOPs of double precision compute.

NVIDIA Tesla P100 (GP100 GPU) Benchmarks

“Accelerated computing is the only path forward to keep up with researchers’ insatiable demand for HPC and AI supercomputing,” said Ian Buck, vice president of accelerated computing at NVIDIA. “Deploying CPU-only systems to meet this demand would require large numbers of commodity compute nodes, leading to substantially increased costs without proportional performance gains. Dramatically scaling performance with fewer, more powerful Tesla P100-powered nodes puts more dollars into computing instead of vast infrastructure overhead.” via NVIDIA

NVIDIA Tesla P100 Specifications in detail – PCI-E and NVLINK Variants in Comparison

NVIDIA’s Tesla P100 is the fastest supercomputing chip in the world. It is based on an entirely new, 5th Generation CUDA architecture codenamed Pascal. The GP100 GPU, which utilizes the Pascal architecture, is at the heart of the Tesla P100 accelerator. NVIDIA has spent the last several years developing the new GPU, and it will finally be shipping to supercomputers in June 2016.

The Tesla P100 comes with beefy specs. Starting off, we have a 16nm Pascal chip that measures in at 610 mm², features 15.3 Billion transistors and comes with 3584 CUDA cores. The full Pascal GP100 chip features up to 3840 CUDA cores. NVIDIA has redesigned its SM (Streaming Multiprocessor) units and rearranged them to support 64 CUDA cores per SM block. The Tesla P100 has 56 of these blocks enabled while the full GP100 has 60 blocks in total. The chip also comes with a dedicated set of FP64 CUDA cores. There are 32 FP64 cores per block, and the whole GPU has 1792 dedicated FP64 cores.
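The core counts quoted above follow directly from the SM layout. As a quick sanity check, the short Python sketch below multiplies the per-SM figures from the paragraph by the number of enabled blocks. The constant and function names are our own and purely illustrative.

```python
# Per-SM figures quoted in the article for the GP100 GPU.
FP32_CORES_PER_SM = 64
FP64_CORES_PER_SM = 32


def total_cores(sm_count: int, cores_per_sm: int) -> int:
    """Total cores on a chip with `sm_count` enabled SM blocks."""
    return sm_count * cores_per_sm


tesla_p100_fp32 = total_cores(56, FP32_CORES_PER_SM)  # Tesla P100: 56 SMs enabled
full_gp100_fp32 = total_cores(60, FP32_CORES_PER_SM)  # full GP100: 60 SMs
tesla_p100_fp64 = total_cores(56, FP64_CORES_PER_SM)  # dedicated FP64 cores

print(tesla_p100_fp32, full_gp100_fp32, tesla_p100_fp64)  # 3584 3840 1792
```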

The 16nm FinFET process allows for higher throughput and clock rates. The Tesla P100 variant optimized for NVLINK capable servers delivers 5.3 TFLOPs of double precision, 10.6 TFLOPs of single precision and 21.2 TFLOPs of half precision compute performance. It comes with 16 GB of HBM2 VRAM that delivers up to 720 GB/s of bandwidth, while the NVLINK interconnect adds 60 GB/s of bandwidth in addition to the 32 GB/s from the PCI-Express interconnect.
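These peak figures can be reproduced with simple arithmetic. The sketch below rests on a few assumptions that the article does not spell out: one fused multiply-add counts as two floating point operations per core per clock, half precision runs at twice the single precision rate, and the 1480 MHz boost clock listed in the spec table applies. The function name is our own.

```python
def peak_tflops(cores: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Peak throughput in TFLOPs, assuming one FMA (2 FLOPs) per core per clock."""
    return cores * flops_per_clock * clock_mhz * 1e6 / 1e12


BOOST_MHZ = 1480  # boost clock of the NVLINK (mezzanine) variant, per the spec table

fp64 = peak_tflops(1792, BOOST_MHZ)  # dedicated FP64 cores
fp32 = peak_tflops(3584, BOOST_MHZ)  # FP32 CUDA cores
fp16 = 2 * fp32                      # assumption: FP16 at twice the FP32 rate

print(f"FP64 {fp64:.1f}  FP32 {fp32:.1f}  FP16 {fp16:.1f} TFLOPs")
# FP64 5.3  FP32 10.6  FP16 21.2 TFLOPs
```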


The PCI-Express variants run at lower clocks. These cards have their TDP set to 250W, so we are looking at slightly lower clock speeds than the NVLINK optimized variant. Both cards deliver 4.7 TFLOPs double, 9.3 TFLOPs single and 18.7 TFLOPs half precision compute performance. The 16 GB variant comes with a total bandwidth of 720 GB/s while the 12 GB HBM2 variant comes with 540 GB/s. The cards will use the PCI-Express interconnect (32 GB/s) for communication between multiple GPUs.
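The two capacity and bandwidth figures are consistent with a simple per-stack model. The sketch below assumes, and the article does not state this, that the 16 GB card uses four HBM2 stacks and that the 12 GB card has three of them active. Under that assumption each stack contributes 4 GB and 180 GB/s.

```python
FULL_STACKS = 4           # assumption: HBM2 stacks on the 16 GB card
FULL_CAPACITY_GB = 16     # from the article
FULL_BANDWIDTH_GBS = 720  # from the article

per_stack_gb = FULL_CAPACITY_GB / FULL_STACKS
per_stack_gbs = FULL_BANDWIDTH_GBS / FULL_STACKS


def variant(active_stacks: int) -> tuple:
    """Return (capacity in GB, bandwidth in GB/s) for a number of active stacks."""
    return active_stacks * per_stack_gb, active_stacks * per_stack_gbs


print(variant(4))  # (16.0, 720.0)
print(variant(3))  # (12.0, 540.0)
```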

The Tesla P100 has three variants, two PCI-Express optimized and a single NVLINK optimized.

“Tesla P100 accelerators deliver new levels of performance and efficiency to address some of the most important computational challenges of our time,” said Thomas Schulthess, professor of computational physics at ETH Zurich and director of the Swiss National Supercomputing Center. “The upgrade of 4,500 GPU-accelerated nodes on Piz Daint to Tesla P100 GPUs will more than double the system’s performance, enabling researchers to achieve breakthroughs in a range of fields, including cosmology, materials science, seismology and climatology.” via NVIDIA

NVIDIA Pascal Tesla P100 Specs:

| NVIDIA Tesla Graphics Card | Tesla K40 | Tesla M40 | Tesla P100 (PCIe, 12 GB) | Tesla P100 (PCIe, 16 GB) | Tesla P100 (Mezzanine) |
|---|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) | GP100 (Pascal) |
| Process Node | 28nm | 28nm | 16nm | 16nm | 16nm |
| Transistors | 7.1 Billion | 8 Billion | 15.3 Billion | 15.3 Billion | 15.3 Billion |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 610 mm² | 610 mm² |
| CUDA Cores Per SM | 192 | 128 | 64 | 64 | 64 |
| CUDA Cores (Total) | 2880 | 3072 | 3584 | 3584 | 3584 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1792 | 1792 |
| Base Clock | 745 MHz | 948 MHz | TBD | TBD | 1328 MHz |
| Boost Clock | 875 MHz | 1114 MHz | 1300 MHz | 1300 MHz | 1480 MHz |
| FP64 Compute | 1.68 TFLOPs | 0.2 TFLOPs | 4.7 TFLOPs | 4.7 TFLOPs | 5.30 TFLOPs |
| Texture Units | 240 | 192 | 224 | 224 | 224 |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size | 12 GB GDDR5 | 24 GB GDDR5 | 12 GB HBM2 | 16 GB HBM2 | 16 GB HBM2 |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB | 4096 KB |
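The FP64 Compute row in the table can be cross-checked against the FP64 core counts and boost clocks listed in the same table. The sketch below is only a rough consistency check: it assumes two floating point operations per FP64 core per clock (one fused multiply-add) at the listed boost clock, and the dictionary and function names are our own.

```python
# name: (FP64 cores, boost clock in MHz, FP64 TFLOPs as listed in the table)
TABLE = {
    "Tesla K40": (960, 875, 1.68),
    "Tesla M40": (96, 1114, 0.2),
    "Tesla P100 (PCIe)": (1792, 1300, 4.7),
    "Tesla P100 (Mezzanine)": (1792, 1480, 5.30),
}


def fp64_tflops(cores: int, boost_mhz: float) -> float:
    """Peak FP64 TFLOPs, assuming one FMA (2 FLOPs) per core per clock."""
    return 2 * cores * boost_mhz * 1e6 / 1e12


for name, (cores, boost_mhz, listed) in TABLE.items():
    computed = fp64_tflops(cores, boost_mhz)
    print(f"{name}: computed {computed:.2f} TFLOPs, listed {listed} TFLOPs")
```

Under these assumptions the computed values land within rounding distance of the listed ones for all four cards.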

NVIDIA Tesla P100 PCI-Express Features:

  • Unmatched application performance for mixed-HPC workloads — Delivering 4.7 teraflops and 9.3 teraflops of double-precision and single-precision peak performance, respectively, a single Pascal-based Tesla P100 node provides the equivalent performance of more than 32 commodity CPU-only servers.
  • CoWoS with HBM2 for unprecedented efficiency — The Tesla P100 unifies processor and data into a single package to deliver unprecedented compute efficiency. An innovative approach to memory design — chip on wafer on substrate (CoWoS) with HBM2 — provides a 3x boost in memory bandwidth performance, or 720GB/sec, compared to the NVIDIA Maxwell™ architecture.
  • PageMigration Engine for simplified parallel programming — Frees developers to focus on tuning for higher performance and less on managing data movement, and allows applications to scale beyond the GPU physical memory size with support for virtual memory paging. Unified memory technology dramatically improves productivity by enabling developers to see a single memory space for the entire node.
  • Unmatched application support — With 410 GPU-accelerated applications, including nine of the top 10 HPC applications, the Tesla platform is the world’s leading HPC computing platform.

Tesla P100 for PCIe Specifications:

  • 4.7 teraflops double-precision performance, 9.3 teraflops single-precision performance and 18.7 teraflops half-precision performance with NVIDIA GPU BOOST™ technology
  • Support for PCIe Gen 3 interconnect (32GB/sec bi-directional bandwidth)
  • Enhanced programmability with Page Migration Engine and unified memory
  • ECC protection for increased reliability
  • Server-optimized for highest data center throughput and reliability
  • Available in two configurations:
    • 16GB of CoWoS HBM2 stacked memory, delivering 720GB/sec of memory bandwidth
    • 12GB of CoWoS HBM2 stacked memory, delivering 540GB/sec of memory bandwidth
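The 32GB/sec figure quoted for PCIe Gen 3 lines up with the standard link arithmetic. The sketch below assumes a x16 slot, which the specifications above do not state explicitly, and uses the Gen 3 per-lane rate of 8 GT/s with 128b/130b encoding. It also compares the result with the 720GB/sec HBM2 figure to show how much faster on-package memory is than the host link.

```python
GT_PER_S = 8.0        # PCIe Gen 3 raw rate per lane, in gigatransfers per second
ENCODING = 128 / 130  # 128b/130b line encoding efficiency
LANES = 16            # assumption: the card sits in a x16 slot

# Each transfer carries one bit per lane; divide by 8 to convert bits to bytes.
per_direction_gbs = GT_PER_S * ENCODING * LANES / 8
bidirectional_gbs = 2 * per_direction_gbs

print(f"{per_direction_gbs:.1f} GB/s per direction")  # 15.8 GB/s per direction
print(f"{bidirectional_gbs:.1f} GB/s bidirectional")  # 31.5 GB/s bidirectional

# Ratio of quoted HBM2 bandwidth to the quoted (rounded) PCIe figure.
hbm2_vs_pcie = 720 / 32
print(hbm2_vs_pcie)  # 22.5
```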

NVIDIA’s GP100 based Tesla P100 board is already shipping to the latest supercomputers that utilize NVLINK technology. The board will also be available with NVIDIA’s DGX-1 supercomputer rack later in June. The PCI-Express based products are expected to be available in Q4 2016 from NVIDIA partners and server makers including Cray, Dell, Hewlett Packard Enterprise, IBM and SGI. The NVLINK board will be available in Q1 2017 through NVIDIA partners.
