NVIDIA’s 5th Gen Flagship Pascal GPU is 70% Faster Than Maxwell in CUDA Deep Neural Network Workloads


NVIDIA will unveil its Pascal GPU architecture, aimed at the HPC and AI markets, at GTC 2016 later today. Pascal is the codename for NVIDIA's 5th generation graphics architecture, which brings a range of technologies such as NVLINK, HBM2 and Mixed Precision. While we will hear more details at Jen-Hsun Huang's keynote in a few hours, some seminars at GTC 2016 have already revealed the performance improvement Pascal brings in AI (Artificial Intelligence) specific workloads.

Image Credits/Source: Hardware.Fr

NVIDIA's Pascal GPU Boosts Performance By 70% in Deep Neural Network / AI Workloads

Spotted by Videocardz (via Hardware.fr), a slide discussing the NVIDIA cuDNN improvements was displayed in a session related to AI at GTC 2016. The slide displays the general performance improvement that NVIDIA brings with their updated cuDNN (CUDA Deep Neural Network) library. The new cuDNN v5 library comes with updates that include:

  • High Performance Deep Neural Network Training
  • Accelerates Deep Learning: Caffe, CNTK, TensorFlow, Theano, Torch
  • Performance continues to improve over time

The slide shows that Pascal with cuDNN v5 can deliver up to a 12x overall performance increase. The Maxwell-based Tesla M40 with cuDNN v3 and the Kepler-based Tesla K40 with cuDNN v1 deliver 6x and 4x increases, respectively. To sum it up, Pascal with the updated library delivers 70% better performance in AlexNet training throughput compared to the fastest single-chip Maxwell Tesla solution available today.

Image Credits/Source: Computerbase and ServerTheHome

While this is a relatively big increase in performance, we can't evaluate the overall performance of the Pascal GPUs with just one metric, and we hope to learn more about them in the main keynote today. Pascal is also confirmed to ship in a range of HPC-optimized racks from SuperMicro and Quanta later this year.

Both companies have showcased their latest solutions based on the Pascal GPU architecture and NVLINK. The new QuantaPlex T21W-3W is the first x86 server with NVIDIA Pascal NVLINK technology, which means that NVIDIA is ready to ship such solutions through its partners, while SuperMicro will be shipping the flagship Pascal-based 1U DP SYS-1028GQ-TR(T) rack this year following the announcement today.

The NVLINK Interconnect Allows Faster GPU-to-GPU Access in Servers

The latest NVLINK interconnect will give the multiple processors inside HPC blocks a faster link than traditional PCI-e Gen3 lanes, at speeds of up to 200 GB/s. Pascal GPUs will also feature Unified Memory support, allowing the CPU and GPU to share the same memory pool, along with Mixed Precision support. While NVLINK isn’t planned for commercial integration right now, it will be featured in servers using ARM64 chips and x86 powered HPC platforms that utilize OpenPower, Tyan and Quantum solutions. Expect to hear more on Pascal GPUs in a few hours.
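
To put the 200 GB/s figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the theoretical peak of a PCI-e Gen3 x16 slot (8 GT/s per lane with 128b/130b encoding, or about 15.75 GB/s per direction), takes the "up to 200 GB/s" NVLINK figure at face value, and uses a hypothetical 12 GB buffer. The function and constant names are made up for illustration, and real-world throughput will be lower on both links.

```python
# Back-of-the-envelope transfer-time comparison (theoretical peaks only).

PCIE_GEN3_GT_PER_LANE = 8.0      # giga-transfers per second per lane
PCIE_GEN3_ENCODING = 128 / 130   # 128b/130b line-code efficiency
PCIE_LANES = 16                  # assumption: a full x16 slot

# Each transfer carries one bit per lane, so divide by 8 to get bytes.
PCIE_GEN3_X16_GB_PER_S = (
    PCIE_GEN3_GT_PER_LANE * PCIE_GEN3_ENCODING * PCIE_LANES / 8
)

NVLINK_GB_PER_S = 200.0          # "up to" figure quoted in the article


def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal time to move size_gb gigabytes at bandwidth_gb_per_s GB/s."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_per_s


if __name__ == "__main__":
    buffer_gb = 12.0  # hypothetical payload, not a figure from the slide
    pcie_t = transfer_seconds(buffer_gb, PCIE_GEN3_X16_GB_PER_S)
    nvlink_t = transfer_seconds(buffer_gb, NVLINK_GB_PER_S)
    print(f"PCI-e Gen3 x16: {pcie_t:.3f} s")
    print(f"NVLINK:         {nvlink_t:.3f} s")
    print(f"Ratio:          {pcie_t / nvlink_t:.1f}x")
```

These are ideal numbers only; protocol overhead and how many NVLINK links a given GPU pair actually shares will change the picture in practice.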