NVIDIA DGX-1 Pascal Based Supercomputer Announced – Features 8 Tesla P100 GPUs With 170 TFLOPs Compute

Apr 5, 2016

At GTC 2016, NVIDIA announced its behemoth DGX-1 supercomputer, which features up to 170 TFLOPs of half-precision (FP16) compute performance. The DGX-1 is an all-in-one supercomputing solution that houses eight of the Tesla P100 graphics boards that were launched today. Based on the Pascal GPU architecture, the Tesla P100 delivers an insane boost in compute performance and enables high-performance supercomputing for deep learning.

NVIDIA DGX-1 Is A 16nm Pascal Based Supercomputing Solution With 170 TeraFlops of Compute Performance

The NVIDIA DGX-1 is a complete supercomputing solution that houses NVIDIA’s latest hardware and software innovations, ranging from the Pascal architecture to the NVIDIA SDK suite. According to NVIDIA, the DGX-1 has performance throughput equivalent to 250 x86 servers. This insane amount of performance allows users to get their own supercomputer for HPC and AI specific workloads.

“Artificial intelligence is the most far-reaching technological advancement in our lifetime,” said Jen-Hsun Huang, CEO and co-founder of NVIDIA. “It changes every industry, every company, everything. It will open up markets to benefit everyone. Data scientists and AI researchers today spend far too much time on home-brewed high performance computing solutions. The DGX-1 is easy to deploy and was created for one purpose: to unlock the powers of superhuman capabilities and apply them to problems that were once unsolvable.” via NVIDIA

Some of the key specifications of NVIDIA’s DGX-1 Supercomputer include:

  • Up to 170 teraflops of half-precision (FP16) peak performance
  • Eight Tesla P100 GPU accelerators, 16GB memory per GPU
  • NVLink Hybrid Cube Mesh
  • 7TB SSD DL Cache
  • Dual 10GbE, Quad InfiniBand 100Gb networking
  • 3U – 3200W
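The headline 170 TFLOPs figure lines up with eight Tesla P100 (SXM2) boards at roughly 21.2 TFLOPs of FP16 each, which is the per-GPU number listed in the spec table at the end of this article. A minimal Python sketch of that arithmetic follows; the variable names are illustrative, and the result is a theoretical peak rather than a measured figure.

```python
# Rough check of the DGX-1 headline number from the per-GPU figure.
GPUS_PER_DGX1 = 8            # eight Tesla P100 accelerators per system
FP16_TFLOPS_PER_P100 = 21.2  # Tesla P100 (SXM2) peak FP16, from the spec table

# Aggregate peak is simply the per-GPU peak times the GPU count.
total_fp16_tflops = GPUS_PER_DGX1 * FP16_TFLOPS_PER_P100

print(f"Aggregate FP16 peak: {total_fp16_tflops:.1f} TFLOPs")
```

The product comes out just under 170, so the "up to 170 teraflops" claim reads as a rounded sum of the eight GPUs' peak rates.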

Supercomputing Performance Comes at a Super Insane Price – $129,000 US For NVIDIA’s DGX-1

It’s obvious that the NVIDIA DGX-1 isn’t built for individual users but is aimed at big organizations such as universities and research institutes. Just like Pascal, which will be shipping in 2016 with availability in June in the US and Q3 for the rest of the world, the DGX-1 can be ordered from today but will be available at a later date, probably by the end of this year. The NVIDIA DGX-1 comes with eight Pascal based Tesla P100 graphics boards, dual Intel Xeon processors and 7 TB of SSD storage. The whole platform achieves an aggregate bandwidth of 768 GB/s.
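Putting the price and power figures against the 170 TFLOPs peak gives a rough sense of cost per unit of compute. The sketch below is back-of-the-envelope only: it uses just the numbers quoted in this article, and it assumes the 3200W rating and the FP16 peak can be divided directly, which real workloads will not match.

```python
# Figures quoted in the article (peak FP16, list price, rated power).
PRICE_USD = 129_000
PEAK_FP16_TFLOPS = 170
POWER_WATTS = 3200  # assumption: treated as power draw at peak throughput

# Cost and power efficiency at theoretical peak.
usd_per_tflop = PRICE_USD / PEAK_FP16_TFLOPS
gflops_per_watt = PEAK_FP16_TFLOPS * 1000 / POWER_WATTS

print(f"Price per FP16 TFLOP: ${usd_per_tflop:,.2f}")
print(f"Peak efficiency: {gflops_per_watt:.1f} GFLOPs per watt")
```

Both ratios are ceilings based on peak half-precision throughput, so they are best used for comparing systems on paper rather than predicting real-world cost.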

Comprehensive Deep Learning Software Suite

The NVIDIA DGX-1 system includes a complete suite of optimized deep learning software that allows researchers and data scientists to quickly and easily train deep neural networks.

The DGX-1 software includes the NVIDIA Deep Learning GPU Training System (DIGITS), a complete, interactive system for designing deep neural networks (DNNs). It also includes the newly released NVIDIA CUDA Deep Neural Network library (cuDNN) version 5, a GPU-accelerated library of primitives for designing DNNs.
Optimized versions of several widely used deep learning frameworks (Caffe, Theano and Torch) are part of the package as well. The DGX-1 additionally provides access to cloud management tools, software updates and a repository for containerized applications.

The Tesla P100 Housed Inside the DGX-1 Is a Monster Graphics Card

The Tesla P100 is the heart of the DGX-1 platform. With each board featuring the latest 5th generation Pascal architecture with 3584 CUDA cores, 224 texture mapping units, clock speeds up to 1480 MHz and 16 GB of HBM2 VRAM (720 GB/s stream bandwidth), the DGX-1 is all prepped for the most intensive workloads pitted against it. We have already covered the architecture in extensive detail in our article here, so you will definitely want to give that a read. The NVIDIA presentation was definitely enjoyable for folks who love high performance computing products, but we are very sure that NVIDIA will have products for consumers heading out in a very short time.

NVIDIA Volta Tesla V100 Specs:

| NVIDIA Tesla Graphics Card | Tesla K40 | Tesla M40 | Tesla P100 | Tesla P100 | Tesla P100 (SXM2) | Tesla V100 (PCI-Express) | Tesla V100 (SXM2) |
|---|---|---|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) | GP100 (Pascal) | GV100 (Volta) | GV100 (Volta) |
| Process Node | 28nm | 28nm | 16nm | 16nm | 16nm | 12nm | 12nm |
| Transistors | 7.1 Billion | 8 Billion | 15.3 Billion | 15.3 Billion | 15.3 Billion | 21.1 Billion | 21.1 Billion |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 610 mm² | 610 mm² | 815 mm² | 815 mm² |
| SMs | 15 | 24 | 56 | 56 | 56 | 80 | 80 |
| TPCs | 15 | 24 | 28 | 28 | 28 | 40 | 40 |
| CUDA Cores Per SM | 192 | 128 | 64 | 64 | 64 | 64 | 64 |
| CUDA Cores (Total) | 2880 | 3072 | 3584 | 3584 | 3584 | 5120 | 5120 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 | 32 | 32 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1792 | 1792 | 2560 | 2560 |
| Base Clock | 745 MHz | 948 MHz | TBD | TBD | 1328 MHz | TBD | 1370 MHz |
| Boost Clock | 875 MHz | 1114 MHz | 1300 MHz | 1300 MHz | 1480 MHz | 1370 MHz | 1455 MHz |
| FP16 Compute | N/A | N/A | 18.7 TFLOPs | 18.7 TFLOPs | 21.2 TFLOPs | 28.0 TFLOPs | 30.0 TFLOPs |
| FP32 Compute | 5.04 TFLOPs | 6.8 TFLOPs | 10.0 TFLOPs | 10.0 TFLOPs | 10.6 TFLOPs | 14.0 TFLOPs | 15.0 TFLOPs |
| FP64 Compute | 1.68 TFLOPs | 0.2 TFLOPs | 4.7 TFLOPs | 4.7 TFLOPs | 5.30 TFLOPs | 7.0 TFLOPs | 7.50 TFLOPs |
| Texture Units | 240 | 192 | 224 | 224 | 224 | 320 | 320 |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size / Bandwidth | 12 GB GDDR5 @ 288 GB/s | 24 GB GDDR5 @ 288 GB/s | 12 GB HBM2 @ 549 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 900 GB/s |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB | 4096 KB | 6144 KB | 6144 KB |
| TDP | 235W | 250W | 250W | 250W | 300W | 250W | 300W |
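The compute rows in the table above can be approximated from the core counts and boost clocks. A minimal sketch, assuming each CUDA core retires one fused multiply-add (counted as 2 FLOPs) per clock and that GP100 runs packed FP16 at twice its FP32 rate; the function name `peak_tflops` is hypothetical and the results are theoretical peaks, not benchmarks.

```python
def peak_tflops(cores: int, boost_mhz: float, flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak in TFLOPs: cores x FLOPs per clock x clock rate."""
    return cores * flops_per_core_per_clock * boost_mhz * 1e6 / 1e12

# Tesla P100 (SXM2) figures taken from the table: 3584 FP32 cores,
# 1792 FP64 cores, 1480 MHz boost clock.
p100_fp32 = peak_tflops(3584, 1480)
p100_fp64 = peak_tflops(1792, 1480)
p100_fp16 = 2 * p100_fp32  # assumption: packed FP16 at twice the FP32 rate

# Tesla K40 as a cross-check: 2880 cores at 875 MHz boost.
k40_fp32 = peak_tflops(2880, 875)

print(f"P100 (SXM2) FP32: {p100_fp32:.1f} TFLOPs")
print(f"P100 (SXM2) FP64: {p100_fp64:.1f} TFLOPs")
print(f"P100 (SXM2) FP16: {p100_fp16:.1f} TFLOPs")
print(f"K40 FP32: {k40_fp32:.2f} TFLOPs")
```

Under these assumptions the results land on the 10.6, 5.3 and 21.2 TFLOPs listed for the Tesla P100 (SXM2) and the 5.04 TFLOPs listed for the Tesla K40.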