At GTC 2016, NVIDIA announced their behemoth DGX-1 supercomputer which features up to 170 TFLOPs of compute performance. The DGX-1 is an all-in-one supercomputing solution that houses several Tesla P100 graphics boards that were launched today. Based on the Pascal GPU architecture, the Tesla P100 delivers an insane boost in compute performance and allows high-performance supercomputing for deep learning.
NVIDIA DGX-1 Is A 16nm Pascal Based Super computing Solution With 170 TeraFlops of Compute Performance
The NVIDIA DGX-1 is a complete supercomputing solution that houses NVIDIA's latest hardware and software innovations ranging from Pascal and NVIDIA SDK suite. The DGX-1 has the performance throughput equivalent to 250 x86 servers. This insane amount of performance allows users to get their own supercomputer for HPC and AI specific workloads.
"Artificial intelligence is the most far-reaching technological advancement in our lifetime," said Jen-Hsun Huang, CEO and co-founder of NVIDIA. "It changes every industry, every company, everything. It will open up markets to benefit everyone. Data scientists and AI researchers today spend far too much time on home-brewed high performance computing solutions. The DGX-1 is easy to deploy and was created for one purpose: to unlock the powers of superhuman capabilities and apply them to problems that were once unsolvable." via NVIDIA
Some of the key specifications of NVIDIA's DGX-1 Supercomputer include:
- Up to 170 teraflops of half-precision (FP16) peak performance
- Eight Tesla P100 GPU accelerators, 16GB memory per GPU
- NVLink Hybrid Cube Mesh
- 7TB SSD DL Cache
- Dual 10GbE, Quad InfiniBand 100Gb networking
- 3U - 3200W
Supercomputing Performance Comes at a Super Insane Price - $129,000 US For NVIDIA's DGX-1
It's obvious that the NVIDIA DGX-1 isn't built for a specific user but will be aimed at big organizations such as universities and institutes involved in research. Just like Pascal which will be shipping in 2016 and available in June in the US and Q3 for the rest of the world, the DGX-1 orders commence from today but will be available at a later date. Probably by the end of this year. The NVIDIA DGX-1 comes with 8 Pascal based Tesla P100 graphics boards, Dual Intel Xeon processors and 7 TBs of SSD storage. The whole platform achieves an aggregate bandwidth of 768 GB/s.
Comprehensive Deep Learning Software Suite
The NVIDIA DGX-1 system includes a complete suite of optimized deep learning software that allows researchers and data scientists to quickly and easily train deep neural networks.
The DGX-1 software includes the NVIDIA Deep Learning GPU Training System (DIGITS), a complete, interactive system for designing deep neural networks (DNNs). It also includes the newly released NVIDIA CUDA Deep Neural Network library (cuDNN) version 5, a GPU-accelerated library of primitives for designing DNNs.
It also includes optimized versions of several widely used deep learning frameworks — Caffe, Theano and Torch. The DGX-1 additionally provides access to cloud management tools, software updates and a repository for containerized applications.
The Tesla P100 Housed Inside the DGX-100 Is a Monster Graphics Card
The Tesla P100 is the heart of the DGX-100 platform. Featuring the latest 5th generation Pascal architecture with 3584 CUDA Cores, 240 texture mapping units, clock speeds up to 1480 MHz and 16 GB of HBM2 VRAM (720 GB/s stream bandwidth), the DGX-1 is all prepped for the most intensive workloads pitted against it. We have already covered an extensive deal of architecture details in our article here so you definitely want to give that a read. The NVIDIA presentation was definitely enjoyable for the folks who love high performance computing products but we are very sure that NVIDIA will have products for consumers heading it in a very short time.
NVIDIA Volta Tesla V100S Specs:
|NVIDIA Tesla Graphics Card||Tesla K40|
|Tesla P100 (SXM2)||Tesla V100 (PCI-Express)||Tesla V100 (SXM2)||Tesla V100S (PCIe)|
|GPU||GK110 (Kepler)||GM200 (Maxwell)||GP100 (Pascal)||GP100 (Pascal)||GV100 (Volta)||GV100 (Volta)||GV100 (Volta)|
|Transistors||7.1 Billion||8 Billion||15.3 Billion||15.3 Billion||21.1 Billion||21.1 Billion||21.1 Billion|
|GPU Die Size||551 mm2||601 mm2||610 mm2||610 mm2||815mm2||815mm2||815mm2|
|CUDA Cores Per SM||192||128||64||64||64||64||64|
|CUDA Cores (Total)||2880||3072||3584||3584||5120||5120||5120|
|FP64 CUDA Cores / SM||64||4||32||32||32||32||32|
|FP64 CUDA Cores / GPU||960||96||1792||1792||2560||2560||2560|
|Base Clock||745 MHz||948 MHz||1190 MHz||1328 MHz||1230 MHz||1297 MHz||TBD|
|Boost Clock||875 MHz||1114 MHz||1329MHz||1480 MHz||1380 MHz||1530 MHz||1601 MHz|
|FP16 Compute||N/A||N/A||18.7 TFLOPs||21.2 TFLOPs||28.0 TFLOPs||30.4 TFLOPs||32.8 TFLOPs|
|FP32 Compute||5.04 TFLOPs||6.8 TFLOPs||10.0 TFLOPs||10.6 TFLOPs||14.0 TFLOPs||15.7 TFLOPs||16.4 TFLOPs|
|FP64 Compute||1.68 TFLOPs||0.2 TFLOPs||4.7 TFLOPs||5.30 TFLOPs||7.0 TFLOPs||7.80 TFLOPs||8.2 TFLOPs|
|Memory Interface||384-bit GDDR5||384-bit GDDR5||4096-bit HBM2||4096-bit HBM2||4096-bit HBM2||4096-bit HBM2||4096-bit HBM|
|Memory Size||12 GB GDDR5 @ 288 GB/s||24 GB GDDR5 @ 288 GB/s||16 GB HBM2 @ 732 GB/s|
12 GB HBM2 @ 549 GB/s
|16 GB HBM2 @ 732 GB/s||16 GB HBM2 @ 900 GB/s||16 GB HBM2 @ 900 GB/s||16 GB HBM2 @ 1134 GB/s|
|L2 Cache Size||1536 KB||3072 KB||4096 KB||4096 KB||6144 KB||6144 KB||6144 KB|