Alongside the insanely powerful Volta GV100 GPU, NVIDIA has also announced the next iteration of its DGX-1 and HGX-1 supercomputing systems, designed to power AI, deep learning and neural network workloads.
NVIDIA Volta GV100 GPU Upgrades DGX-1 and HGX-1 Supercomputers With Irresponsible Amounts of Power
The Volta GV100 GPU will power three systems designed by NVIDIA: the DGX-1V, the HGX-1V and the DGX Station. All three make use of multiple Tesla V100 graphics cards and are aimed at a range of users spanning research specialists, cloud computing providers and individual deep learning developers. Each system packs an enormous amount of compute, and comes at a commensurately steep price.

And now, Jensen announces the NVIDIA DGX-1 with eight Tesla V100s. It’s labeled on the slide as the “essential instrument of AI research.” What used to take a week now takes a shift. It replaces 400 servers. It offers 960 tensor TFLOPS. It will ship in Q3. It will cost $149,000. He notes that if you get one now powered by Pascal, you’ll get a free upgrade to Volta.
Turns out, there’s also a small version of the DGX-1, the DGX Station. Think of it as a personal-sized one. It’s liquid cooled and whisper quiet. Every one of our deep learning engineers has one.
It has four Tesla V100s. It’s $69K. Order it now and we’ll deliver it in Q3. “So place your order now,” he avers. via NVIDIA
NVIDIA DGX/HGX Supercomputers
NVIDIA Volta GV100 GPU Based DGX-1 Supercomputer For AI Research - $149,000 US Price
So first up, we have the NVIDIA DGX-1, a direct successor to the Pascal-based DGX-1. This time, we are looking at 8 Tesla V100 GPUs instead of 8 Tesla P100 GPUs. The total horsepower of this machine has been boosted from 170 TFLOPs of FP16 compute to 960 TFLOPs of Tensor-accelerated FP16 compute, a direct effect of the new Tensor Cores featured inside the Volta GV100 GPU.
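The headline figure falls out of simple arithmetic: each GV100 carries 640 Tensor Cores, each performing a 4x4x4 FP16 matrix multiply-accumulate (128 floating point operations) per clock, which at boost clocks works out to roughly 120 TFLOPs per GPU, or 960 TFLOPs across the DGX-1's eight. For readers curious what a Tensor Core operation looks like from software, CUDA 9 exposes these units through the WMMA API; below is a minimal sketch of a single 16x16x16 mixed-precision tile multiply (the kernel name and launch details are our own illustration, not NVIDIA sample code):

```cpp
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp computes a single 16x16x16 tile: C = A * B + C.
// A and B are FP16, the accumulator is FP32 -- the mixed-precision
// mode Volta's Tensor Cores are built around.
__global__ void tensor_core_tile(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);            // start from C = 0
    wmma::load_matrix_sync(a_frag, a, 16);     // leading dimension = 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(acc, a_frag, b_frag, acc);  // issued to the Tensor Cores
    wmma::store_matrix_sync(c, acc, 16, wmma::mem_row_major);
}

// Launch with a single warp: tensor_core_tile<<<1, 32>>>(d_a, d_b, d_c);
```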

In terms of specifications, this machine rocks eight Tesla V100 GPUs, each with 5,120 CUDA cores and 640 Tensor cores. That totals 40,960 CUDA cores and 5,120 Tensor cores. The DGX-1 houses a total of 128 GB of HBM2 memory across its eight Tesla V100 GPUs. The system features dual Intel Xeon E5-2698 V4 processors, each with 20 cores and 40 threads, clocked at 2.2 GHz. There's 512 GB of DDR4 memory inside the system. Storage is provided in the form of four 1.92 TB SSDs configured in RAID 0, networking is dual 10 GbE with up to four InfiniBand EDR ports, and the system comes with a 3.2 kW PSU.

The system is designed to give ready access to today’s most popular deep learning frameworks, the NVIDIA DIGITS deep learning training application, third-party accelerated solutions, the NVIDIA Deep Learning SDK (e.g. cuDNN, cuBLAS), the CUDA toolkit, NCCL for fast multi-GPU collectives, NVIDIA Docker and NVIDIA drivers. In terms of performance, the DGX-1 with Tesla V100 has a 10x I/O speedup over PCI Express thanks to the new NVLink 2.0 interconnect, which is rated at 300 GB/s. Deep learning training time has been cut by a factor of 3x compared to the Pascal GP100 based system. The NVIDIA DGX-1 will cost $149,000 US and will be available in Q3 2017.
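To give a flavor of the multi-GPU collectives piece of that software stack, here is a minimal sketch of a single-process NCCL all-reduce across every visible GPU (buffer sizes and the omission of error handling are our own simplifications):

```cpp
#include <cuda_runtime.h>
#include <nccl.h>
#include <vector>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);

    // One NCCL communicator per GPU, all living in this process.
    std::vector<ncclComm_t> comms(ndev);
    ncclCommInitAll(comms.data(), ndev, nullptr);  // nullptr = devices 0..ndev-1

    const size_t count = 1 << 20;  // 1M floats per GPU
    std::vector<float*> bufs(ndev);
    std::vector<cudaStream_t> streams(ndev);
    for (int i = 0; i < ndev; ++i) {
        cudaSetDevice(i);
        cudaMalloc(&bufs[i], count * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    // Sum the buffers across all GPUs in place; NCCL routes the traffic
    // over NVLink wherever the topology allows.
    ncclGroupStart();
    for (int i = 0; i < ndev; ++i)
        ncclAllReduce(bufs[i], bufs[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < ndev; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        cudaFree(bufs[i]);
        ncclCommDestroy(comms[i]);
    }
    return 0;
}
```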

NVIDIA Volta GV100 GPU Based DGX Station - A Liquid Cooled Computing Powerhouse - $69,000 US Price
NVIDIA is also announcing a new Volta based system known as the DGX Station. It is similar in spirit to the DIGITS DevBox but comes with an upgraded spec list. The system is designed as a deskside personal supercomputer and delivers 480 TFLOPs of FP16 performance. That is 3x the deep learning training performance of today's fastest GPU workstations, along with a 5x increase in overall I/O performance over PCI Express based systems. The total computing capacity of this workstation is equivalent to 400 CPUs, which is impressive.

Specifications include four NVIDIA Tesla V100 GPUs in the PCIe form factor, for a total of 20,480 CUDA cores and 2,560 Tensor cores, plus 64 GB of HBM2 VRAM. Other specs include a single Xeon E5-2698 V4 CPU, 256 GB of LRDIMM DDR4 system memory and four 1.92 TB SSDs, of which three are configured in RAID 0 while the remaining one holds the OS. The total system power requirement is 1,500W. The system is entirely liquid cooled for excellent thermal performance under full load.
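Those totals are just the per-GPU figures multiplied out (4 x 5,120 CUDA cores, 4 x 640 Tensor cores, 4 x 16 GB HBM2). For anyone who wants to sanity-check such numbers on real hardware, a small CUDA runtime sketch like the one below will do it; the 64 cores per SM is an assumption taken from the Volta spec table further down, since the device property struct does not report it:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    for (int i = 0; i < ndev; ++i) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        const int cores_per_sm = 64;  // assumed: GV100 has 64 FP32 cores per SM
        printf("GPU %d: %s | %d SMs | %d CUDA cores | %.0f GB | %.0f MHz boost\n",
               i, p.name, p.multiProcessorCount,
               p.multiProcessorCount * cores_per_sm,
               p.totalGlobalMem / 1073741824.0,  // bytes -> GiB
               p.clockRate / 1000.0);            // kHz -> MHz
    }
    return 0;
}
```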
Greater Deep Learning Performance in a Personal Supercomputer

The new NVIDIA DGX Station is the world's first personal supercomputer for AI development, with the computing capacity of 400 CPUs, consuming nearly 40x less power, in a form factor that fits neatly deskside.
Engineered for peak performance and deskside comfort, the DGX Station is the world's quietest workstation, producing one-tenth the noise of other deep learning workstations. Data scientists can use it for compute-intensive AI exploration, including training deep neural networks, inferencing and advanced analytics. via NVIDIA
NVIDIA Volta GV100 GPU Based HGX-1 Supercomputer For Cloud Computing
NVIDIA also has a cloud computing option known as the HGX-1, which will be upgraded with Volta Tesla V100 GPUs. The system likewise packs eight Tesla V100 GPUs, connected in an NVLink hybrid cube mesh topology. The platform is mainly aimed at cloud computing workloads spanning GRID graphics, CUDA HPC stacks and the NVIDIA deep learning stack.
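Which GPU pairs sit on that cube mesh can be probed from software; a minimal sketch that checks peer-to-peer reachability between every pair of devices follows (note this reports whether P2P access is possible at all, not whether the link is NVLink or PCIe):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    // Print an ndev x ndev matrix: 1 where GPU i can directly access GPU j.
    for (int i = 0; i < ndev; ++i) {
        for (int j = 0; j < ndev; ++j) {
            int can = (i == j);  // a device trivially reaches itself
            if (i != j) cudaDeviceCanAccessPeer(&can, i, j);
            printf("%d ", can);
        }
        printf("\n");
    }
    return 0;
}
```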

That's quite a batch of announcements from NVIDIA. With launches slated for around Q3 2017, we will soon be seeing these HPC, workstation and datacenter focused machines in action.
NVIDIA DGX-1 and DGX Station Data Sheets:
NVIDIA Volta Tesla V100S Specs:
NVIDIA Tesla Graphics Card | Tesla K40 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla P100 (PCI-Express) | Tesla P100 (SXM2) | Tesla V100 (PCI-Express) | Tesla V100 (SXM2) | Tesla V100S (PCIe) |
---|---|---|---|---|---|---|---|
GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) | GV100 (Volta) | GV100 (Volta) | GV100 (Volta) |
Process Node | 28nm | 28nm | 16nm | 16nm | 12nm | 12nm | 12nm |
Transistors | 7.1 Billion | 8 Billion | 15.3 Billion | 15.3 Billion | 21.1 Billion | 21.1 Billion | 21.1 Billion |
GPU Die Size | 551 mm2 | 601 mm2 | 610 mm2 | 610 mm2 | 815 mm2 | 815 mm2 | 815 mm2 |
SMs | 15 | 24 | 56 | 56 | 80 | 80 | 80 |
TPCs | 15 | 24 | 28 | 28 | 40 | 40 | 40 |
CUDA Cores Per SM | 192 | 128 | 64 | 64 | 64 | 64 | 64 |
CUDA Cores (Total) | 2880 | 3072 | 3584 | 3584 | 5120 | 5120 | 5120 |
Texture Units | 240 | 192 | 224 | 224 | 320 | 320 | 320 |
FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 | 32 | 32 | 32 |
FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1792 | 2560 | 2560 | 2560 |
Base Clock | 745 MHz | 948 MHz | 1190 MHz | 1328 MHz | 1230 MHz | 1297 MHz | TBD |
Boost Clock | 875 MHz | 1114 MHz | 1329 MHz | 1480 MHz | 1380 MHz | 1530 MHz | 1601 MHz |
FP16 Compute | N/A | N/A | 18.7 TFLOPs | 21.2 TFLOPs | 28.0 TFLOPs | 30.4 TFLOPs | 32.8 TFLOPs |
FP32 Compute | 5.04 TFLOPs | 6.8 TFLOPs | 10.0 TFLOPs | 10.6 TFLOPs | 14.0 TFLOPs | 15.7 TFLOPs | 16.4 TFLOPs |
FP64 Compute | 1.68 TFLOPs | 0.2 TFLOPs | 4.7 TFLOPs | 5.30 TFLOPs | 7.0 TFLOPs | 7.80 TFLOPs | 8.2 TFLOPs |
Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 |
Memory Size | 12 GB GDDR5 @ 288 GB/s | 24 GB GDDR5 @ 288 GB/s | 16 GB HBM2 @ 732 GB/s / 12 GB HBM2 @ 549 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 1134 GB/s |
L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB | 6144 KB | 6144 KB | 6144 KB |
TDP | 235W | 250W | 250W | 300W | 250W | 300W | 250W |