NVIDIA has released a new variant of its Volta-based Tesla graphics card, known as the Tesla V100S. The new server-oriented solution carries the same specifications as the full Volta GPU but runs considerably faster GPU and memory clocks, driving its performance beyond 16 TFLOPs in single-precision compute workloads.

NVIDIA Tesla V100S Volta GPU Brings 16+ TFLOPs and Over 1 TB/s Memory Bandwidth To Servers

In terms of configuration, the Tesla V100S uses the same GV100 GPU, which is based on the 12nm FinFET process node. The specifications include 5120 CUDA cores, 640 Tensor cores and 32 GB of HBM2 memory. These are very similar to the specifications of the existing Tesla V100, but there are significant changes to both the GPU and memory clock speeds.

The Tesla V100S only comes in a PCIe form factor but delivers higher clocks than the 300W Tesla V100 SXM2 (NVLink) solution. It comes with a GPU boost clock of 1601 MHz compared to 1530 MHz on the SXM2 variant and also runs its HBM2 DRAM at a higher memory clock of around 1.1 GHz. The combined boost to memory and graphics clocks makes this Tesla variant the fastest HPC and server-oriented graphics solution.

At these clock speeds, the Tesla V100S delivers a theoretical FP32 compute performance of 16.4 TFLOPs, FP64 compute performance of 8.2 TFLOPs and DNN/DL compute of 130 TFLOPs. The card also pushes memory bandwidth past 1 TB/s (1134 GB/s), versus 900 GB/s on the Tesla V100. The Tesla V100S comes in a 250W design and has higher compute performance than AMD's Radeon Instinct MI60, which is based on the 7nm Vega 20 GPU architecture and delivers a maximum FP32 compute performance of 14.75 TFLOPs at a TDP of 300W.
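The theoretical figures quoted above can be approximated from the core counts and the boost clock. The following is a minimal sketch, not NVIDIA's own method: it assumes each CUDA core retires one fused multiply-add (counted as 2 FLOPs) per clock and each Volta Tensor core performs 64 FMAs (128 FLOPs) per clock. The function name `peak_tflops` is hypothetical, and the FP64 unit count of 2560 is taken from the spec table.

```python
def peak_tflops(units: int, clock_mhz: float, flops_per_unit_per_clock: int = 2) -> float:
    """Theoretical peak throughput in TFLOPs.

    Assumption: every unit is busy on every clock, which real
    workloads rarely achieve.
    """
    return units * flops_per_unit_per_clock * clock_mhz * 1e6 / 1e12


BOOST_MHZ = 1601  # Tesla V100S boost clock quoted in the article

fp32 = peak_tflops(5120, BOOST_MHZ)    # 5120 CUDA cores, 1 FMA = 2 FLOPs
fp64 = peak_tflops(2560, BOOST_MHZ)    # 2560 FP64 units (from the spec table)
tensor = peak_tflops(640, BOOST_MHZ, flops_per_unit_per_clock=128)  # assumed 64 FMAs per clock

print(f"FP32 {fp32:.1f} TFLOPs, FP64 {fp64:.1f} TFLOPs, Tensor {tensor:.0f} TFLOPs")
```

Under these assumptions the FP32 and FP64 results round to the quoted 16.4 and 8.2 TFLOPs, and the Tensor result lands at about 131 TFLOPs, close to the quoted 130.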

NVIDIA Volta Tesla V100S Specs:

| NVIDIA Tesla Graphics Card | Tesla K40 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla P100 (PCI-Express) | Tesla P100 (SXM2) | Tesla V100 (PCI-Express) | Tesla V100 (SXM2) | Tesla V100S (PCIe) |
|---|---|---|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) | GV100 (Volta) | GV100 (Volta) | GV100 (Volta) |
| Process Node | 28nm | 28nm | 16nm | 16nm | 12nm | 12nm | 12nm |
| Transistors | 7.1 Billion | 8 Billion | 15.3 Billion | 15.3 Billion | 21.1 Billion | 21.1 Billion | 21.1 Billion |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 610 mm² | 815 mm² | 815 mm² | 815 mm² |
| SMs | 15 | 24 | 56 | 56 | 80 | 80 | 80 |
| TPCs | 15 | 24 | 28 | 28 | 40 | 40 | 40 |
| CUDA Cores Per SM | 192 | 128 | 64 | 64 | 64 | 64 | 64 |
| CUDA Cores (Total) | 2880 | 3072 | 3584 | 3584 | 5120 | 5120 | 5120 |
| Texture Units | 240 | 192 | 224 | 224 | 320 | 320 | 320 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 | 32 | 32 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1792 | 2560 | 2560 | 2560 |
| Base Clock | 745 MHz | 948 MHz | 1190 MHz | 1328 MHz | 1230 MHz | 1297 MHz | TBD |
| Boost Clock | 875 MHz | 1114 MHz | 1329 MHz | 1480 MHz | 1380 MHz | 1530 MHz | 1601 MHz |
| FP16 Compute | N/A | N/A | 18.7 TFLOPs | 21.2 TFLOPs | 28.0 TFLOPs | 30.4 TFLOPs | 32.8 TFLOPs |
| FP32 Compute | 5.04 TFLOPs | 6.8 TFLOPs | 10.0 TFLOPs | 10.6 TFLOPs | 14.0 TFLOPs | 15.7 TFLOPs | 16.4 TFLOPs |
| FP64 Compute | 1.68 TFLOPs | 0.2 TFLOPs | 4.7 TFLOPs | 5.30 TFLOPs | 7.0 TFLOPs | 7.80 TFLOPs | 8.2 TFLOPs |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size | 12 GB GDDR5 @ 288 GB/s | 24 GB GDDR5 @ 288 GB/s | 16 GB HBM2 @ 732 GB/s or 12 GB HBM2 @ 549 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 900 GB/s | 32 GB HBM2 @ 1134 GB/s |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB | 6144 KB | 6144 KB | 6144 KB |
| TDP | 235W | 250W | 250W | 300W | 250W | 300W | 250W |
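The memory figures in the table are tied together by a simple relationship: bandwidth in GB/s equals the bus width in bits times the per-pin data rate in Gbps, divided by 8. The sketch below inverts that relationship to recover the per-pin rate implied by each card's listed bus width and bandwidth. The function name `pin_rate_gbps` is hypothetical, and the results are derived values, not figures published in the article.

```python
def pin_rate_gbps(bus_width_bits: int, bandwidth_gb_s: float) -> float:
    """Per-pin data rate (Gbps) implied by a bus width and total bandwidth."""
    return bandwidth_gb_s * 8 / bus_width_bits


# (bus width in bits, bandwidth in GB/s) as listed in the spec table
cards = {
    "Tesla K40 (GDDR5)": (384, 288),
    "Tesla P100 16 GB (HBM2)": (4096, 732),
    "Tesla V100 (HBM2)": (4096, 900),
    "Tesla V100S (HBM2)": (4096, 1134),
}

rates = {name: pin_rate_gbps(width, bw) for name, (width, bw) in cards.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} Gbps per pin")
```

For the Tesla V100S this works out to roughly 2.2 Gbps per pin, up from about 1.76 Gbps on the Tesla V100. Since HBM2 transfers data on both clock edges, that is consistent with a memory clock of around 1.1 GHz.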

There's around a 17% increase in compute performance to be had from the Tesla V100S when you compare it with the Tesla V100 PCIe. That's a nice increase, and the server audience may see it as a reason to upgrade. The one thing to consider here is that the AMD Instinct parts feature PCIe Gen 4.0 compatibility, and with many major server players moving over to PCIe 4.0 platforms in 2020, NVIDIA needs to work on its own PCIe Gen 4.0 implementation; I believe that is where its Ampere GPUs come in. There's currently no word on the pricing or availability of the Tesla V100S, but expect it to cost over $6000 US.
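The 17% figure can be checked against the FP32 numbers in the spec table. Here is a small sketch that also compares against the SXM2 card and the memory bandwidth, using a hypothetical helper named `uplift_pct`; all inputs come from the table above.

```python
def uplift_pct(new: float, old: float) -> float:
    """Percentage increase of `new` over `old`."""
    return (new / old - 1) * 100


fp32_vs_pcie = uplift_pct(16.4, 14.0)   # V100S vs Tesla V100 PCIe, FP32 TFLOPs
fp32_vs_sxm2 = uplift_pct(16.4, 15.7)   # V100S vs Tesla V100 SXM2, FP32 TFLOPs
bandwidth = uplift_pct(1134, 900)       # V100S vs Tesla V100, GB/s

print(f"FP32 vs V100 PCIe: +{fp32_vs_pcie:.1f}%")
print(f"FP32 vs V100 SXM2: +{fp32_vs_sxm2:.1f}%")
print(f"Memory bandwidth:  +{bandwidth:.1f}%")
```

The gain over the PCIe card comes to roughly 17%, while the gain over the 300W SXM2 card is closer to 4.5%. The memory bandwidth increase of about 26% is the larger of the changes.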