NVIDIA Tesla V100S Volta-Based Graphics Card Features Higher GPU Clocks For Over 16 TFLOPs Compute, Over 1 TB/s Memory Bandwidth
NVIDIA has released a new variant of its Volta-based Tesla graphics card, known as the Tesla V100S. The new server-aimed solution carries the same specifications as the full Volta GPU but offers faster clock frequencies for both GPU and memory, driving its performance beyond 16 TFLOPs in single-precision compute workloads.
NVIDIA Tesla V100S Volta GPU Brings 16+ TFLOPs and Over 1 TB/s Memory Bandwidth To Servers
In terms of configuration, the Tesla V100S uses the same GV100 GPU, which is based on the 12nm FinFET process node. The specifications include 5120 CUDA cores, 640 Tensor cores and 32 GB of HBM2 memory. As you can tell, these are very similar to the specifications of the existing Tesla V100, but there are some significant changes to both the GPU and memory clock speeds.
The Tesla V100S only comes in the PCIe form factor, but delivers higher clocks than the 300W Tesla V100 SXM2 (NVLINK) solution. It comes with a GPU clock speed of 1601 MHz compared to 1530 MHz on the SXM2 variant and also offers a higher 1.1 Gbps frequency for the HBM2 DRAM. The combined boost to memory and graphics clocks makes this Tesla variant the fastest HPC and server-aimed graphics solution.
At the clock speeds mentioned above, the Tesla V100S delivers a theoretical FP32 compute performance of 16.4 TFLOPs, FP64 compute performance of 8.2 TFLOPs and DNN/DL compute of 130 TFLOPs. The card also pushes memory bandwidth past 1 TB/s (1134 GB/s), versus 900 GB/s on the Tesla V100. The Tesla V100S comes in a 250W design and offers higher compute performance than AMD's Radeon Instinct MI60, which is based on the 7nm Vega 20 GPU but tops out at 14.75 TFLOPs of FP32 compute with a TDP of 300W.
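As a rough check on those theoretical figures, peak throughput can be estimated as cores x clock x FLOPs per core per clock. The sketch below uses the core counts and the 1601 MHz clock quoted in this article; the helper name `peak_tflops` is made up for illustration, and the figure of 128 FLOPs per Tensor core per clock (64 fused multiply-adds) is an assumption that the article does not state.

```python
def peak_tflops(cores: int, clock_mhz: float, flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak in TFLOPs: cores x clock x FLOPs per core per clock.

    The default of 2 counts one fused multiply-add (FMA) as two operations.
    """
    return cores * clock_mhz * 1e6 * flops_per_core_per_clock / 1e12


# Figures quoted in the article for the Tesla V100S.
CUDA_CORES = 5120      # FP32 CUDA cores
FP64_CORES = 2560      # FP64 CUDA cores (from the spec table)
TENSOR_CORES = 640
BOOST_MHZ = 1601

fp32 = peak_tflops(CUDA_CORES, BOOST_MHZ)
fp64 = peak_tflops(FP64_CORES, BOOST_MHZ)

# Assumption (not from the article): each Tensor core does 64 FMAs per clock,
# i.e. 128 FLOPs per clock.
tensor = peak_tflops(TENSOR_CORES, BOOST_MHZ, flops_per_core_per_clock=128)

print(f"FP32:   {fp32:.1f} TFLOPs")
print(f"FP64:   {fp64:.1f} TFLOPs")
print(f"Tensor: {tensor:.0f} TFLOPs")
```

Under these assumptions the FP32 and FP64 results round to the 16.4 and 8.2 TFLOPs quoted above, and the Tensor estimate lands close to the rounded 130 TFLOPs figure. These are peak numbers at boost clock, not sustained performance.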
NVIDIA Volta Tesla V100S Specs:
|NVIDIA Tesla Graphics Card||Tesla K40||Tesla M40||Tesla P100 (PCI-Express)||Tesla P100 (SXM2)||Tesla V100 (PCI-Express)||Tesla V100 (SXM2)||Tesla V100S (PCIe)|
|GPU||GK110 (Kepler)||GM200 (Maxwell)||GP100 (Pascal)||GP100 (Pascal)||GV100 (Volta)||GV100 (Volta)||GV100 (Volta)|
|Transistors||7.1 Billion||8 Billion||15.3 Billion||15.3 Billion||21.1 Billion||21.1 Billion||21.1 Billion|
|GPU Die Size||551 mm2||601 mm2||610 mm2||610 mm2||815 mm2||815 mm2||815 mm2|
|CUDA Cores Per SM||192||128||64||64||64||64||64|
|CUDA Cores (Total)||2880||3072||3584||3584||5120||5120||5120|
|FP64 CUDA Cores / SM||64||4||32||32||32||32||32|
|FP64 CUDA Cores / GPU||960||96||1792||1792||2560||2560||2560|
|Base Clock||745 MHz||948 MHz||1190 MHz||1328 MHz||1230 MHz||1297 MHz||TBD|
|Boost Clock||875 MHz||1114 MHz||1329 MHz||1480 MHz||1380 MHz||1530 MHz||1601 MHz|
|FP16 Compute||N/A||N/A||18.7 TFLOPs||21.2 TFLOPs||28.0 TFLOPs||30.4 TFLOPs||32.8 TFLOPs|
|FP32 Compute||5.04 TFLOPs||6.8 TFLOPs||10.0 TFLOPs||10.6 TFLOPs||14.0 TFLOPs||15.7 TFLOPs||16.4 TFLOPs|
|FP64 Compute||1.68 TFLOPs||0.2 TFLOPs||4.7 TFLOPs||5.30 TFLOPs||7.0 TFLOPs||7.80 TFLOPs||8.2 TFLOPs|
|Memory Interface||384-bit GDDR5||384-bit GDDR5||4096-bit HBM2||4096-bit HBM2||4096-bit HBM2||4096-bit HBM2||4096-bit HBM2|
|Memory Size||12 GB GDDR5 @ 288 GB/s||24 GB GDDR5 @ 288 GB/s||16 GB HBM2 @ 732 GB/s or 12 GB HBM2 @ 549 GB/s||16 GB HBM2 @ 732 GB/s||16 GB HBM2 @ 900 GB/s||16 GB HBM2 @ 900 GB/s||32 GB HBM2 @ 1134 GB/s|
|L2 Cache Size||1536 KB||3072 KB||4096 KB||4096 KB||6144 KB||6144 KB||6144 KB|
The Tesla V100S offers roughly a 17% increase in FP32 compute performance over the Tesla V100 PCIe. That's a nice increase, and the server audience may see it as a reason to upgrade. One thing to consider is that AMD's Instinct parts feature PCIe Gen 4.0 compatibility, and with many major server players moving to PCIe 4.0 platforms in 2020, NVIDIA needs to work on its own PCIe Gen 4.0 implementation, which I believe is where its Ampere GPUs come in. There's currently no word on pricing or availability for the Tesla V100S, but expect it to cost over $6000 US.
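For readers who want to reproduce the comparison, here is a small sketch that works out the generational uplift and a simple FP32-per-watt figure from the numbers quoted in this article. The helper names `uplift_percent` and `gflops_per_watt` are made up for illustration, and dividing peak FP32 by TDP is only a crude efficiency proxy, not a measured result.

```python
def uplift_percent(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


def gflops_per_watt(tflops: float, watts: float) -> float:
    """Peak GFLOPs per watt of rated board power (a rough proxy only)."""
    return tflops * 1000.0 / watts


# Peak FP32 figures (TFLOPs), bandwidth (GB/s) and TDP (W) as quoted in the article.
FP32_TFLOPS = {"V100S PCIe": 16.4, "V100 PCIe": 14.0, "V100 SXM2": 15.7, "MI60": 14.75}
BANDWIDTH_GBS = {"V100S PCIe": 1134, "V100": 900}
TDP_WATTS = {"V100S PCIe": 250, "MI60": 300}

vs_pcie = uplift_percent(FP32_TFLOPS["V100S PCIe"], FP32_TFLOPS["V100 PCIe"])
vs_sxm2 = uplift_percent(FP32_TFLOPS["V100S PCIe"], FP32_TFLOPS["V100 SXM2"])
bandwidth_gain = uplift_percent(BANDWIDTH_GBS["V100S PCIe"], BANDWIDTH_GBS["V100"])

print(f"FP32 vs V100 PCIe: +{vs_pcie:.1f}%")
print(f"FP32 vs V100 SXM2: +{vs_sxm2:.1f}%")
print(f"Memory bandwidth:  +{bandwidth_gain:.1f}%")

for card in ("V100S PCIe", "MI60"):
    efficiency = gflops_per_watt(FP32_TFLOPS[card], TDP_WATTS[card])
    print(f"{card}: {efficiency:.1f} peak FP32 GFLOPs per watt")
```

The gain over the V100 PCIe comes out at about 17%, matching the figure above, while the gain over the higher-clocked SXM2 part is much smaller.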