NVIDIA Volta Tesla V100 GPU Accelerator Compute Performance Revealed – Features A Monumental Increase Over Pascal Based Tesla P100

• Sep 17, 2017 at 04:36am EDT

NVIDIA's flagship and the world's fastest graphics accelerator, the Volta GPU-based Tesla V100, is now shipping to customers around the globe. The new GPU is a marvel of engineering that brings together a broad range of technologies, including the latest 12nm process, NVLink 2.0, HBM2 memory, Tensor Cores and a highly efficient architecture, which make it the most suitable chip for heavy compute and AI (deep learning) workloads.

NVIDIA Volta GV100 GPU Based Tesla V100 Benchmarked - A Monumental Performance Increase in Geekbench Compute Test Over The Pascal GP100 Based Tesla P100

Released just a year after the Pascal-based Tesla P100, the Volta-based Tesla V100 bests its predecessor in every possible way. Like its predecessor, the flagship is aimed at the deep learning and compute markets. At GTC 2017 we learned almost everything about the Volta GV100 GPU, but now we have the first independent test results, and they are a shocker.

The system tested in Geekbench 4 was an NVIDIA DGX-1, which NVIDIA calls a supercomputer in a box. It's a powerful machine that delivers some astonishing performance results. According to NVIDIA's official figures, the DGX-1's total horsepower has been boosted from 170 TFLOPs of FP16 compute with Tesla P100 to 960 TFLOPs of FP16 Tensor Core compute with Tesla V100, a direct effect of the new Tensor Cores featured inside the Volta GV100 GPU.
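As a quick sanity check on those two headline numbers, the sketch below assumes the Pascal DGX-1 carried eight Tesla P100 (SXM2) cards at the 21.2 TFLOPs FP16 listed in the spec table further down. Keep in mind that the 960 TFLOPs figure counts Tensor Core mixed-precision math while the 170 TFLOPs figure is conventional FP16, so the resulting ratio is not a like-for-like comparison; it mostly reflects the Tensor Cores.

```python
# Assumed inputs: eight GPUs per DGX-1, 21.2 TFLOPs FP16 per Tesla P100 (SXM2)
# taken from the spec table, and NVIDIA's 960 TFLOPs Tensor Core claim for Volta.
DGX1_GPUS = 8
P100_SXM2_FP16_TFLOPS = 21.2
DGX1_V100_TENSOR_TFLOPS = 960.0

# Pascal DGX-1: plain FP16 throughput summed over all eight P100s
p100_aggregate = DGX1_GPUS * P100_SXM2_FP16_TFLOPS
# Volta DGX-1: NVIDIA's Tensor Core figure split evenly across eight V100s
v100_per_gpu = DGX1_V100_TENSOR_TFLOPS / DGX1_GPUS
# System-level uplift implied by the two headline numbers (not like-for-like)
uplift = DGX1_V100_TENSOR_TFLOPS / p100_aggregate

print(f"P100 DGX-1 FP16:       {p100_aggregate:.1f} TFLOPs")
print(f"V100 Tensor per GPU:   {v100_per_gpu:.1f} TFLOPs")
print(f"Implied system uplift: {uplift:.2f}x")
```

Eight P100s at 21.2 TFLOPs land at 169.6 TFLOPs, which is where the rounded 170 TFLOPs figure comes from.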

In terms of specifications, this machine rocks eight Tesla V100 GPUs with 5120 CUDA cores each, for a total of 40,960 CUDA cores and 5120 Tensor Cores. The eight Tesla V100 GPUs carry a combined 128 GB of HBM2 memory. The system features dual Intel Xeon E5-2698 v4 processors, each with 20 cores and 40 threads clocked at 2.2 GHz, backed by 512 GB of DDR4 memory. Storage is provided by four 1.92 TB SSDs configured in RAID 0, networking is dual 10 GbE plus up to four InfiniBand EDR links, and the system comes with a 3.2 kW PSU.
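The system totals above are simple per-component multiplications. A minimal sketch, using the article's per-component figures; the 640 Tensor Cores per GPU is inferred by dividing the quoted 5120 total by eight, and the function name is purely illustrative.

```python
# Per-component figures from the article; 640 Tensor Cores per V100 is
# inferred from the quoted 5120 total spread over eight GPUs.
NUM_GPUS = 8
CUDA_CORES_PER_GPU = 5120
TENSOR_CORES_PER_GPU = 640
HBM2_GB_PER_GPU = 16
NUM_CPUS = 2
CORES_PER_CPU = 20
THREADS_PER_CPU = 40
NUM_SSDS = 4
SSD_TB = 1.92


def dgx1_totals() -> dict:
    """Hypothetical helper: aggregate the DGX-1 (Volta) component counts."""
    return {
        "cuda_cores": NUM_GPUS * CUDA_CORES_PER_GPU,
        "tensor_cores": NUM_GPUS * TENSOR_CORES_PER_GPU,
        "hbm2_gb": NUM_GPUS * HBM2_GB_PER_GPU,
        "cpu_cores": NUM_CPUS * CORES_PER_CPU,
        "cpu_threads": NUM_CPUS * THREADS_PER_CPU,
        # RAID 0 stripes data with no redundancy, so usable capacity is the raw sum
        "raid0_tb": round(NUM_SSDS * SSD_TB, 2),
    }


for key, value in dgx1_totals().items():
    print(f"{key}: {value}")
```

The RAID 0 line is the one worth noting: striping gives the full 7.68 TB of raw capacity and the combined throughput of all four drives, at the cost of losing the array if any single SSD fails.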

Now comes the part where we unveil the results. The NVIDIA DGX-1 currently posts the fastest compute score in the Geekbench 4 database, and no other listed setup comes close to dethroning this beast. For comparison, an HP Z8 G4 workstation, which offers a total of nine PCIe slots, scores 278,706 points in the OpenCL API with the Quadro GP100, which is essentially a card with Tesla P100 specs. Moving over to the fastest Tesla P100 listing, a system with eight PCIe cards reaches a score of 320,031 points in the CUDA API. Now for the mind-boggling Tesla V100 scores: a DGX-1 system with eight SXM2 Tesla V100 cards scores 418,504 points in the OpenCL API and a monumental 743,537 points in the CUDA API.
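To put the quoted scores side by side, the sketch below simply divides them. The dictionary keys are hypothetical labels, not Geekbench identifiers. The comparison is rough: the listings mix PCIe and SXM2 form factors, and the card count in the HP Z8 G4 listing is not stated.

```python
# Geekbench 4 compute scores exactly as quoted above; keys are illustrative labels
SCORES = {
    "quadro_gp100_opencl": 278706,  # HP Z8 G4 with Quadro GP100
    "p100_pcie_x8_cuda": 320031,    # eight Tesla P100 PCIe cards
    "v100_dgx1_opencl": 418504,     # DGX-1, eight Tesla V100 SXM2
    "v100_dgx1_cuda": 743537,       # DGX-1, eight Tesla V100 SXM2
}


def speedup(new: str, old: str) -> float:
    """Ratio of two Geekbench scores (higher is better)."""
    return SCORES[new] / SCORES[old]


comparisons = [
    ("v100_dgx1_cuda", "p100_pcie_x8_cuda"),      # Volta vs Pascal, both CUDA
    ("v100_dgx1_opencl", "quadro_gp100_opencl"),  # Volta vs Pascal, both OpenCL
    ("v100_dgx1_cuda", "v100_dgx1_opencl"),       # same system, CUDA vs OpenCL
]
for new, old in comparisons:
    print(f"{new} / {old}: {speedup(new, old):.2f}x")
```

This works out to roughly 2.32x for Volta over Pascal in CUDA, about 1.50x in OpenCL, and about 1.78x for the same DGX-1 when moving from OpenCL to CUDA.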

The score gives the Tesla V100 an impressive lead over its predecessor, which is something we are excited to see. It also suggests we could be looking at a generational leap in the gaming GPU segment if the gains from this architecture carry over to mainstream parts. Another thing worth pointing out is how well NVIDIA has tuned compute output through CUDA and its related libraries. Not only does the Tesla V100 score far higher in CUDA than in OpenCL, the same pattern shows up for the Tesla P100, which suggests that NVIDIA is putting serious work into its software stack, including libraries such as cuDNN, and that it should get even better in the coming generations. So there you have it: NVIDIA's fastest GPU showing off some killer performance in the compute workloads it was designed for.

NVIDIA Volta Tesla V100S Specs:

NVIDIA Tesla Graphics Card	Tesla K40 (PCI-Express)	Tesla M40 (PCI-Express)	Tesla P100 (PCI-Express)	Tesla P100 (SXM2)	Tesla V100 (PCI-Express)	Tesla V100 (SXM2)	Tesla V100S (PCI-Express)
GPU	GK110 (Kepler)	GM200 (Maxwell)	GP100 (Pascal)	GP100 (Pascal)	GV100 (Volta)	GV100 (Volta)	GV100 (Volta)
Process Node	28nm	28nm	16nm	16nm	12nm	12nm	12nm
Transistors	7.1 Billion	8 Billion	15.3 Billion	15.3 Billion	21.1 Billion	21.1 Billion	21.1 Billion
GPU Die Size	551 mm²	601 mm²	610 mm²	610 mm²	815 mm²	815 mm²	815 mm²
SMs	15	24	56	56	80	80	80
TPCs	15	24	28	28	40	40	40
CUDA Cores Per SM	192	128	64	64	64	64	64
CUDA Cores (Total)	2880	3072	3584	3584	5120	5120	5120
Texture Units	240	192	224	224	320	320	320
FP64 CUDA Cores / SM	64	4	32	32	32	32	32
FP64 CUDA Cores / GPU	960	96	1792	1792	2560	2560	2560
Base Clock	745 MHz	948 MHz	1190 MHz	1328 MHz	1230 MHz	1297 MHz	TBD
Boost Clock	875 MHz	1114 MHz	1329 MHz	1480 MHz	1380 MHz	1530 MHz	1601 MHz
FP16 Compute	N/A	N/A	18.7 TFLOPs	21.2 TFLOPs	28.0 TFLOPs	31.4 TFLOPs	32.8 TFLOPs
FP32 Compute	5.04 TFLOPs	6.8 TFLOPs	9.3 TFLOPs	10.6 TFLOPs	14.0 TFLOPs	15.7 TFLOPs	16.4 TFLOPs
FP64 Compute	1.68 TFLOPs	0.2 TFLOPs	4.7 TFLOPs	5.30 TFLOPs	7.0 TFLOPs	7.80 TFLOPs	8.2 TFLOPs
Memory Interface	384-bit GDDR5	384-bit GDDR5	4096-bit HBM2	4096-bit HBM2	4096-bit HBM2	4096-bit HBM2	4096-bit HBM2
Memory Size / Bandwidth	12 GB GDDR5 @ 288 GB/s	24 GB GDDR5 @ 288 GB/s	16 GB HBM2 @ 732 GB/s or 12 GB HBM2 @ 549 GB/s	16 GB HBM2 @ 732 GB/s	16 GB HBM2 @ 900 GB/s	16 GB HBM2 @ 900 GB/s	32 GB HBM2 @ 1134 GB/s
L2 Cache Size	1536 KB	3072 KB	4096 KB	4096 KB	6144 KB	6144 KB	6144 KB
TDP	235W	250W	250W	300W	250W	300W	250W
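Most of the FP32 and FP64 rows in the table follow from a single formula: core count times FLOPs per core per clock times boost clock. The sketch below assumes each CUDA core retires one fused multiply-add (2 FLOPs) per clock at the listed boost clock; that assumption is not stated in the table, and the helper name is illustrative. It reproduces a few representative columns.

```python
# Assumption: each CUDA core retires one fused multiply-add (2 FLOPs) per clock
# at the listed boost clock. Real sustained throughput is lower than this peak.
FLOPS_PER_CORE_PER_CLOCK = 2


def peak_tflops(cores: int, boost_mhz: float) -> float:
    """Hypothetical helper: theoretical peak in TFLOPs (MHz -> Hz -> TFLOPs)."""
    return cores * FLOPS_PER_CORE_PER_CLOCK * boost_mhz * 1e6 / 1e12


# (FP32 cores, FP64 cores, boost clock in MHz), taken from the table above
cards = {
    "Tesla K40": (2880, 960, 875),
    "Tesla M40": (3072, 96, 1114),
    "Tesla P100 (SXM2)": (3584, 1792, 1480),
    "Tesla V100 (PCI-Express)": (5120, 2560, 1380),
    "Tesla V100 (SXM2)": (5120, 2560, 1530),
    "Tesla V100S (PCI-Express)": (5120, 2560, 1601),
}

for name, (fp32_cores, fp64_cores, boost) in cards.items():
    fp32 = peak_tflops(fp32_cores, boost)
    fp64 = peak_tflops(fp64_cores, boost)
    print(f"{name:28s} FP32 {fp32:5.2f}  FP64 {fp64:5.2f}  TFLOPs")
```

For example, the Tesla V100 (SXM2) works out to 5120 × 2 × 1.53 GHz ≈ 15.7 TFLOPs FP32 and 2560 × 2 × 1.53 GHz ≈ 7.8 TFLOPs FP64, matching the table. The V100 (PCI-Express) computes to about 14.1 and 7.1 TFLOPs, slightly above the rounded 14.0 and 7.0 listed.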

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
