NVIDIA Volta Tesla V100 Cards Detailed – 150W Single-Slot & 300W Dual-Slot GV100 Powered PCIe Accelerators

• May 10, 2017 at 04:49pm EDT

NVIDIA today announced two next-generation cards based on its Volta graphics architecture and GV100 GPU. The new Tesla V100 accelerators will come in two different PCIe form factors: a 150W single-slot, full-height, half-length design and a standard 300W dual-slot design. Both designs will house NVIDIA's next-generation GV100 GPU, featuring 5120 Volta CUDA cores and 16GB of HBM2.

NVIDIA Tesla V100 Accelerator - 150W Single-Slot and 300W Dual-Slot PCIe Cards

The GV100 Volta GPU that sits at the heart of each of these upcoming Tesla accelerators is a massive 815mm² chip with over 21 billion transistors, built on TSMC's new 12nm FinFET manufacturing process. At 1455MHz, the Tesla V100 delivers 15 TFLOPS of single-precision compute and 7.5 TFLOPS of double-precision compute at 300W. It's worth noting that, just like the P100, the V100 does not feature a fully unlocked GPU: the GV100 die houses 5376 CUDA cores, but only 5120 are functional in the Tesla V100.
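The headline TFLOPS figures can be roughly reproduced from the core counts and clock. The sketch below assumes the usual convention that one fused multiply-add counts as two floating-point operations per core per clock; the function name is only illustrative, and the FP64 core count of 2560 is taken from the spec table.

```python
# Peak throughput estimate: cores x 2 FLOPs per FMA x clock.
# Core counts and boost clock come from the article; counting an FMA
# as two floating-point operations is an assumption (the usual convention).

FLOPS_PER_FMA = 2


def peak_tflops(cores: int, clock_mhz: float) -> float:
    """Return theoretical peak TFLOP/s for `cores` FMA units at `clock_mhz`."""
    return cores * FLOPS_PER_FMA * clock_mhz * 1e6 / 1e12


fp32 = peak_tflops(5120, 1455)  # enabled FP32 CUDA cores on Tesla V100
fp64 = peak_tflops(2560, 1455)  # FP64 cores listed in the spec table

print(f"FP32 peak: {fp32:.2f} TFLOP/s")
print(f"FP64 peak: {fp64:.2f} TFLOP/s")
```

Under these assumptions the estimate lands at roughly 14.9 and 7.45 TFLOP/s, in line with the rounded 15 and 7.5 TFLOPS figures quoted above.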

Tesla Product	Tesla K40	Tesla M40	Tesla P100	Tesla V100
GPU	GK110 (Kepler)	GM200 (Maxwell)	GP100 (Pascal)	GV100 (Volta)
SMs	15	24	56	80
TPCs	15	24	28	40
FP32 Cores / SM	192	128	64	64
FP32 Cores / GPU	2880	3072	3584	5120
FP64 Cores / SM	64	4	32	32
FP64 Cores / GPU	960	96	1792	2560
Tensor Cores / SM	NA	NA	NA	8
Tensor Cores / GPU	NA	NA	NA	640
GPU Boost Clock	810/875 MHz	1114 MHz	1480 MHz	1455 MHz
Peak FP32 TFLOP/s^*	5.04	6.8	10.6	15
Peak FP64 TFLOP/s^*	1.68	0.21	5.3	7.5
Peak Tensor Core TFLOP/s^*	NA	NA	NA	120
Texture Units	240	192	224	320
Memory Interface	384-bit GDDR5	384-bit GDDR5	4096-bit HBM2	4096-bit HBM2
Memory Size	Up to 12 GB	Up to 24 GB	16 GB	16 GB
L2 Cache Size	1536 KB	3072 KB	4096 KB	6144 KB
Shared Memory Size / SM	16 KB/32 KB/48 KB	96 KB	64 KB	Configurable up to 96 KB
Register File Size / SM	256 KB	256 KB	256 KB	256 KB
Register File Size / GPU	3840 KB	6144 KB	14336 KB	20480 KB
TDP	235 Watts	250 Watts	300 Watts	300 Watts
Transistors	7.1 billion	8 billion	15.3 billion	21.1 billion
GPU Die Size	551 mm²	601 mm²	610 mm²	815 mm²
Manufacturing Process	28 nm	28 nm	16 nm FinFET+	12 nm FFN
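The per-GPU rows in the table follow directly from the per-SM rows multiplied by the SM count. A quick sketch to check that, with the numbers copied from the table (the dictionary layout and function name are just for illustration):

```python
# Per-SM figures copied from the spec table:
# (SM count, FP32 cores per SM, FP64 cores per SM, register file KB per SM)
CARDS = {
    "Tesla K40": (15, 192, 64, 256),
    "Tesla M40": (24, 128, 4, 256),
    "Tesla P100": (56, 64, 32, 256),
    "Tesla V100": (80, 64, 32, 256),
}


def per_gpu_totals(sms, fp32_per_sm, fp64_per_sm, regfile_kb_per_sm):
    """Scale the per-SM resources up to the whole GPU."""
    return {
        "fp32_cores": sms * fp32_per_sm,
        "fp64_cores": sms * fp64_per_sm,
        "regfile_kb": sms * regfile_kb_per_sm,
    }


totals = {name: per_gpu_totals(*spec) for name, spec in CARDS.items()}

for name, t in totals.items():
    print(f"{name}: {t['fp32_cores']} FP32, {t['fp64_cores']} FP64, "
          f"{t['regfile_kb']} KB register file")
```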

For hyperscale datacenters, NVIDIA has managed to cram that same 815mm² GV100 GPU into a card the size of a CD case. At half the power, the 150W hyperscale Tesla V100 naturally won't be as fast as its 300W bigger brother, but it's close. How close? NVIDIA isn't disclosing that information just yet.

NVIDIA's Volta Architecture & The GV100 GPU

NVIDIA's new Volta architecture delivers 40% better performance per watt than Pascal, along with 7% more CUDA cores per mm² and 6% better performance per mm². This comes from a combination of the more efficient, higher-density 12nm FinFET process and architectural refinements over the original Pascal architecture.
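These percentages can be approximated from the Tesla P100 and Tesla V100 columns of the spec table. The sketch below assumes they are derived from peak FP32 throughput, die size, core count and TDP, which is our reading rather than something NVIDIA has confirmed; the helper name is illustrative.

```python
# Figures from the spec table (Tesla P100 vs. Tesla V100).
P100 = {"cores": 3584, "fp32_tflops": 10.6, "die_mm2": 610, "tdp_w": 300}
V100 = {"cores": 5120, "fp32_tflops": 15.0, "die_mm2": 815, "tdp_w": 300}


def gain_percent(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


density_gain = gain_percent(V100["cores"] / V100["die_mm2"],
                            P100["cores"] / P100["die_mm2"])
perf_per_mm2_gain = gain_percent(V100["fp32_tflops"] / V100["die_mm2"],
                                 P100["fp32_tflops"] / P100["die_mm2"])
perf_per_watt_gain = gain_percent(V100["fp32_tflops"] / V100["tdp_w"],
                                  P100["fp32_tflops"] / P100["tdp_w"])

print(f"CUDA cores per mm2: +{density_gain:.1f}%")
print(f"FP32 perf per mm2:  +{perf_per_mm2_gain:.1f}%")
print(f"FP32 perf per watt: +{perf_per_watt_gain:.1f}%")
```

With these inputs the density and performance-per-area gains round to 7% and 6%, and performance per watt comes out a little above 40%, since both cards carry the same 300W TDP.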

Each Volta SM (Streaming Multiprocessor) still houses 64 CUDA cores, just like Pascal. However, Volta partitions the SM slightly differently. While each Pascal SM was split into two blocks, each Volta SM is split into four blocks, each with 16 FP32 cores, 8 FP64 cores, 16 INT32 cores and two brand new cores called Tensor cores.
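The block layout above multiplies out to the per-SM and per-GPU figures quoted elsewhere in the article. A small sketch of that arithmetic follows; the variable names are illustrative, and the SM count of the full die is inferred here from the 5376-core figure rather than stated in the article.

```python
# Per-block resources of a Volta SM, as described above.
BLOCKS_PER_SM = 4
PER_BLOCK = {"FP32": 16, "FP64": 8, "INT32": 16, "Tensor": 2}

ENABLED_SMS = 80        # SMs enabled on Tesla V100 (spec table)
FULL_DIE_FP32 = 5376    # CUDA cores on the full GV100 die (from the article)

# Scale one block up to a full SM, then to the enabled part of the GPU.
per_sm = {unit: count * BLOCKS_PER_SM for unit, count in PER_BLOCK.items()}
per_gpu = {unit: count * ENABLED_SMS for unit, count in per_sm.items()}

# Inferred: number of SMs physically present on the die.
full_die_sms = FULL_DIE_FP32 // per_sm["FP32"]

print("Per SM: ", per_sm)
print("Per GPU:", per_gpu)
print(f"SMs on the full die: {full_die_sms}, disabled on V100: "
      f"{full_die_sms - ENABLED_SMS}")
```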

This is another area where GV100 differs from GP100. Each Volta GV100 SM includes separate FP32 and INT32 cores, which can execute FP32 and INT32 operations simultaneously at full throughput, whereas GP100 only featured FP32 cores that could execute either FP32 or INT32 operations at any given time.
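To get a feel for what separate INT32 cores could mean, here is a deliberately idealized toy model for a single SM. It assumes 64 lanes per SM, perfect scheduling and no dependencies between instructions, none of which holds for real kernels; the function names and the example workload are hypothetical.

```python
import math

LANES_PER_SM = 64  # FP32 cores per SM on GP100 and GV100; GV100 adds 64 INT32 cores


def cycles_shared_units(fp_ops: int, int_ops: int, lanes: int = LANES_PER_SM) -> int:
    """GP100-style model: FP32 and INT32 work compete for the same lanes."""
    return math.ceil((fp_ops + int_ops) / lanes)


def cycles_separate_units(fp_ops: int, int_ops: int, lanes: int = LANES_PER_SM) -> int:
    """GV100-style model: FP32 and INT32 work run side by side on their own lanes."""
    return max(math.ceil(fp_ops / lanes), math.ceil(int_ops / lanes))


# Hypothetical workload: two FP32 operations for every INT32 (address math) operation.
fp_ops, int_ops = 6400, 3200
shared = cycles_shared_units(fp_ops, int_ops)
separate = cycles_separate_units(fp_ops, int_ops)

print(f"Shared units:   {shared} cycles")
print(f"Separate units: {separate} cycles ({shared / separate:.2f}x faster)")
```

In this model the benefit depends entirely on the instruction mix: a pure FP32 workload sees no change, while an even split between FP32 and INT32 work approaches 2x.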

Tensor cores are mixed-precision FP16/FP32 units that operate on 4x4 matrices. Each one is able to accelerate the execution of what NVIDIA calls Tensor operations by a factor of 6 compared to traditional FP64 cores. This allows Volta to deliver 6x higher inferencing throughput per clock compared to Pascal and 12x the deep-learning throughput per clock.
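The 120 TFLOPS Tensor figure in the spec table can be approximated from the 4x4 matrix size. The sketch assumes each Tensor core completes one 4x4 by 4x4 matrix multiply-accumulate per clock (64 fused multiply-adds), a detail taken from NVIDIA's Volta documentation rather than from this article, and counts each fused multiply-add as two operations.

```python
MATRIX_DIM = 4          # Tensor cores work on 4x4 matrices
TENSOR_CORES = 640      # 8 per SM x 80 SMs (spec table)
BOOST_CLOCK_MHZ = 1455  # spec table
FLOPS_PER_FMA = 2       # multiply + add

# A 4x4 by 4x4 matrix product needs 4 multiply-adds for each of its
# 16 outputs. Assumption: one such product completes per core per clock.
fma_per_clock = MATRIX_DIM ** 3

tensor_tflops = (TENSOR_CORES * fma_per_clock * FLOPS_PER_FMA
                 * BOOST_CLOCK_MHZ * 1e6 / 1e12)

print(f"FMAs per Tensor core per clock: {fma_per_clock}")
print(f"Peak Tensor throughput: {tensor_tflops:.1f} TFLOP/s")
```

Under these assumptions the result is about 119 TFLOP/s, which matches the rounded 120 TFLOPS in the table.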

The key architectural improvements from Pascal to Volta include:

New mixed-precision FP16/FP32 Tensor Cores purpose-built for deep learning matrix arithmetic;
Enhanced L1 data cache for higher performance and lower latency;
Streamlined instruction set for simpler decoding and reduced instruction latencies;
Higher clocks and higher power efficiency.

About the author: PC hardware & tech evangelist. Been building PCs for over a decade & following the industry for just as long. Also a doctor specializing in Preventive Medicine.
