Nvidia Begins Seeding Reviewers With the New TITAN-X with Pascal – Synthetic CuDNN Benchmarks Show Up To 200% Performance Increase

Jul 30, 2016

Synthetic benchmarks for the Nvidia TITAN X have leaked on Chiphell (via Videocardz) and show a very significant performance increase over the older-generation GeForce GTX TITAN X. This of course also means that Nvidia has started sampling reviewers with units (or at least that the initial batch has begun shipping). The brand-new TITAN X has 3584 CUDA cores clocked at 1531 MHz (Boost), a significant hardware upgrade over its older brother, which had just 3072 CUDA cores and a boost clock of 1075 MHz.

The new TITAN X starts shipping to select reviewers – offers anywhere from a 63% to a 200% speedup with CuDNN5

Since Nvidia has deliberately dropped the GeForce GTX branding from the new TITAN X, instead positioning it as a ‘prosumer’ card, synthetic benchmarks that focus on CuDNN are quite relevant for this product. The new TITAN X offers a speedup of up to 2x in some cases, which is pretty huge. Keep in mind, however, that the new TITAN X is using a newer version of CuDNN, so the speedup is not purely down to hardware gains. That said, gains from version differences are usually quite small, so the higher clock speed and the increased core count can safely be said to account for the vast majority of the speedup.

On paper, the new Titan has roughly 17% more cores (3584 vs. 3072) and a 42% higher boost clock (1531 MHz vs. 1075 MHz). This means that right off the bat (and not accounting for any improved CuDNN library optimization or architectural gains) you are looking at a raw compute increase of about 66%. If you discount that figure from the measured speedup, what remains is the combined gain from the architecture change (Maxwell to Pascal) and from using CuDNN5 instead of CuDNN4. In some cases these remaining gains are significant; in others, however, the measured speedup is actually a bit under what we would expect the bare minimum to be.
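The arithmetic behind that estimate can be reproduced in a few lines. This is a minimal sketch using only the core counts and boost clocks quoted in this article; the function names are made up for illustration, and the residual is computed by dividing the measured speedup by the raw ratio (an assumption about how the two effects combine, rather than a plain subtraction of percentages).

```python
# Spec figures quoted in the article (boost clocks in MHz).
OLD_TITAN_X = {"cuda_cores": 3072, "boost_mhz": 1075}  # Maxwell GTX TITAN X
NEW_TITAN_X = {"cuda_cores": 3584, "boost_mhz": 1531}  # Pascal TITAN X


def raw_scaling(old, new):
    """Return (core ratio, clock ratio, combined ratio) between two cards."""
    core_ratio = new["cuda_cores"] / old["cuda_cores"]
    clock_ratio = new["boost_mhz"] / old["boost_mhz"]
    return core_ratio, clock_ratio, core_ratio * clock_ratio


def residual_gain_pct(measured_speedup_pct, raw_ratio):
    """Percentage gain left over once raw cores-x-clock scaling is factored out.

    Assumes the effects multiply, so the measured ratio is divided by the
    raw ratio. A negative result means the benchmark scaled worse than
    the paper specs alone would suggest.
    """
    measured_ratio = 1 + measured_speedup_pct / 100
    return (measured_ratio / raw_ratio - 1) * 100


core_ratio, clock_ratio, combined = raw_scaling(OLD_TITAN_X, NEW_TITAN_X)
print(f"cores: +{(core_ratio - 1) * 100:.1f}%")
print(f"clock: +{(clock_ratio - 1) * 100:.1f}%")
print(f"raw compute: +{(combined - 1) * 100:.1f}%")

# Low and high ends of the speedup range quoted in the article.
for measured in (63, 200):
    leftover = residual_gain_pct(measured, combined)
    print(f"measured +{measured}% -> residual {leftover:+.1f}%")
```

Under these assumptions, a measured speedup at the low end of the quoted range comes out slightly below the raw scaling figure, while the high end leaves a large residual for the architecture and library to explain.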

According to the benchmarks, you are looking at a speedup of 74% to 91% on AlexNet, 76% to 200% on OverFeat, 74% to 884% on Inception and 91% to 98% on VGG. This means that in compute work involving deep neural nets, the new TITAN X will offer a significant speedup. It has better power efficiency as well. However, when it comes to cost, the MSRP is going to be $1200, and considering TITANs rarely sell at MSRP (the old $1000 MSRP Titan is retailing in the $1700-1900 range on Amazon and Newegg), the price might give a potential customer pause. Also, considering we don’t know anything about double-precision (DP) performance just yet, it may make more sense for the prosumer market to pursue more value-oriented purchases, such as multiple GTX 1070s.

NVIDIA GeForce 10 Pascal Family

| Graphics Card Name | NVIDIA GeForce GTX 1050 2 GB | NVIDIA GeForce GTX 1050 3 GB | NVIDIA GeForce GTX 1050 Ti | NVIDIA GeForce GTX 1060 3 GB | NVIDIA GeForce GTX 1060 5 GB | NVIDIA GeForce GTX 1060 6 GB | NVIDIA GeForce GTX 1070 | NVIDIA GeForce GTX 1070 Ti | NVIDIA GeForce GTX 1080 | NVIDIA Titan X | NVIDIA GeForce GTX 1080 Ti | NVIDIA Titan Xp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Graphics Core | GP107 | GP107 | GP107 | GP106 / GP104 | GP106 | GP106 / GP104 | GP104 | GP104 | GP104 | GP102 | GP102 | GP102 |
| Process Node | 14nm FinFET | 14nm FinFET | 14nm FinFET | 16nm FinFET | 16nm FinFET | 16nm FinFET | 16nm FinFET | 16nm FinFET | 16nm FinFET | 16nm FinFET | 16nm FinFET | 16nm FinFET |
| Die Size | 132 mm² | 132 mm² | 132 mm² | 200 mm² | 200 mm² | 200 mm² | 314 mm² | 314 mm² | 314 mm² | 471 mm² | 471 mm² | 471 mm² |
| Transistors | 3.3 Billion | 3.3 Billion | 3.3 Billion | 4.4 Billion | 4.4 Billion | 4.4 Billion | 7.2 Billion | 7.2 Billion | 7.2 Billion | 12 Billion | 12 Billion | 12 Billion |
| CUDA Cores | 640 | 768 | 768 | 1152 | 1280 | 1280 | 1920 | 2432 | 2560 | 3584 | 3584 | 3840 |
| Base Clock | 1354 MHz | 1392 MHz | 1290 MHz | 1506 MHz | 1506 MHz | 1506 MHz | 1506 MHz | 1607 MHz | 1607 MHz | 1417 MHz | 1480 MHz | 1480 MHz |
| Boost Clock | 1455 MHz | 1518 MHz | 1392 MHz | 1708 MHz | 1708 MHz | 1708 MHz | 1683 MHz | 1683 MHz | 1733 MHz | 1530 MHz | 1583 MHz | 1582 MHz |
| FP32 Compute | 1.8 TFLOPs | 2.3 TFLOPs | 2.1 TFLOPs | 4.0 TFLOPs | 4.4 TFLOPs | 4.4 TFLOPs | 6.5 TFLOPs | 8.1 TFLOPs | 9.0 TFLOPs | 11 TFLOPs | 11.5 TFLOPs | 12.5 TFLOPs |
| VRAM | 2 GB GDDR5 | 3 GB GDDR5 | 4 GB GDDR5 | 3 GB GDDR5 | 5 GB GDDR5 | 6 GB GDDR5/X | 8 GB GDDR5/X | 8 GB GDDR5 | 8 GB GDDR5X | 12 GB GDDR5X | 11 GB GDDR5X | 12 GB GDDR5X |
| Memory Speed | 7 Gbps | 7 Gbps | 7 Gbps | 8 Gbps | 8 Gbps | 9 Gbps / 10 Gbps | 8 Gbps | 8 Gbps | 11 Gbps | 10 Gbps | 11 Gbps | 11.4 Gbps |
| Memory Bandwidth | 112 GB/s | 84 GB/s | 112 GB/s | 192 GB/s | 160 GB/s | 224 GB/s / 240 GB/s | 256 GB/s | 256 GB/s | 352 GB/s | 480 GB/s | 484 GB/s | 547 GB/s |
| Bus Interface | 128-bit | 96-bit | 128-bit | 192-bit | 160-bit | 192-bit | 256-bit | 256-bit | 256-bit | 384-bit | 352-bit | 384-bit |
| Power Connector | None | None | None | Single 6-Pin | Single 6-Pin | Single 6-Pin | Single 8-Pin | Single 8-Pin | Single 8-Pin | 8+6 Pin | 8+6 Pin | 8+6 Pin |
| TDP | 75W | 75W | 75W | 120W | 120W | 120W | 150W | 180W | 180W | 250W | 250W | 250W |
| Display Outputs | 1x DisplayPort 1.4, 1x HDMI 2.0b, 1x DVI | 1x DisplayPort 1.4, 1x HDMI 2.0b, 1x DVI | 1x DisplayPort 1.4, 1x HDMI 2.0b, 1x DVI | 3x DisplayPort 1.4, 1x HDMI 2.0b, 1x DVI | 3x DisplayPort 1.4, 1x HDMI 2.0b, 1x DVI | 3x DisplayPort 1.4, 1x HDMI 2.0b, 1x DVI | 3x DisplayPort 1.4, 1x HDMI 2.0b, 1x DVI | 3x DisplayPort 1.4, 1x HDMI 2.0b, 1x DVI | 3x DisplayPort 1.4, 1x HDMI 2.0b, 1x DVI | 3x DisplayPort 1.4, 1x HDMI 2.0b, 1x DVI | 3x DisplayPort 1.4, 1x HDMI 2.0b | 3x DisplayPort 1.4, 1x HDMI 2.0b |
| Launch Date | October 2016 | May 2018 | October 2016 | September 2016 | August 2018 | July 2016 | June 2016 | October 2017 | May 2016 | August 2016 | March 2017 | April 2017 |
| Launch Price | $109 US | $119 US-$129 US | $139 US | $199 US | TBD | $249 US | $349 US | $449 US | $499 US | $1200 US | $699 US | $1200 US |
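
Two rows in the table above can be derived from the others, which makes for a handy sanity check. The sketch below assumes the usual rules of thumb: peak FP32 throughput is 2 FLOPs per CUDA core per clock (one fused multiply-add) at boost clock, and memory bandwidth is bus width times per-pin memory speed divided by 8. The function names are illustrative, and the table's own FP32 figures are rounded, so they will not match to the last digit.

```python
def fp32_tflops(cuda_cores, boost_mhz):
    """Peak FP32 throughput in TFLOPs, assuming 2 FLOPs per core per clock."""
    return 2 * cuda_cores * boost_mhz * 1e6 / 1e12


def memory_bandwidth_gbs(bus_width_bits, memory_speed_gbps):
    """Memory bandwidth in GB/s from bus width (bits) and per-pin speed (Gbps)."""
    return bus_width_bits * memory_speed_gbps / 8


# (CUDA cores, boost MHz, bus width in bits, memory speed in Gbps),
# copied from the GP102 columns of the table.
gp102_cards = {
    "NVIDIA Titan X": (3584, 1530, 384, 10),
    "NVIDIA GeForce GTX 1080 Ti": (3584, 1583, 352, 11),
    "NVIDIA Titan Xp": (3840, 1582, 384, 11.4),
}

for name, (cores, boost, bus, speed) in gp102_cards.items():
    tflops = fp32_tflops(cores, boost)
    bandwidth = memory_bandwidth_gbs(bus, speed)
    print(f"{name}: {tflops:.2f} TFLOPs, {bandwidth:.1f} GB/s")
```

The bandwidth results line up with the table's 480, 484 and 547 GB/s entries, while the FP32 numbers land a little below the rounded TFLOPs values listed for the 1080 Ti and Titan Xp.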

 
