[Review] NVIDIA’s Titan X

Posted Mar 21, 2015

Big Maxwell, GM200, has been one of the most anticipated GPUs from NVIDIA in a very long time. The prospect of having so many Maxwell cores at our disposal, to run games at the highest possible detail levels and resolutions, is tantalizing. Oh so very tantalizing. Big Maxwell and the Titan X represent the absolute fastest technology ever produced by NVIDIA, with theoretical performance on par with that of the Titan Z, a dual GK110 part.

[Image: GeForce GTX Titan X, side view]

 

Big Maxwell

Sure, there has been a lot of hype surrounding nearly every GPU launch that makes a large technological leap. Kepler was great, and it even arrived mostly keeping the promises the hype had made. Big Kepler? Yeah, we all got excited about GK110 alright. It gave us some fantastic performance numbers, in compute related tasks too. You mean we can have our single precision cake and eat our double precision cake too? Finally, NVIDIA! But then came Big Maxwell. The rumors were coming at us fast and furious: lots of CUDA cores, 12GB of RAM, and all on the ubiquitous and mature 28nm process from TSMC we’ve all come to know and love. It sounded perfect, absolutely perfect. There is, of course, the giant purple and pink polka dotted elephant in the room: the $999 price of entry. But we’ll talk about that near the end. First, let’s explore what it can do.

Then some performance numbers started getting leaked. Nothing real-world, mind you, but 3DMark is at least something to measure it against the competition with. And it was good. Over 6 TFLOPS of single-precision computing power to tackle all your needs. Gaming needs, of course. Unfortunately, the FP64 performance has been a bit neutered.

The Titan X makes use of the absolutely marvelous second generation Maxwell architecture in the form of the GM200-400-A1 chip, which is much larger than the GM204 and GM206 chips that came before it. Instead of GM204’s more pedestrian 2048 CUDA cores, GM200 ups the count to 3072 CUDA cores, with 192 texture units and 96 ROPs (render output units) at their disposal. It’s paired with 12GB of GDDR5 VRAM running at an effective 7GHz over a 384-bit memory bus. There are 8 billion transistors packed into a 601mm² die.

Those 3072 cores are split among six Graphics Processing Clusters, each with four SMM units of 128 cores apiece. It’s all interconnected and sitting on 12GB of RAM with 336GB/s of memory bandwidth. All of that translates into a great gaming experience, delivered within the promised 250W TDP. Very close to it, in fact.
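The core count and bandwidth figures above fall straight out of the layout, and a few lines of Python make the arithmetic explicit. This is only a back-of-the-envelope sketch: the function and constant names are made up for illustration, and it assumes the 7GHz figure is the effective per-pin GDDR5 data rate.

```python
# Figures quoted in the review; all names here are illustrative only.
GPCS = 6                    # Graphics Processing Clusters
SMMS_PER_GPC = 4            # SMM units per cluster
CORES_PER_SMM = 128         # CUDA cores per SMM

BUS_WIDTH_BITS = 384        # memory bus width
EFFECTIVE_RATE_GBPS = 7.0   # assumed effective data rate per pin, in Gbit/s


def total_cuda_cores(gpcs, smms_per_gpc, cores_per_smm):
    """Total CUDA cores for a given cluster layout."""
    return gpcs * smms_per_gpc * cores_per_smm


def memory_bandwidth_gbs(bus_width_bits, rate_gbps):
    """Peak bandwidth in GB/s: bytes moved per transfer times the data rate."""
    return bus_width_bits / 8 * rate_gbps


if __name__ == "__main__":
    print(total_cuda_cores(GPCS, SMMS_PER_GPC, CORES_PER_SMM))         # 3072
    print(memory_bandwidth_gbs(BUS_WIDTH_BITS, EFFECTIVE_RATE_GBPS))   # 336.0
```

Both are theoretical peaks. Real-world bandwidth always lands somewhat below the number on the box.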

[Image: GM200 block diagram]

GM200 is an interesting chip design from NVIDIA. It isn’t what GK110 was by any means. It doesn’t have hidden or sequestered transistors pointing to more components waiting to be enabled, like GK110 did. There isn’t anything in those 8 billion transistors that isn’t being used. Essentially, the Titan X is almost literally a GTX 980 with 50% more SMMs.

NVIDIA GeForce GTX Titan X Specifications:

| Specification | GTX Titan X | GTX Titan Black | GTX 980 | GTX 970 | GTX 960 |
|---|---|---|---|---|---|
| GPU Architecture | Maxwell | Kepler | Maxwell | Maxwell | Maxwell |
| GPU Name | GM200 | GK110 | GM204 | GM204 | GM206 |
| Die Size | 601 mm² | 561 mm² | 398 mm² | 398 mm² | 228 mm² |
| Process | 28nm | 28nm | 28nm | 28nm | 28nm |
| CUDA Cores | 3072 | 2880 | 2048 | 1664 | 1024 |
| Texture Units | 192 | 240 | 128 | 104 | 64 |
| ROPs | 96 | 48 | 64 | 64 | 32 |
| Base Clock | 1002 MHz | 889 MHz | 1126 MHz | 1051 MHz | 1127 MHz |
| Boost Clock | 1089 MHz | 980 MHz | 1216 MHz | 1178 MHz | 1178 MHz |
| VRAM | 12 GB GDDR5 | 6 GB GDDR5 | 4 GB GDDR5 | 4 GB GDDR5 | 2 GB GDDR5 |
| Memory Bus | 384-bit | 384-bit | 256-bit | 256-bit | 128-bit |
| Memory Clock (effective) | 7.0 GHz | 7.0 GHz | 7.0 GHz | 7.0 GHz | 7.0 GHz |
| Memory Bandwidth | 336.0 GB/s | 336.0 GB/s | 224.0 GB/s | 224.0 GB/s | 112.0 GB/s |
| TDP | 250W | 250W | 165W | 145W | 120W |
| Power Connectors | 8+6 Pin | 8+6 Pin | Two 6-Pin | Two 6-Pin | One 6-Pin |
| Price | $999 US | $999 US | $549 US | $329 US | $199 US |
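If you want to see where the "over 6 TFLOPS" figure comes from, here is a rough sketch that derives peak single-precision throughput for each card from the core counts and base clocks in the table. It assumes each CUDA core retires one fused multiply-add (two floating point operations) per clock, which is the usual convention for quoting peak numbers. The names are illustrative, and these are theoretical peaks rather than anything measured for this review.

```python
# (name, CUDA cores, base clock in MHz, TDP in watts), copied from the spec table.
CARDS = [
    ("GTX Titan X", 3072, 1002, 250),
    ("GTX Titan Black", 2880, 889, 250),
    ("GTX 980", 2048, 1126, 165),
    ("GTX 970", 1664, 1051, 145),
    ("GTX 960", 1024, 1127, 120),
]

# Assumption: one fused multiply-add (2 FLOPs) per core per clock.
FLOPS_PER_CORE_PER_CLOCK = 2


def peak_fp32_tflops(cores, clock_mhz):
    """Theoretical peak single-precision throughput in TFLOPS."""
    return cores * FLOPS_PER_CORE_PER_CLOCK * clock_mhz * 1e6 / 1e12


for name, cores, clock_mhz, tdp_watts in CARDS:
    tflops = peak_fp32_tflops(cores, clock_mhz)
    gflops_per_watt = tflops * 1000 / tdp_watts
    print(f"{name:16s} {tflops:5.2f} TFLOPS  {gflops_per_watt:5.1f} GFLOPS/W")
```

At base clock the Titan X works out to roughly 6.16 TFLOPS. Swapping in the boost clocks gives the more flattering numbers you tend to see in marketing slides.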

Eff Pee Sixty.. What?

Does the FP64 performance really matter? Not necessarily, and of course it depends on the industry you want to use the card in. AMD generally offers good native FP64 performance, which makes its cards good choices for compute heavy workloads that need it. But all gaming and most commercial software likely relies on FP32 or lower precision; only scientific workloads really make use of anything higher. So for the average Joe or Jane? You’re fine, and this bad boy will be more than enough.

GM200 uses all of its space to cater to FP32 workloads, making it a purely gaming-centric device. Having so little die area dedicated to FP64 hardware means that Quadro products based on it will be just as limited, and that there is unlikely to be a Tesla part made from GM200. This is a gaming GPU through and through. But guess what? It delivers the goods, and it delivers them with lots of cake and pie to boot!

Of course the original Titan, Titan Black and Titan Z had non-neutered FP64, which earned them a “semi-professional” status among some. Doing the same with GM200 would have vastly increased the size of the chip, raised both the manufacturing cost and the retail price, and made it far more power hungry and hot than it is. Perhaps on 20nm or below we will see a chip without neutered FP64. But until then, this thing is blazing fast for just about anything. So get over yourself already.
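To put "neutered" into numbers, here is a small sketch comparing theoretical FP64 throughput for the Titan X and the Titan Black. The FP64-to-FP32 ratios are not from this review: 1/32 for GM200 and 1/3 for a GK110 Titan in its full double precision mode are the commonly cited figures, so treat them as assumptions. Core counts and base clocks come from the spec table, and the function name is made up for illustration.

```python
from fractions import Fraction

# Assumed FP64:FP32 throughput ratios (commonly cited, not from this review).
FP64_RATIO = {
    "GM200": Fraction(1, 32),   # Titan X
    "GK110": Fraction(1, 3),    # Titan Black with full-rate FP64 enabled
}


def peak_tflops(cores, clock_mhz, ratio=Fraction(1)):
    """Theoretical peak in TFLOPS, assuming 2 FLOPs per core per clock."""
    mflops = cores * 2 * clock_mhz * ratio   # cores x FLOPs x MHz = MFLOPS
    return float(mflops) / 1e6


titan_x_fp64 = peak_tflops(3072, 1002, FP64_RATIO["GM200"])
titan_black_fp64 = peak_tflops(2880, 889, FP64_RATIO["GK110"])

print(f"Titan X     FP64: {titan_x_fp64:.2f} TFLOPS")
print(f"Titan Black FP64: {titan_black_fp64:.2f} TFLOPS")
```

Under those assumptions the older Kepler card comes out many times faster on paper in double precision, which is exactly why anyone who needs FP64 already knows to look elsewhere.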

I really have to reiterate this for anyone who might think that the lack of FP64 means the chip is somehow poorly made, a cop-out or just plain broken compared to AMD. FP64 means nothing to the gaming crowd. It also means nothing to the majority of the commercial crowd. The only reason to need FP64 is a scientific application that absolutely requires double precision. Sure, it looks cool to have that spec, but it’s meaningless for 99% of us. Even some very well put together distributed computing projects only make use of single precision, so FP64 would go unused there too. If you need it, then don’t buy this. You already know who you are anyway, and you won’t be looking at the Titan X for that.

Another use NVIDIA sees for this card is as the perfect partner for that nifty new VR headset from Valve, Oculus or even Razer’s open source model. The massive 12GB of RAM makes it possible to keep large amounts of data in memory, the plethora of CUDA cores can chew through it, and perhaps some head tracking work can be offloaded to the GPU to help keep things running a bit more smoothly. The GPU isn’t always going to be completely utilized, so why not add in some OpenCL or CUDA calls to help with other tasks in a game? With some titles, the card may well have the headroom for it.

Oh it’s good alright.

To test this beast I’ve compiled a great list of benchmarks for you. First are the comparative benchmarks, which pit the Titan X against its direct and indirect competitors. Then I have a long, fairly all-inclusive list of other benchmarks with both old and new games. I even have the original Crysis in there. These list frame rates from the Titan X only and are for your enjoyment. Feel free to report your own numbers from odd games and other applications in the comments; we’d love to see what you’ve done with your own card. Lastly, I have a list of compute benchmarks to round it off. I’ve used distributed computing projects as well as more mainstream benchmarks that can serve as a point of comparison.


Test Setup

My test bench is a bit more pedestrian, something to represent the majority rather than the minority. Unfortunately the Titan X is most definitely not the majority. That’s okay though, because that just means that we’re depicting what one could do to save costs to afford a Titan X while still maintaining a good PC.

 

| Component | Details |
|---|---|
| CPU | Intel Xeon E3-1230 v3 @ 3.3GHz |
| Motherboard | Gigabyte Z97N Wi-Fi Mini-ITX |
| Power Supply | XFX 1250W Pro Black Edition |
| Hard Disk | SanDisk Extreme II 120GB |
| Storage Disk | Seagate 2TB |
| Memory | Crucial Ballistix Tactical Tracer 8GB (4GBx2) DDR3 1866 |
| Monitor | BenQ BL2710PT 27″ WQHD |
| Video Cards | GeForce Titan X, GeForce GTX 980 Reference, GeForce GTX 970 Reference, AMD R9 295X2 |
| Drivers | NVIDIA 347.84 Beta, AMD Catalyst 14.2 |
| Operating System | Windows 8.1 Pro |

For all the tests, MSAA was set to 2x to even the playing field. The Battlefield 4 run was a playthrough on a 64 player server on the Siege of Shanghai map. The Crysis 3 run was a playthrough of the first level. Fraps was used to capture the frame rates in Battlefield 4 and Crysis 3. The rest use each game’s built-in benchmark.

Benchmarks

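Battlefield 4

[Chart: Battlefield 4 benchmark results]

Crysis 3

[Chart: Crysis 3 benchmark results]

Dragon Age Inquisition

[Chart: Dragon Age Inquisition benchmark results]

Middle-Earth: Shadow of Mordor

[Chart: Middle-Earth: Shadow of Mordor benchmark results]

Civilization Beyond Earth

[Chart: Civilization Beyond Earth benchmark results]
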
Compilation

For the compilation I’ve used many games from our past, along with some present but perhaps less common games, to highlight the performance this beast is capable of. Some games had a frame rate cap regardless of vsync, which made them useless for this purpose. Other titles allowed me to remove the cap, so I’ve done so with those.

I want to introduce the playthrough style of benchmarking. Benchmarking a GPU in the context of gaming is not scientific, nor should it be treated as such. Playing games presents the GPU with differing, challenging situations to render, and they change constantly. Simply running through a static benchmark isn’t indicative of how a game will run in the real world at all. Sure, it stresses the components and tells you how well the card runs the engine, but how does it handle the random events that can come into play during a real playthrough? That’s why I play through a level and capture the information to display to you. I’ll certainly provide benchmarks that cater to those looking to systematically compare information, but I also want you to be informed of how the card handles a game in a more realistic way.

All games consisted of a play through of a level, typically the first level offered. Fraps was used to capture the framerate information. More games will be added to the list as I have time and the resources to play them.
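For anyone who wants to crunch their own playthrough logs, here is a minimal sketch of how per-frame timestamps can be boiled down to the usual summary numbers. It assumes the capture tool gives you one cumulative timestamp in milliseconds per frame; parsing the log file itself is left out, and the function name and returned keys are made up for illustration. This is one reasonable way to do it, not necessarily how the numbers in this review were produced.

```python
import math


def fps_summary(timestamps_ms):
    """Summarise a run from cumulative per-frame timestamps in milliseconds."""
    if len(timestamps_ms) < 2:
        raise ValueError("need at least two frame timestamps")

    # Time each frame took: the gap between consecutive timestamps.
    frame_times = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    if min(frame_times) <= 0:
        raise ValueError("timestamps must be strictly increasing")

    elapsed_s = (timestamps_ms[-1] - timestamps_ms[0]) / 1000.0
    ordered = sorted(frame_times)
    # Nearest-rank 99th percentile: 99% of frames were at least this quick.
    p99_ms = ordered[math.ceil(0.99 * len(ordered)) - 1]

    return {
        "avg_fps": len(frame_times) / elapsed_s,
        "min_fps": 1000.0 / ordered[-1],   # set by the single slowest frame
        "p99_frame_time_ms": p99_ms,
    }


if __name__ == "__main__":
    # Tiny synthetic example, not data from the review.
    print(fps_summary([0.0, 10.0, 20.0, 30.0, 50.0]))
```

The average tells you how fast a game runs overall, while the slowest frames tell you how it feels. A playthrough with random events tends to show up in that second number far more than a canned benchmark does.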

All of these games were run at the highest possible settings at WQHD resolution.

Now we’re on to the compute benchmarks, but first a word on the gaming results. As you can plainly see, the Titan X provides a gaming experience that rivals even the R9 295X2 at 1440p. It certainly gives it a run, though not really for its money, since the Titan is the more expensive card.

But I digress. Now let’s see how that high single precision throughput does in the real world. I’ve selected three distributed computing tests, from Einstein@home, POEM@home and PrimeGrid. All three were selected because they are updated frequently, and recently enough to take Maxwell’s architecture into account. I’ve also included results from a novel little benchmark known as ViennaBench. ViennaCL is a linear algebra library that runs on OpenCL and even CUDA. Its creator, Karl Rupp, has added a nifty benchmark that measures performance in a number of different mathematical tests.

Time is in seconds, lower is better.

ViennaCL is in GB/s and GFLOPs, more is better here.
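In case the GB/s and GFLOPs units are unfamiliar, here is a generic sketch of how numbers like these are typically derived from a timed operation. It uses a vector addition as the example, which reads two arrays and writes one. This is not ViennaCL’s actual benchmark code, and the function names are made up for illustration.

```python
def vector_add_bytes(n_elements, bytes_per_element):
    """Bytes moved by z = x + y: two arrays read, one array written."""
    return 3 * n_elements * bytes_per_element


def bandwidth_gbs(bytes_moved, seconds):
    """Effective memory bandwidth in GB/s."""
    return bytes_moved / seconds / 1e9


def gflops(flop_count, seconds):
    """Arithmetic throughput in GFLOPs (billions of operations per second)."""
    return flop_count / seconds / 1e9


if __name__ == "__main__":
    n = 10_000_000
    elapsed = 0.0005  # made-up timing for illustration, NOT a measured result

    # Single precision uses 4 bytes per element, double precision uses 8.
    print(bandwidth_gbs(vector_add_bytes(n, 4), elapsed), "GB/s")
    print(gflops(n, elapsed), "GFLOPs")   # one addition per element
```

Simple operations like this are limited by memory bandwidth rather than raw math, which is why a benchmark reports both units. Switching to double precision doubles the bytes moved per element before the FP64 hardware even enters the picture.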

POEM@home

[Chart: POEM@home results]

Einstein@home

[Chart: Einstein@home results]

Primegrid

ViennaCL

The ViennaCL benchmark can be run in both single and double precision modes, and you can certainly see the lack of FP64 performance here too.

Temperatures

The unfortunate thing about cramming so many transistors into such a small area on the same process node is the increase in heat generated across the entire core. Does this mean it runs hot? No! But it is hotter than your GTX 980, and this could limit the overclocking potential as well. Unfortunately I don’t have professional audio monitoring hardware, so I can’t tell you exactly how quiet it is, only that it isn’t annoying in the slightest.

Now what?

All the benchmarks in the world are fantastic to look at. Now we know about the chip, how it works and what it can do. But what about that multi-colored polka dotted elephant in the room? That price is absolutely enormous. $999 is a lot of dough to slap down for a GPU that performs as well as, and only sometimes better than, a dual GPU card from Team Red that costs less. But in the context of NVIDIA’s current lineup, the Titan X costs slightly less than an SLI pair of GTX 980s ($999 versus $1,098 at list price) and performs nearly as well while using less power.
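The SLI comparison is easy to check against the spec table. Here is a quick sketch using the list prices and TDP ratings from that table. Keep in mind that TDP is a rated figure rather than measured power draw, and that the names below are made up for illustration.

```python
# List price (USD) and TDP (watts) from the spec table.
TITAN_X = {"price": 999, "tdp": 250}
GTX_980 = {"price": 549, "tdp": 165}


def sli_pair(card):
    """Combined list price and rated TDP for two identical cards."""
    return {"price": 2 * card["price"], "tdp": 2 * card["tdp"]}


pair = sli_pair(GTX_980)
price_gap = pair["price"] - TITAN_X["price"]   # extra dollars for the SLI pair
tdp_gap = pair["tdp"] - TITAN_X["tdp"]         # extra rated watts for the SLI pair

print(f"Two GTX 980s: ${pair['price']} and {pair['tdp']}W rated")
print(f"Titan X:      ${TITAN_X['price']} and {TITAN_X['tdp']}W rated")
print(f"The SLI pair costs ${price_gap} more and is rated {tdp_gap}W higher")
```

On paper the single card saves you $99 and 80W of rated power, before you even get into SLI scaling quirks in individual games.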

So what does one do in this type of dilemma? Certainly the gamer with disposable income who wants the absolute best in performance should look at NVIDIA’s best solution, the Titan X. But does it fit within the confines of a more modest system limited by its other components? My fairly inexpensive Xeon E3-1230 v3 actually seems able to feed it fast enough to make it worth it, perhaps. I can’t truly tell you whether or not it’s a great and amazing buy at $999. The Titan X is expensive, and NVIDIA can price the card this way because there is no real competition for it at the moment. So very expensive!

What I can tell you, however, is how enjoyable it was to be able to play with it and use it. It’s fast alright, and it makes old games play well just as it enables higher settings in newer games. Is the Titan X worth that $999 price of entry? Not quite. Even $800 would be far more enticing. In fact, I would give this a perfect 10 if that were the price. As it is, it’s just too expensive to justify for the average gamer. But a card for the average gamer this ain’t. If you need or want the best, then of course this is worth it!

I dare you to buy one. It’s fantastic!

Overclocking and SLI performance will be in another upcoming article. Expect to see some even better performance numbers when two of these babies are put together.
