Nvidia's GM200 will reportedly feature only partial double precision compute performance and will be optimized for single precision like GM204, its younger sibling residing in the GeForce GTX 980 and 970 gaming cards. About a month ago a similar report surfaced, claiming that Nvidia would not use GM200 for HPC purposes, mainly because of its limited FP64 capability. Instead, the chip will reportedly focus on single precision compute performance improvements.
This report comes via 3DCenter.com, which claims to have confirmed this particular tidbit of information. According to its sources, GM200 lacks the specific chip-level FP64 hardware that's necessary for maintaining adequate double precision compute throughput. As a result, the site claims, GM200 will be significantly down on FP64 performance compared to what we are used to seeing from Nvidia's enthusiast class GPUs.
A reduced emphasis on double precision compute (FP64) performance in a compute class card marks an anomaly in Nvidia's strategy, historically speaking. This will perhaps be the first time ever that Nvidia will introduce a 500mm² GPU flagship that lacks proper FP64 compute capability.
What Nvidia's GM200 Weak Double Precision Performance Could Mean For Pro Graphics
Before going into the ramifications of this potential decision by Nvidia, we must remind you again that we couldn't verify 3DCenter's report ourselves and will thus treat it as a rumor for the time being. With that out of the way, to fully understand what this development means for consumers we must understand how FP64 compute has traditionally been added and why it's important.
You can deduce the difference between double precision floating point (FP64) and single precision floating point (FP32) from the name. FP64 results are significantly more precise than FP32 results: a double precision value carries a 53-bit significand (roughly 15 to 16 decimal digits), while a single precision value carries 24 bits (roughly 7 decimal digits). This added precision is crucial for scientific research, professional applications and servers, and much less so for video games. FP64 is used in games only in a very limited subset of functions, while the bulk of video game and graphics code relies on FP32. The added precision also requires more capable hardware, which raises costs by increasing the size of the chip and its power consumption.
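To make the precision gap concrete, here is a minimal Python sketch that adds 0.1 to a running total many times, once in native FP64 and once with every intermediate result rounded to FP32. The helper names `to_fp32` and `accumulate` are made up for this illustration, and the FP32 path is emulated on the CPU with the standard `struct` module rather than run on a GPU.

```python
import struct


def to_fp32(value: float) -> float:
    """Round a Python float (which is FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def accumulate(step: float, count: int, single_precision: bool) -> float:
    """Add `step` to a running total `count` times.

    With single_precision=True, the step and every intermediate total are
    rounded to FP32, which emulates an FP32 accumulator (an assumption of
    this sketch, not a measurement of any particular GPU).
    """
    if single_precision:
        step = to_fp32(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single_precision:
            total = to_fp32(total)
    return total


if __name__ == "__main__":
    count = 100_000
    expected = 10_000.0  # the mathematically exact value of 0.1 * 100,000

    fp64_total = accumulate(0.1, count, single_precision=False)
    fp32_total = accumulate(0.1, count, single_precision=True)

    print(f"FP64 total: {fp64_total!r}  error: {abs(fp64_total - expected):.3e}")
    print(f"FP32 total: {fp32_total!r}  error: {abs(fp32_total - expected):.3e}")
```

Both totals drift away from the exact answer, but the FP32 error is larger by several orders of magnitude. That kind of accumulated error is tolerable when shading pixels and far less so in a long-running scientific simulation.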
Double precision (FP64) compute performance has always been lower than single precision (FP32) in GPUs for that reason. Normally there's a fixed ratio between the peak single and double precision floating point capability of a given GPU. This ratio varies between GPU architectures, and between different GPUs within the same architecture as well. In the latest enthusiast class chip from Nvidia, the ratio between FP32 and FP64 peak performance sits at 3:1. This is true for Nvidia's GK110 GPU, which powers the Quadro K6000, Titan and GTX 780/780 Ti graphics cards among others, although the ratio has been artificially restricted to 24:1 on the 780 and 780 Ti cards.
For AMD the ratio is a more aggressive 2:1 in its latest enthusiast class GPU, Hawaii, which powers the company's flagship FirePro W9100 and Radeon R9 290 series products, although the ratio is artificially restricted to 8:1 in the 290 series.
So, since the GTX Titan Black has a peak of 5.1 TFLOPS of single precision floating point performance, a 3:1 ratio means that double precision compute goes down to 1.7 TFLOPS. And with AMD's Hawaii XT, which has a peak of 5.6 TFLOPS of FP32 compute performance, a 2:1 ratio means that it will go down to a more respectable 2.8 TFLOPS of FP64 compute performance. This advantage in FP64 compute is why AMD succeeded in capturing the top spot in the Green500 list of the world's most power efficient supercomputers with its Hawaii XT powered FirePro S9150 server graphics cards.
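The arithmetic above is simply peak FP32 throughput divided by the FP32:FP64 ratio. The short Python sketch below reproduces the two figures quoted in this article; the function name `peak_fp64_tflops` and the `CARDS` table are illustrative only, and the inputs are the peak numbers cited above rather than measured results.

```python
def peak_fp64_tflops(peak_fp32_tflops: float, fp32_to_fp64_ratio: float) -> float:
    """Derive peak FP64 throughput from peak FP32 throughput and an N:1 ratio."""
    if fp32_to_fp64_ratio <= 0:
        raise ValueError("ratio must be positive")
    return peak_fp32_tflops / fp32_to_fp64_ratio


# (peak FP32 TFLOPS, FP32:FP64 ratio) as quoted in the article text.
CARDS = {
    "GTX Titan Black (GK110)": (5.1, 3),
    "Hawaii XT": (5.6, 2),
}

if __name__ == "__main__":
    for name, (fp32, ratio) in CARDS.items():
        fp64 = peak_fp64_tflops(fp32, ratio)
        print(f"{name}: {fp32} TFLOPS FP32 at {ratio}:1 -> {fp64:.1f} TFLOPS FP64")
```

The same function shows how quickly FP64 throughput shrinks as the ratio grows: any chip restricted to 32:1 keeps only about 3% of its FP32 peak for double precision work.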
The FP32 to FP64 ratio in Nvidia's GM204 and GM206 Maxwell GPUs, which power the GTX 980, 970 and 960, is 32:1. This means peak FP64 throughput is just 1/32 of peak FP32 throughput, so FP64 intensive work runs up to 32 times slower than comparable FP32 work. As we've discussed above, this is mostly OK for video games but downright unacceptable for professional applications.
If Nvidia's GM200 does end up with a similarly weak double precision compute capability, the card will have very limited uses in the professional market. In theory, however, the reduction of FP64 hardware resources on the chip should make it more power efficient in games and FP32 compute work. Even so, I'm not entirely convinced that it's a worthwhile trade-off, especially for a chip that is poised to go into the next generation Quadro flagship compute cards.
All will become crystal clear in due time. Earlier reports suggested that Nvidia will showcase its new flagship GM200 and next generation GTX Titan X / Titan II between March 17 and 20. However, a new announcement by the company could mean that we'll get to see the new chip in action much sooner.