Nvidia GeForce GTX 970 Specs Updated: Only 56 ROPs and Less than 2MB L2 Cache
Nvidia has finally given PCPer a detailed technical explanation of why the GTX 970 faces memory allocation issues. The reason is that the GTX 970 has fewer ROPs than the GTX 980, as well as less than the full 2MB of L2 cache found on its bigger brother. Specifically, the GTX 970 has only 56 ROPs and 1.75MB (1792 KB) of L2 cache available to it. It can still hit its theoretical peak bandwidth of 224 GB/s, but only while using both blocks of memory.
Part of the official specification for the Nvidia GeForce GTX 970 is wrong
Jonah Alben, Nvidia's SVP of GPU Engineering, drew the following diagram for PCPer, which shows the real reason the GTX 970 behaves as it does. If you look closely, you will notice two kinds of blocks grayed out: three SMM blocks and one L2 cache block. While the disabled SMMs are expected, the interesting thing to note is the disabled L2/ROP cluster at the bottom left. Before we go any further, here is an extract giving the rundown:
Despite initial reviews and information from NVIDIA, the GTX 970 actually has fewer ROPs and less L2 cache than the GTX 980. NVIDIA says this was an error in the reviewer's guide and a misunderstanding between the engineering team and the technical PR team on how the architecture itself functioned. That means the GTX 970 has 56 ROPs and 1792 KB of L2 cache compared to 64 ROPs and 2048 KB of L2 cache for the GTX 980. - Nvidia via PCPer
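The quoted figures line up with exactly one L2/ROP unit being switched off. The sketch below checks that arithmetic under an assumption not stated in the quote: that the GTX 980's 64 ROPs and 2048 KB of L2 are split evenly across eight identical L2/ROP units. The constant and function names are hypothetical, chosen for illustration.

```python
# Sanity check of the quoted GTX 980 vs GTX 970 ROP and L2 figures.
# Assumption (illustrative): the full chip has eight identical L2/ROP
# units, and the GTX 970 disables exactly one of them.

GTX980_ROPS = 64
GTX980_L2_KB = 2048
L2_ROP_UNITS = 8  # assumed number of L2/ROP units on the full chip


def per_unit(total: int, units: int = L2_ROP_UNITS) -> int:
    """Return the share of a resource held by one L2/ROP unit."""
    if total % units:
        raise ValueError(f"{total} does not split evenly across {units} units")
    return total // units


def with_units_disabled(total: int, disabled: int, units: int = L2_ROP_UNITS) -> int:
    """Return what is left of a resource after `disabled` units are fused off."""
    return total - disabled * per_unit(total, units)


gtx970_rops = with_units_disabled(GTX980_ROPS, disabled=1)
gtx970_l2_kb = with_units_disabled(GTX980_L2_KB, disabled=1)

print(f"Per unit: {per_unit(GTX980_ROPS)} ROPs, {per_unit(GTX980_L2_KB)} KB of L2")
print(f"GTX 970: {gtx970_rops} ROPs, {gtx970_l2_kb} KB of L2 ({gtx970_l2_kb / 1024:.2f} MB)")
# Per unit: 8 ROPs, 256 KB of L2
# GTX 970: 56 ROPs, 1792 KB of L2 (1.75 MB)
```

Losing one eighth of each resource is what produces both 56 ROPs and 1792 KB, which is why 1792 KB works out to 1.75MB rather than a round 2MB.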
The actual problem here isn't the lower ROP count, I might add: 56 ROPs can output 56 pixels/clock, while the 13 SMMs can only output 52 pixels/clock, so the bottleneck in this equation is the SMMs, not the ROPs. However, the way Nvidia has configured memory fetching does affect performance. As you can see in the diagram, the last crossbar port has two memory controllers attached to it. Under normal usage, that port would receive twice as many requests as the others, causing congestion under heavy load. To solve this issue, Nvidia separated the 0.5GB block from the 3.5GB one, leaving it up to the OS to use it correctly. The 0.5GB block has 1/7th the bandwidth of the 3.5GB block, although it is still about four times faster than accessing system memory over PCIe.
So is the GTX 970 a 4GB card or not? Well, yes and no. It certainly has 4GB of memory that can be used effectively, but 0.5GB of it is slower than the rest. In any case, the technical specifications were misquoted. I would encourage enthusiasts to read the PCPer article in full for a more detailed rundown.
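A back-of-the-envelope sketch of two numbers from the paragraph above: the pixel-rate bottleneck and the 1/7 bandwidth ratio between the memory blocks. It assumes 4 pixels/clock per SMM (implied by 13 SMMs producing 52 pixels/clock) and 7 Gbps GDDR5 on eight 32-bit memory controllers with 512MB each, seven of which back the 3.5GB block. These are illustrative assumptions, and the constant names are hypothetical.

```python
from fractions import Fraction

# --- Pixel throughput: the slower stage sets the limit ---
SMM_COUNT = 13       # 16 SMMs on the full chip, three disabled on the GTX 970
PIXELS_PER_SMM = 4   # assumed; implied by 13 SMMs -> 52 pixels/clock
ROP_COUNT = 56       # each ROP writes one pixel per clock

smm_rate = SMM_COUNT * PIXELS_PER_SMM
rop_rate = ROP_COUNT
bottleneck = "SMMs" if smm_rate < rop_rate else "ROPs"

# --- Memory bandwidth per segment ---
# Assumed: eight 32-bit GDDR5 controllers at 7 Gbps per pin, 512 MB each;
# seven back the 3.5 GB segment and one backs the 0.5 GB segment.
DATA_RATE_GBPS = 7
CONTROLLER_BITS = 32


def segment_bandwidth_gbs(controllers: int) -> int:
    """Peak GB/s for a segment striped across `controllers` 32-bit controllers."""
    return controllers * CONTROLLER_BITS * DATA_RATE_GBPS // 8


fast = segment_bandwidth_gbs(7)  # 3.5 GB segment
slow = segment_bandwidth_gbs(1)  # 0.5 GB segment

print(f"SMMs: {smm_rate} px/clk, ROPs: {rop_rate} px/clk -> bottleneck: {bottleneck}")
print(f"3.5 GB: {fast} GB/s, 0.5 GB: {slow} GB/s, both at once: {fast + slow} GB/s")
print(f"Slow/fast ratio: {Fraction(slow, fast)}")
# SMMs: 52 px/clk, ROPs: 56 px/clk -> bottleneck: SMMs
# 3.5 GB: 196 GB/s, 0.5 GB: 28 GB/s, both at once: 224 GB/s
# Slow/fast ratio: 1/7
```

The combined 224 GB/s only applies when both segments are being accessed at the same time, matching the point that the peak figure requires both blocks of memory.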