Nvidia GeForce and AMD Radeon Graphics Cards Memory Analysis
Memory has always been one of the more important components of a graphics card, and its importance has become even more prominent in recent times, particularly thanks to the advent of HBM technology. While the focus of the industry has shifted to maximizing bandwidth while retaining an acceptable physical footprint, there lurks in the shadows a menace that could turn out to have far-reaching consequences: the power consumption of memory. I thought it was time we did a short editorial on the effects of bandwidth and power consumption.
The untold story of memory, bandwidth and power consumption in modern graphics cards
As most of you will know, the memory standard for graphics cards of the past generation is GDDR5, and while many future offerings will undoubtedly retain this memory type, there is a new player in town: High Bandwidth Memory. HBM, or stacked DRAM (which comes in many different names and flavors), is a recent innovation with significantly increased bandwidth but drastically lowered power consumption. The entire point of HBM was to bring high bandwidth to an affordable level in terms of power cost. And while it has pretty much succeeded in doing that for the near future, the problem persists over the long-term horizon. Nvidia held its GPU Technology Theater at SC15 a while back, and Dr. Stephen W. Keckler, Senior Director of Architectural Research at Nvidia, mentioned that the problem of memory power consumption remains unsolved, even with HBM.
The graph shows an exponentially increasing trend in the power consumption of the two memory types as bandwidth increases. We can see the structural break in the graph where the industry shifts from GDDR5 to HBM, but the end result remains the same, albeit delayed. Now why is this a problem, one might ask? In terms of the TDP that enthusiast rigs are used to, these numbers are well within range. But you have to realize that a GPU which requires this much bandwidth would consume quite a lot of power on its own too, and the combined TDP would not be within acceptable ranges. The energy consumed by a graphics card is a very important number, and if the memory alone requires such a large amount of power, then the total would be pretty huge.
The processing power of graphics cards is increasing at an unrelenting pace, which means that within a few more years we will require bandwidth in the region of a few thousand GB/s. Even with the energy efficiency afforded by stacked DRAM, memory power consumption at that point would rise above the 100W mark. Considering that the GPUs themselves will be drawing quite a lot of power on their own, that means total wattage above the 250W mark that high-end cards currently use.
Some of our readers will probably be wondering why we need bandwidth in the thousands of GB/s. Well, you have to remember that graphics cards are actually processors built from huge clusters of incredibly small cores. You can call them CUDA cores or you can call them stream processors; either way, it is these tiny processors that require the bandwidth. While the number of these cores has been increasing at a very steady rate, the bandwidth afforded to them has not been able to keep pace. To illustrate, we calculated the bandwidth available per core for recent Nvidia and AMD graphics cards. A very clear trend is immediately visible.
Bandwidth per core analysis of Nvidia and AMD graphics cards
Bandwidth per core of different Radeons, with the number of stream processors sorted in ascending order.
Bandwidth per core of various GeForce cards, with the number of CUDA cores sorted in ascending order.
You will notice that the bandwidth available to each core decreases as the GPU grows more powerful. This is because memory technology has advanced more slowly than core counts have grown. Interestingly, AMD has been able to keep the gradient of the bandwidth per core relatively constant, whereas Nvidia has a slightly steeper gradient. Nvidia's cards mostly fall in the range of 107 MB/s to 120 MB/s per core, going as high as 167 MB/s per core on the GTX 760. This is of course something we have known all along: AMD is more generous in allotting bandwidth to its graphics cards than Nvidia, which uses color compression technology to compensate for the lack of bandwidth. AMD cards mostly range from 110 MB/s to 140 MB/s per core, going as high as 175 MB/s per core for the R7 370. Interestingly, the only single-GPU card in our calculations which dropped below the 100 MB/s per core mark is the R9 285, at 98 MB/s per core. (The dual-GPU GTX Titan Z shows 58 MB/s per core in our table only because its bandwidth is listed per GPU while its core count is the combined total.)
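The per-core figures quoted above are simply memory bandwidth divided by core count. A minimal sketch of that arithmetic follows, using a handful of rows from the tables at the end of this article. The function name `bandwidth_per_core_mbs` and the `CARDS` dictionary are illustrative, and the conversion assumes 1 GB/s = 1000 MB/s.

```python
# Minimal sketch: bandwidth per core = memory bandwidth / shader core count.
# (core count, bandwidth in GB/s) pairs are copied from this article's tables.
CARDS = {
    "GeForce GTX 760": (1152, 192.3),
    "GeForce GTX Titan X": (3072, 336.0),
    "Radeon R7 370": (1024, 179.2),
    "Radeon R9 285": (1792, 176.0),
}


def bandwidth_per_core_mbs(cores, bandwidth_gbs):
    """Return bandwidth per core in MB/s (assumes 1 GB/s = 1000 MB/s)."""
    return bandwidth_gbs * 1000.0 / cores


# Print the cards in ascending order of core count, as in the charts.
for name, (cores, bandwidth) in sorted(CARDS.items(), key=lambda kv: kv[1][0]):
    per_core = bandwidth_per_core_mbs(cores, bandwidth)
    print(f"{name}: {cores} cores, {per_core:.1f} MB/s per core")
```

Sorting by core count makes the downward trend easy to see: the two smaller cards land well above the two larger ones.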
Let's explore the power dilemma a bit more. The power consumption of volatile memory is a function of its clock speed and bus width, which together determine bandwidth. According to a statement by AMD, the latest version of GDDR5 delivers, on average, about 10.66 GB/s of bandwidth per watt, whereas HBM is good for about 35 GB/s per watt. This is a phenomenal increase in power efficiency, roughly 3.3x, but is it enough? We computed the approximate memory power consumption of the various graphics cards and arrived at some very interesting conclusions.
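As a rough illustration of how the memory power estimates in this article can be derived, the sketch below divides a card's bandwidth by the efficiency figures AMD quoted (10.66 GB/s per watt for GDDR5, 35 GB/s per watt for HBM). The constant and function names are made up for this example, and the model assumes efficiency stays constant across bandwidths, which is a simplification.

```python
# Efficiency figures quoted by AMD (GB/s of bandwidth per watt).
GDDR5_GBS_PER_WATT = 10.66
HBM_GBS_PER_WATT = 35.0


def memory_power_watts(bandwidth_gbs, gbs_per_watt):
    """Estimate memory power draw, assuming constant efficiency."""
    return bandwidth_gbs / gbs_per_watt


# Radeon R9 390X: 384 GB/s of GDDR5.
r9_390x_watts = memory_power_watts(384.0, GDDR5_GBS_PER_WATT)
# Radeon R9 Fury X: 512 GB/s of HBM.
fury_x_watts = memory_power_watts(512.0, HBM_GBS_PER_WATT)
# Ratio of the two efficiency figures.
efficiency_gain = HBM_GBS_PER_WATT / GDDR5_GBS_PER_WATT

print(f"R9 390X (GDDR5): {r9_390x_watts:.1f} W")
print(f"R9 Fury X (HBM): {fury_x_watts:.1f} W")
print(f"HBM vs GDDR5 efficiency: {efficiency_gain:.2f}x")
```

Under these assumptions the Fury X's memory draws less than half the power of the 390X's while delivering a third more bandwidth.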
Memory power consumption trend of AMD and Nvidia graphics cards
As expected with Nvidia, the power consumed by the memory increases roughly in proportion to the total power of the graphics card. Keep in mind that cores across generations are not completely comparable (the average Maxwell core is 15%-32% more powerful depending on the card), so the actual number (after adjusting for performance differences) would be slightly higher. By our estimates, a single-GPU AMD card uses roughly 10W to 36W to power its GDDR5 memory, whereas Nvidia cards use roughly 8W to 32W depending on the exact card. With GDDR5, the maximum so far is near the 40W mark (after accounting for real-world line losses). With HBM, however, things take an altogether different turn. HBM uses only about 15W and provides more bandwidth than the highest-clocked GDDR5. In fact, if you look at the graph you will notice that in the case of the Fury lineup the curve breaks from its expected path and actually slopes downwards, which is the structural break in the trend we talked about.
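The structural break is easiest to see when memory power is expressed as a share of the card's TDP. The sketch below is a simplified illustration, assuming the AMD-quoted efficiency figures of 10.66 GB/s per watt for GDDR5 and 35 GB/s per watt for HBM. The helper `memory_share_of_tdp` is a hypothetical name, and the bandwidth and TDP values come from the tables at the end of this article.

```python
def memory_share_of_tdp(bandwidth_gbs, gbs_per_watt, tdp_watts):
    """Return estimated memory power as a percentage of total TDP."""
    memory_watts = bandwidth_gbs / gbs_per_watt
    return 100.0 * memory_watts / tdp_watts


# (name, bandwidth in GB/s, efficiency in GB/s per watt, TDP in W)
EXAMPLES = [
    ("GeForce GTX 980 Ti (GDDR5)", 336.0, 10.66, 250.0),
    ("Radeon R9 390X (GDDR5)", 384.0, 10.66, 275.0),
    ("Radeon R9 Fury X (HBM)", 512.0, 35.0, 275.0),
]

for name, bandwidth, efficiency, tdp in EXAMPLES:
    share = memory_share_of_tdp(bandwidth, efficiency, tdp)
    print(f"{name}: {share:.1f}% of TDP")
```

The 390X and Fury X share the same 275W TDP, so the drop in memory share between them comes entirely from the switch to HBM.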
By sticking to the HBM standard we can compute what the bandwidth would be at the power levels stated by Nvidia. At 120W, HBM at today's efficiency of 35 GB/s per watt would be able to deliver 4200 GB/s, which is an absolutely huge number. Unfortunately, efficiency does not scale linearly, and we might well be looking at 20-25 GB/s per watt at throughputs that high. That gives us a figure of around 2400-3000 GB/s, a much more reasonable number which we can expect to see in the near future. Currently, memory power consumption ranges from roughly 8% to 15% of the total TDP of a GDDR5 graphics card, but as bandwidth increases this number will go up as well. Of course, there are many alternative technologies already in the pipeline, including standards being worked on by Intel, Rambus and Micron. This projection does not (and cannot) take into account disruptive new technologies that reset the curve once more, a possibility that is more than likely.
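The projection above is a single multiplication: power budget times efficiency. The sketch below runs it for the 120W budget at today's 35 GB/s per watt and at the degraded 20-25 GB/s per watt range. The function name is illustrative, and the degraded efficiency values are this article's rough guess rather than measured data.

```python
def projected_bandwidth_gbs(power_budget_watts, gbs_per_watt):
    """Bandwidth achievable within a memory power budget."""
    return power_budget_watts * gbs_per_watt


POWER_BUDGET_WATTS = 120.0

# Today's HBM efficiency, followed by the assumed degraded range.
for gbs_per_watt in (35.0, 25.0, 20.0):
    bandwidth = projected_bandwidth_gbs(POWER_BUDGET_WATTS, gbs_per_watt)
    print(f"{gbs_per_watt:.0f} GB/s per watt -> {bandwidth:.0f} GB/s")
```

The same function can be used to compare other power budgets or efficiency guesses against the 4200 GB/s best case.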
Nvidia GeForce - Memory Analysis
|Model||CUDA Cores||Bandwidth||Bandwidth per Core (GBpc = GB/s per core)||Memory Power (est.)||TDP||Memory % of TDP|
|GeForce GTX 760||1152||192.3 GB/s||0.1669 GBpc||18W||170W||10.6%|
|GeForce GTX 760 Ti||1344||192.3 GB/s||0.1431 GBpc||18W||170W||10.6%|
|GeForce GTX 770||1536||224 GB/s||0.1458 GBpc||21W||230W||9.1%|
|GeForce GTX 780||2304||288.4 GB/s||0.1252 GBpc||27W||250W||10.8%|
|GeForce GTX 780 Ti||2880||336.5 GB/s||0.1168 GBpc||32W||250W||12.6%|
|GeForce GTX Titan||2688||288.4 GB/s||0.1073 GBpc||27W||250W||10.8%|
|GeForce GTX Titan Black||2880||336.5 GB/s||0.1168 GBpc||32W||250W||12.6%|
|GeForce GTX Titan Z||5760||336.5 GB/s||0.0584 GBpc||32W||375W||8.4%|
|GeForce GTX 750||512||80 GB/s||0.1563 GBpc||8W||55W||13.6%|
|GeForce GTX 750 Ti||640||88 GB/s||0.1375 GBpc||8W||60W||13.8%|
|GeForce GTX 950||768||106 GB/s||0.1380 GBpc||10W||90W||11.0%|
|GeForce GTX 960||1024||112 GB/s||0.1094 GBpc||11W||120W||8.8%|
|GeForce GTX 970||1664||196 GB/s + 28 GB/s||0.1178 GBpc||21W||145W||14.5%|
|GeForce GTX 980||2048||224 GB/s||0.1094 GBpc||21W||165W||12.7%|
|GeForce GTX 980 Ti||2816||336 GB/s||0.1193 GBpc||32W||250W||12.6%|
|GeForce GTX Titan X||3072||336 GB/s||0.1094 GBpc||32W||250W||12.6%|
AMD Radeon - Memory Analysis
|Model||Stream Processors||Bandwidth||Bandwidth per Core (GBpc = GB/s per core)||Memory Power (est.)||TDP||Memory % of TDP|
|Radeon R9 270X||1280||179.2 GB/s||0.1400 GBpc||17W||180W||9.3%|
|Radeon R9 280||1792||240 GB/s||0.1339 GBpc||23W||250W||9.0%|
|Radeon R9 280X||2048||288 GB/s||0.1406 GBpc||27W||250W||10.8%|
|Radeon R9 285||1792||176 GB/s||0.0982 GBpc||17W||190W||8.7%|
|Radeon R9 290||2560||320 GB/s||0.1250 GBpc||30W||250W||12.0%|
|Radeon R9 290X||2816||320 GB/s||0.1136 GBpc||30W||250W||12.0%|
|Radeon R9 295X2||5632||640 GB/s||0.1136 GBpc||60W||500W||12.0%|
|Radeon R7 360||768||104 GB/s||0.1354 GBpc||10W||100W||9.8%|
|Radeon R7 370||1024||179.2 GB/s||0.1750 GBpc||17W||110W||15.3%|
|Radeon R9 370X||1280||179.2 GB/s||0.1400 GBpc||17W||180W||9.3%|
|Radeon R9 380||1792||182.4 GB/s||0.1018 GBpc||17W||190W||9.0%|
|Radeon R9 390||2560||384 GB/s||0.1500 GBpc||36W||275W||13.1%|
|Radeon R9 390X||2816||384 GB/s||0.1364 GBpc||36W||275W||13.1%|
|Radeon R9 Fury||3584||512 GB/s||0.1429 GBpc||15W||275W||5.3%|
|Radeon R9 Fury X||4096||512 GB/s||0.1250 GBpc||15W||275W||5.3%|
|Radeon R9 Nano||4096||512 GB/s||0.1250 GBpc||15W||175W||8.4%|