NVIDIA Working on New Driver For GeForce GTX 970 To Tune Memory Allocation Problems and Improve Performance

Hassan Mujtaba
Posted Jan 28, 2015

After the recent update to the specifications of its GeForce GTX 970 graphics card, NVIDIA is under pressure from consumers who feel betrayed that they didn’t get what they paid for. NVIDIA gave a reasonable, if brief, explanation a few days ago of why the GeForce GTX 970 has an issue allocating its entire 4 GB VRAM pool to games, but it did so very late, and it has now been revealed that the GPU has fewer ROPs and less L2 cache than was advertised at launch five months ago.

NVIDIA To Fine Tune GeForce GTX 970 Memory Allocation Issues To Improve Performance

While NVIDIA is under pressure and GTX 970 owners are fired up, NVIDIA employee Peterson has acknowledged that his company did mess up the stats of the GeForce GTX 970 and said that it will soon release a driver that tunes how memory is allocated by the GeForce GTX 970 in games, which should further improve performance. It’s not known how the new GeForce driver will work, but some have suggested that it may be similar to the GeForce 337.50 BETA driver, with its optimizations focused on the GeForce GTX 970 in memory-bound conditions such as the 3.5 GB VRAM boundary, past which games are reported to lag or stutter. Following is the message from Peterson on the GeForce forums:

Comment #1 - Hey,

First, I want you to know that I’m not just a mod, I work for NVIDIA in Santa Clara.

I totally get why so many people are upset. We messed up some of the stats on the reviewer kit and we didn’t properly explain the memory architecture. I realize a lot of you guys rely on product reviews to make purchase decisions and we let you down.

It sucks because we’re really proud of this thing. The GTX970 is an amazing card and I genuinely believe it’s the best card for the money that you can buy. We’re working on a driver update that will tune what’s allocated where in memory to further improve performance.

Having said that, I understand that this whole experience might have turned you off to the card. If you don’t want the card anymore you should return it and get a refund or exchange. If you have any problems getting that done, let me know and I’ll do my best to help.

–Peter

Comment #2 - Actually I’m not sure as that’s not a simple issue with just one cause. Card memory is not just used for the frame buffer, plenty of driver stuff gets loaded into it as well. We’re looking at sticking as much of that stuff as possible into the 0.5GB space to leave the rest available.

Comment #3 - The GTX970 really does have 4GB of memory and can access all of it. And we’re looking at ways to tweak the driver to better understand where to put stuff to make it even faster. But I totally get that it might not be the right product for your specific situation. If you really want to return it and are getting denied, let me know and I’ll do my best to help.

The issue started in late November, when users began reporting on several forums that their GTX 970s were failing to go past 3.5 GB of VRAM in certain games. A few days later, we came up with our own analysis showing that the card can utilize its full 4 GB of VRAM, the total memory available on the PCB, but only under highly stressful conditions. Then users who were breaking past the 3.5 GB mark started reporting another issue: their cards could go past 3.5 GB, but would lag, stutter or show artifacts. That led several sites to start testing their GeForce GTX 970 cards and to ask NVIDIA for a reply on the issue.

NVIDIA’s first reply was that the card has a crossbar with two pools of memory: a 3.5 GB pool, which is faster and used by gaming applications, and a 0.5 GB pool, which is slower yet still faster than system memory. According to NVIDIA, this arrangement best suits the Maxwell core configuration of the GeForce GTX 970’s GM204 chip, since it is a cut-down SKU with some units disabled.

Just two days after NVIDIA’s initial statement and five months after the launch of the GeForce GTX 970, Jonah Alben, SVP of GPU Engineering, acknowledged that the specs of the card weren’t what was initially shown and that the card is more cut down than previously stated. Using a block diagram of the GTX 970’s GM204 chip, he showed that the chip has just 56 ROPs as opposed to 64, and 1792 KB of L2 cache as opposed to the 2048 KB previously advertised. He attributed the error to a misunderstanding about the specs between NVIDIA’s marketing and technical teams.

A quick note about the GTX 980 here: it uses a 1KB memory access stride to walk across the memory bus from left to right, able to hit all 4GB in this capacity. But the GTX 970 and its altered design has to do things differently. If you walked across the memory interface in the exact same way, over the same 4GB capacity, the 7th crossbar port would tend to always get twice as many requests as the other ports (because it has two memories attached). In the short term that could be ok due to queuing in the memory path. But in the long term if the 7th port is fully busy, and is getting twice as many requests as the other ports, then the other six must be only half busy, to match with the 2:1 ratio. So the overall bandwidth would be roughly half of peak. This would cause dramatic underutilization and would prevent optimal performance and efficiency for the GPU.
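The 2:1 argument above can be checked with a few lines of arithmetic. The following is a minimal sketch, not NVIDIA’s actual memory model: it assumes requests are striped uniformly across all eight DRAMs, that every crossbar port has the same peak rate, and that “peak” means one unit of throughput per DRAM. The function and constant names are made up for illustration.

```python
from fractions import Fraction


def naive_stripe_load(drams_per_port):
    """Model a uniform stride across every DRAM behind a set of crossbar ports.

    Returns (aggregate, utilization): the sustained request rate in units of
    one port's peak rate, and how busy each port is at that rate.
    """
    total_drams = sum(drams_per_port)
    # Uniform striping sends each DRAM the same share of requests, so a port
    # with two DRAMs behind it receives twice the share of a port with one.
    shares = [Fraction(n, total_drams) for n in drams_per_port]
    # The busiest port saturates first and caps the aggregate request rate.
    aggregate = 1 / max(shares)
    utilization = [share * aggregate for share in shares]
    return aggregate, utilization


# Assumed layout from the quote: six ports with one DRAM, a seventh with two.
GTX_970_PORTS = [1, 1, 1, 1, 1, 1, 2]

aggregate, utilization = naive_stripe_load(GTX_970_PORTS)
# Assumed reference peak: one unit of throughput per DRAM (eight in total).
fraction_of_peak = aggregate / sum(GTX_970_PORTS)

print("port utilization:", [float(u) for u in utilization])
print("fraction of peak:", float(fraction_of_peak))
```

Under these assumptions the six single-DRAM ports sit at 50% utilization while the seventh is saturated, and the aggregate comes out at half of peak, which lines up with the “roughly half” figure in the quote.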

To avoid this underutilization, NVIDIA divided the memory into two pools, a 3.5GB pool which maps to seven of the DRAMs and a 0.5GB pool which maps to the eighth DRAM. The larger, primary pool is given priority and is then accessed in the expected 1-2-3-4-5-6-7-1-2-3-4-5-6-7 pattern, with equal request rates on each crossbar port, so bandwidth is balanced and can be maximized. And since the vast majority of gaming situations occur well under the 3.5GB memory size this determination makes perfect sense. It is those instances where memory above 3.5GB needs to be accessed where things get more interesting.

Let’s be blunt here: access to the 0.5GB of memory, on its own and in a vacuum, would occur at 1/7th of the speed of the 3.5GB pool of memory. If you look at the Nai benchmarks floating around, this is what you are seeing.

*To those wondering how peak bandwidth can remain at 224 GB/s despite the division of memory controllers on the GTX 970, Alben stated that the card can reach that speed only when memory in both pools is being accessed. (via PCPer)
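The bandwidth figures follow from simple arithmetic on the bus width and memory data rate. The sketch below is a back-of-the-envelope illustration, assuming the 256-bit bus is made up of eight 32-bit controllers (one per DRAM), that bandwidth scales linearly with the number of controllers in use, and that the 7.0 GHz memory clock in the spec table is the effective per-pin data rate. The function and constant names are hypothetical.

```python
BUS_WIDTH_BITS = 256          # "Memory Bus" row of the spec table
DATA_RATE_GBPS = 7.0          # effective per-pin rate ("Memory Clock 7.0 GHz")
CONTROLLER_WIDTH_BITS = 32    # assumed width of a single memory controller
FAST_POOL_CONTROLLERS = 7     # 3.5GB pool: seven DRAMs
SLOW_POOL_CONTROLLERS = 1     # 0.5GB pool: the eighth DRAM


def peak_bandwidth_gb_s(width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s for a bus of the given width and per-pin rate."""
    return width_bits * data_rate_gbps / 8  # 8 bits per byte


total = peak_bandwidth_gb_s(BUS_WIDTH_BITS, DATA_RATE_GBPS)
fast_pool = peak_bandwidth_gb_s(
    FAST_POOL_CONTROLLERS * CONTROLLER_WIDTH_BITS, DATA_RATE_GBPS
)
slow_pool = peak_bandwidth_gb_s(
    SLOW_POOL_CONTROLLERS * CONTROLLER_WIDTH_BITS, DATA_RATE_GBPS
)

print(f"full bus:   {total:.0f} GB/s")
print(f"3.5GB pool: {fast_pool:.0f} GB/s")
print(f"0.5GB pool: {slow_pool:.0f} GB/s")
print(f"slow/fast:  {slow_pool / fast_pool:.3f}")
```

With these assumptions the full bus works out to 224 GB/s, and the two pools only add up to that figure when both are being accessed, which matches Alben’s statement. On its own, the 0.5GB pool gets one seventh of the larger pool’s bandwidth.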

The reason for this cut-down is that the last two 0.5 GB DRAMs are connected to two 32-bit memory controllers which sit behind just one L2 cache module, since the other had to be disabled. As a result, both controllers have to share that single L2 cache module. To use the last memory block effectively, NVIDIA had to set one DRAM apart, effectively turning the card into a 3.5 GB model that uses the last section of VRAM only when it is really needed. NVIDIA is now going to fine-tune the performance of this specific block and how it manages resource sharing across applications.

It’s not known when the new driver will launch, but given the amount of heat NVIDIA is currently getting, it will likely arrive next month.

Petition Launched Against NVIDIA To Refund GTX 970 Cards: A petition has since been launched asking NVIDIA to refund users who bought the card based on the wrong specifications advertised at launch. It has gained almost 3000 supporters, and the number is climbing with each passing moment. You can sign the petition by heading to the link below:

https://www.change.org/p/nvidia-refund-for-gtx-970

NVIDIA GeForce GTX 970 and GTX 980 Specifications:

Specification | GeForce GTX 970 (Initial) | GeForce GTX 970 (Corrected) | GeForce GTX 980
Codename | GM204 | GM204 | GM204
Process | 28nm | 28nm | 28nm
GPU Core | Maxwell | Maxwell | Maxwell
SM Units | 13 x 128 | 13 x 128 | 16 x 128
CUDA Cores | 1664 | 1664 | 2048
ROPs | 64 | 56 | 64
TMUs | 104 | 104 | 128
L2 Cache | 2048KB | 1792KB | 2048KB
Core Clock | 1051 MHz | 1051 MHz | 1126 MHz
Boost Clock | 1178 MHz | 1178 MHz | 1216 MHz
Memory | 4 GB GDDR5 | 4 GB GDDR5 | 4 GB GDDR5
Memory Bus | 256-bit | 256-bit | 256-bit
Memory Clock | 7.0 GHz | 7.0 GHz | 7.0 GHz
Memory Bandwidth | 224.5 GB/s | 224.5 GB/s | 224.5 GB/s
Texture Fill Rate (GT/s) | 145.0 | 145.0 | 172.8
TDP | 148W | 148W | 165W
Power Connectors | 6+6 Pin | 6+6 Pin | 6+6 Pin
DirectX 12 Support | Yes | Yes | Yes
Launch | 18th September 2014 | 18th September 2014 | 18th September 2014
Price | $329 Reference, $329+ Custom | $329 Reference, $329+ Custom | $549 Reference, $549+ Custom
