AMD’s Vega Will Double Your Usable Graphics Memory Capacity With Its Clever New High Bandwidth Cache


Last week was one of the more exciting for AMD and its fans, as the company took the covers off its highly anticipated Vega graphics architecture. We were there and got to see it in action and in the flesh, including the actual Vega graphics card that ran all of those impressive 4K game demos. On top of that, the company gave us its first comprehensive overview of the Vega architecture and its new features and technologies.

Perhaps the most intriguing of all the new bells and whistles that Vega brings to the table is its unique memory architecture and High Bandwidth Cache. The new memory architecture allows Vega GPUs to do a number of things that their predecessors couldn't. One of its features in particular is impressive enough to warrant its own discussion.


Besides handling memory traffic far more efficiently, the new memory architecture also significantly cuts back on wasteful memory allocations. We explain how it works, and why it's quite revolutionary, in our Vega graphics architecture piece. We're not going to dive into those details here, so if you want to read more about it we'd highly recommend checking out that article.

AMD's New Vega High Bandwidth Cache Controller Will Double Your Usable Graphics Memory Capacity In Games

Vega High Bandwidth Cache Controller

That's right. An 8GB Vega graphics card, just as an example, will effectively have as much usable memory as a 16GB graphics card. It's all thanks to the company's brand new High Bandwidth Cache Controller at the heart of every Vega graphics chip, and the way it works is quite clever. And who better to explain it all than AMD's top graphics man and beloved nerd, Raja Koduri?

Raja Koduri – Chief Architect Radeon Technologies Group, AMD
With regards to the High Bandwidth Cache from a gaming perspective. We looked at all the modern games, the big games that push memory hard, and one of the things we noticed is the VRAM - graphics memory - utilization. We look at how much of the VRAM that the game allocates. So if the game say needs 4GB of memory when we looked at actually how much of that memory is actually used to render pixels we found that many games, actually most games, don't use more than 50% of what they allocate.

That's because the current/old GPU architecture doesn't give you flexibility to move memory in fine granularity. So with Vega and with the High Bandwidth Cache and the HBC controller, for games it will utilize the amount of frame-buffer you have much more efficiently. So effectively you can think of it as Vega will be doubling your memory capacity for games.

Brad Chacos – Senior Editor, PC World
So basically a game that says it uses 4GB of VRAM right now, is in actuality using 2 and with Vega, you're saying, it will actually allocate 2.

Raja Koduri – Chief Architect Radeon Technologies Group, AMD

Wccftech.com transcript, PCWorld interview. Video Timestamp 1:57
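The "doubling" claim in the exchange above is simple arithmetic, and a minimal sketch can make it concrete. The function names below are hypothetical and the 50% figure is the one Koduri quotes; nothing here reflects how AMD actually measures utilization.

```python
def resident_gb(allocated_gb: float, used_fraction: float) -> float:
    """Memory that actually has to be resident if only part of an allocation is touched."""
    if not 0 < used_fraction <= 1:
        raise ValueError("used_fraction must be in (0, 1]")
    return allocated_gb * used_fraction


def effective_capacity_gb(physical_gb: float, used_fraction: float) -> float:
    """Largest allocation that fits if only `used_fraction` of it must be resident."""
    if not 0 < used_fraction <= 1:
        raise ValueError("used_fraction must be in (0, 1]")
    return physical_gb / used_fraction


# Assumption: the roughly 50% utilization figure quoted in the interview.
USED_FRACTION = 0.5

game_resident = resident_gb(4, USED_FRACTION)          # a game that allocates 4GB
card_effective = effective_capacity_gb(8, USED_FRACTION)  # an 8GB card
```

Under that assumption, a game that allocates 4GB needs about 2GB resident, and an 8GB card behaves like a 16GB one. A game that touches more than half of what it allocates would see a smaller gain.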

Vega's High Bandwidth Cache In Action - Lower Memory Utilization & Faster Multi-Tasking

AMD gave us two examples of the High Bandwidth Cache Controller cutting wasteful memory allocations in half. To our surprise, they were both triple-A gaming titles where developers have already done a lot of optimization work to minimize the memory footprint. The games in question are The Witcher 3 from CD Projekt Red and Fallout 4 from Bethesda Game Studios.

The Radeon Technologies Group found that in most titles today, including the two above, only half of all the memory allocated is actually accessed and used. Raja explains that this is the result of game developers working around the quirks of old GPU architectures, where swapping data in and out of the frame-buffer is very expensive in terms of latency/performance. This in turn would force game developers to guard themselves by allocating more than they need at any given time to avoid running into a situation where the game needs to swap in data from outside the graphics memory.


With Vega, the High Bandwidth Cache Controller is clever enough to know beforehand which data is actually useful and load it into the cache, and which data isn't and leave it out. This would not only cut the amount of memory allocated by games in half, it would also make things like alt-tabbing out of games significantly faster, because the frame buffer isn't clogged up with wasteful data allocations.
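To illustrate the difference between keeping whole allocations resident and keeping only the data that is actually accessed, here is a toy model. It is not AMD's implementation: the page size, the function names and the access pattern are all assumptions made up for the example, and AMD has not described its mechanism at this level of detail here.

```python
# Hypothetical page size in MB, chosen only for illustration.
PAGE_MB = 64


def whole_allocation_resident_mb(allocations):
    """Older model: every allocation is kept fully resident in graphics memory."""
    return sum(size_mb for size_mb, _ in allocations)


def page_level_resident_mb(allocations, page_mb=PAGE_MB):
    """Fine-grained model: only pages that are actually touched are kept resident."""
    total = 0
    for size_mb, touched_offsets_mb in allocations:
        # Map each touched offset to the page that contains it, ignoring
        # offsets that fall outside the allocation.
        pages = {off // page_mb for off in touched_offsets_mb if 0 <= off < size_mb}
        total += len(pages) * page_mb
    return total


# Made-up workload: (allocation size in MB, offsets in MB that get accessed).
# Each allocation only ever touches its first half.
allocations = [
    (1024, range(0, 512, 16)),
    (3072, range(0, 1536, 32)),
]

legacy_mb = whole_allocation_resident_mb(allocations)
paged_mb = page_level_resident_mb(allocations)
```

With this made-up workload, the whole-allocation model keeps 4096MB resident while the page-level model keeps 2048MB, which mirrors the "half" figure AMD describes. Real access patterns are far less regular, so the actual savings would vary per game.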

Brad Chacos – Senior Editor, PC World
On day one, will games that already exist consume less memory?

Raja Koduri – Chief Architect Radeon Technologies Group, AMD
For example say a game is built for 4GB and say you have a 4GB card it all plays well but when you swap in, for example you alt-tab out of the game and go into a browser or something or do something quick and you come back, it takes a long time. Because the whole thing was swapped out and swapped in.

So with Vega you will see that stuff become much more efficient. Because it didn't really.. like I said it wasn't using all 4GB it was only using a portion of it. So we didn't actually load that up all inside your precious cache. So you will see those kinds of benefits. But, let's say you have a game that wants to push 8GB when you turn high details on and so on, it will run much more efficiently in a 4GB configuration.

Wccftech.com transcript, PCWorld interview. Video Timestamp 9:02

AMD's next generation family of Radeon graphics cards featuring the Vega graphics architecture will officially launch in the first half of this year. The company hasn't given us a specific date yet but promises to reveal more in the coming weeks and months.

AMD Vega Lineup

| Graphics Card | Radeon R9 Fury X | Radeon RX 480 | Radeon RX Vega Frontier Edition | Radeon Vega Pro | Radeon RX Vega (Gaming) | Radeon RX Vega Pro Duo |
| --- | --- | --- | --- | --- | --- | --- |
| GPU | Fiji XT | Polaris 10 | Vega 10 | Vega 10 | Vega 10 | 2x Vega 10 |
| Process Node | 28nm | 14nm FinFET | FinFET | FinFET | FinFET | FinFET |
| Stream Processors | 4096 | 2304 | 4096 | 3584 | 4096 (?) | Up to 8192 |
| Performance | 8.6 TFLOPS, 8.6 (FP16) TFLOPS | 5.8 (FP16) TFLOPS | ~25 (FP16) TFLOPS | 22 (FP16) TFLOPS | >25 (FP16) TFLOPS | |
| Memory Bus | 4096-bit | 256-bit | 2048-bit | 2048-bit | 2048-bit | 4096-bit |
| Launch | 2015 | 2016 | June 2017 | June 2017 | July 2017 | TBA |