AMD Radeon RX Vega GPU Architecture Detailed – NCUs With RPM, High-Bandwidth Cache Controller Delivers +50% Avg and +100% Min FPS in Gaming, Built For Modern APIs


AMD has revealed several new architectural details of their next-generation Radeon RX Vega GPU at the Capsaicin and Cream event. The details showcase how AMD will be utilizing several brand new features of Vega to deliver faster performance along with higher graphics fidelity in VR and AAA gaming titles.

Brand New AMD Radeon RX Vega GPU Architecture Details Revealed - Features Aimed at AAA 4K Gaming and VR Titles

AMD detailed several new features that will be packed inside the brand new Vega GPUs, which will be integrated into enthusiast-class Radeon RX Vega graphics cards. Some of these features include NCUs (Next-Generation Compute Units), the High-Bandwidth Cache Controller, Rapid Packed Math, Radeon Virtualized Encode and fine-tuned multi-GPU support on modern APIs such as DirectX 12 and Vulkan.

AMD High-Bandwidth Cache Controller - Higher Memory Utilization

AMD's new High-Bandwidth Cache Controller, or HBCC for short, was already detailed in our Vega GPU architecture deep dive. At the Capsaicin and Cream event, AMD showed how the new cache architecture will benefit gamers by allowing developers to utilize 100% of the memory and cache systems on board Vega GPUs. Developers and gamers would be able to utilize the entirety of the memory systems that they have paid for, resulting in higher performance in gaming titles.

AMD's demonstration showed that the new HBCC system would deliver gains of up to 50% in average framerates and up to 100% in minimum framerates in AAA titles that are optimized specifically for Radeon RX Vega graphics chips.

AMD Rapid Packed Math Through Flexible Next-Generation Compute Units

AMD is also packing their most flexible core structure inside Vega GPUs which will be known as NCU or Next-Generation Compute Units. These will allow AMD to deliver a feature known as Rapid Packed Math (RPM) which accelerates FP16 compute calculations, allowing for faster mathematical processing.

The new shaders will be able to leverage Rapid Packed Math. In a TressFX feature tech demo shown by AMD, Vega was able to double the number of hair strands which not only increased graphics fidelity by 2x but did so at the same performance cost. Rapid Packed Math essentially doubles the rate of compute allowing for faster physics and compute calculations on Vega class GPUs.
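To make the idea of packed FP16 math concrete, here is a minimal Python sketch. It is not AMD's instruction set or driver code; it only emulates the general technique of storing two half-precision values in one 32-bit word and operating on both lanes together. The function names `pack_half2`, `unpack_half2` and `packed_add` are hypothetical and chosen for illustration.

```python
import struct

def pack_half2(a, b):
    """Pack two FP16 values into one 32-bit word (a in the low half, b in the high half)."""
    raw = struct.pack("<2e", a, b)       # "e" = IEEE 754 half precision, 2 bytes each
    return struct.unpack("<I", raw)[0]   # reinterpret the 4 bytes as one unsigned 32-bit int

def unpack_half2(word):
    """Split a 32-bit word back into its two FP16 lanes."""
    return struct.unpack("<2e", struct.pack("<I", word))

def packed_add(x, y):
    """Emulate a single packed add: both FP16 lanes are added in one operation."""
    x0, x1 = unpack_half2(x)
    y0, y1 = unpack_half2(y)
    return pack_half2(x0 + y0, x1 + y1)

word = packed_add(pack_half2(1.5, 2.0), pack_half2(0.5, 0.25))
lanes = unpack_half2(word)
```

Because each 32-bit register slot carries two values, one instruction does the work of two, which is where the doubled FP16 rate described above comes from.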

AMD Radeon Virtualized Encode - Cloud Servers For Streaming, Powered by Vega GPUs

AMD also announced that Vega GPUs will power cloud servers. This was shown as a collaboration with LiquidSky, who will be offering cloud computing and gaming services to over 1.5 million users. A demo illustrated the capabilities of Vega by running Battlefield 1 at 1080p, streamed from a Vega-powered cloud server.

This technology will bring the capabilities and performance of Vega GPUs to over a million gamers. The whole concept sounds a lot like NVIDIA's GeForce Now streaming service, which uses their Pascal GP100 GPUs to power cloud servers. Neither subscription pricing nor availability of the cloud service was announced, but more details will be released when Radeon RX Vega GPUs officially launch in Q2 2017.

AMD Radeon RX Vega GPU MultiView Rendering and Multi-Res Rendering Technologies

AMD MultiRes Rendering is an AMD LiquidVR feature designed to reduce unnecessary processing by imitating how we see. This technique enables high visual quality where it matters which can reduce GPU processing, helping improve performance, reduce dropped frames and rendering latency.

AMD MultiView Rendering is an AMD LiquidVR feature designed to improve CPU and GPU performance in VR by eliminating redundant processing. MultiView Rendering reduces the number of duplicated object draw calls, helping reduce dropped frames and rendering latency.

What To Expect From AMD Radeon Vega Graphics Cards?

AMD’s high-end Vega 10 GPU will be available to consumers in the first half of 2017. The chip spans a die size of over 500 mm² from early calculations and features two HBM2 stacks, incorporating up to 16 GB of HBM2. The consumer variant, which was demonstrated using DOOM and Star Wars: Battlefront, featured 8 GB of HBM2 VRAM. The specific device ID for the consumer variant is 687F:C1.

The graphics chip will be utilizing the latest 14nm GFX9 core architecture, which is based on the NCU (Next-Generation Compute Unit) design. The graphics card will feature 64 Compute Units or 4096 stream processors. AMD plans on increasing the throughput of the chip through increased clock speeds. This will allow AMD to deliver higher throughput than the Fiji GPU, which is based on the 28nm GCN 3.0 architecture and comes with the same number of cores, 4096 SPs.

The server part with the full chip is expected to feature a TDP of 225W with clock speeds around 1526 MHz. A consumer-oriented graphics card can feature even higher clock speeds, since server parts generally lack the cooling capabilities of consumer-level cards, which ship with better coolers and PCBs designed to allow overclocking of the GPUs. More details are available in our Vega GPU architecture preview.
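As a rough check on how these numbers relate, peak throughput can be estimated from the stream processor count and clock speed. The sketch below assumes the common convention of 2 floating-point operations per stream processor per clock (one fused multiply-add) and a 2x factor for packed FP16; both are assumptions of this example rather than figures AMD stated at the event, and `peak_tflops` is a hypothetical helper.

```python
def peak_tflops(stream_processors, clock_mhz, ops_per_clock=2, packed_factor=1):
    """Estimate peak TFLOPS.

    ops_per_clock=2 assumes one fused multiply-add per SP per clock (assumption).
    packed_factor=2 models two FP16 operations per 32-bit lane (assumption).
    """
    ops_per_second = stream_processors * ops_per_clock * packed_factor * clock_mhz * 1e6
    return ops_per_second / 1e12

vega_fp32 = peak_tflops(4096, 1526)                   # about 12.5 TFLOPS
vega_fp16 = peak_tflops(4096, 1526, packed_factor=2)  # about 25 TFLOPS
```

Under these assumptions, 4096 stream processors at 1526 MHz work out to roughly 25 FP16 TFLOPS, which lines up with the ~25 (FP16) TFLOPS figure listed for the 4096 SP Vega 10 part in the lineup table.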

AMD Vega Lineup

| Graphics Card | Radeon R9 Fury X | Radeon RX 480 | Radeon RX Vega Frontier Edition | Radeon Vega Pro | Radeon RX Vega (Gaming) | Radeon RX Vega Pro Duo |
| --- | --- | --- | --- | --- | --- | --- |
| GPU | Fiji XT | Polaris 10 | Vega 10 | Vega 10 | Vega 10 | 2x Vega 10 |
| Process Node | 28nm | 14nm FinFET | FinFET | FinFET | FinFET | FinFET |
| Stream Processors | 4096 | 2304 | 4096 | 3584 | 4096 (?) | Up to 8192 |
| Performance | 8.6 TFLOPS, 8.6 (FP16) TFLOPS | 5.8 (FP16) TFLOPS | ~25 (FP16) TFLOPS | 22 (FP16) TFLOPS | >25 (FP16) TFLOPS | |
| Memory Bus | 4096-bit | 256-bit | 2048-bit | 2048-bit | 2048-bit | 4096-bit |
| Launch | 2015 | 2016 | June 2017 | June 2017 | July 2017 | TBA |

AMD Vega 10 Memory Specifications

First-generation HBM graphics cards such as the Radeon R9 Fury X were limited to just 4 GB of VRAM and had a bandwidth of 512 GB/s. The Fury X had 4 layers per stack (256 MB per layer), and the 4-layer design will continue with the latest Vega GPUs since AMD will have to maximize value on these cards for the gaming audience. With 4 layers, we will be looking at higher densities per layer. The pin speed also increases with HBM2: the new memory standard can clock up to 2 Gb/s per pin compared to 1 Gb/s on HBM1.

The increased pin speed allows two HBM2 stacks to match the memory bandwidth of four HBM1 stacks. The increased density also allows AMD to cut down the costs of designing larger interposers. HBM2 itself takes more space than HBM1, with a die size of around 92 mm² while HBM1 was just 35 mm².
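The bandwidth and capacity figures above can be worked through with simple arithmetic. The sketch below assumes a 1024-bit interface per HBM stack, which is consistent with the 4096-bit (four stacks) and 2048-bit (two stacks) memory buses in the lineup table; the helper names are hypothetical.

```python
def hbm_bandwidth_gbs(stacks, pin_speed_gbps, bus_bits_per_stack=1024):
    """Total bandwidth in GB/s: bus width in bits times per-pin rate, divided by 8 bits per byte."""
    return stacks * bus_bits_per_stack * pin_speed_gbps / 8

def hbm_capacity_gb(stacks, layers_per_stack, mb_per_layer):
    """Total capacity in GB from the stack count, layers per stack and density per layer."""
    return stacks * layers_per_stack * mb_per_layer / 1024

hbm1_bw = hbm_bandwidth_gbs(stacks=4, pin_speed_gbps=1)   # Fury X: four HBM1 stacks
hbm2_bw = hbm_bandwidth_gbs(stacks=2, pin_speed_gbps=2)   # Vega 10: two HBM2 stacks

fury_x_gb = hbm_capacity_gb(stacks=4, layers_per_stack=4, mb_per_layer=256)

# For 8 GB on two 4-layer stacks, each layer would need to hold 1024 MB.
vega_mb_per_layer = 8 * 1024 / (2 * 4)
```

Both configurations come out to 512 GB/s, and the per-layer density needed for an 8 GB card on two 4-layer stacks is four times that of the Fury X, which is the "higher densities per layer" mentioned above.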

AMD Radeon Vega - Faster Than GTX 1080, Very Competitive Pricing

Beyond the specifications, we can expect graphics performance faster than the GeForce GTX 1080, as demonstrated several times. We can also expect the card to feature great power efficiency with higher clock speeds on the consumer variant. A 4096 stream processor SKU with 16 GB HBM2 is rated at 225W, so we can expect a higher-clocked variant for consumers. AMD will have AIBs offering several custom variants of the card, including Mini-ITX variants.

AMD also announced during the event that they have partnered with the behemoth game publisher and developer Bethesda. Bethesda will be exclusively using AMD as their hardware partner to optimize their games specifically for Radeon graphics cards using the Vulkan API. Expect to see Radeon RX Vega GPUs in action in the upcoming quarter, featuring in enthusiast and high-performance class graphics cards.