
AMD Launching Full Range Of New Vega Radeon GPUs “Soon” – To Feature Both HBM2 & GDDR5/X Memory

Dec 22, 2016

AMD is reportedly preparing an entirely new top-to-bottom lineup of Radeon graphics cards based on its next-generation Vega architecture. The new family, which I’ll refer to as the Radeon 500 series from here on for the sake of simplicity, will feature second-generation high bandwidth memory (HBM2) as well as GDDR5/X memory.

A New Top-To-Bottom Range Of Radeon Graphics Cards Based On The Vega Architecture

According to Fudzilla, AMD will roll out its next-generation Vega architecture across its entire range of 2017 Radeon graphics cards, and it will do so “soon”. The new lineup will span everything from a top-end Radeon graphics card for 4K 60FPS AAA gaming, the very same one that was demoed last week, down to mid-range and entry-level offerings for 1440p and 1080p gaming. The highest-end models will feature HBM2, whilst the mid-range and more budget-oriented cards will feature GDDR5/X memory.


HBM2 At The High-End, GDDR5/X In The Mid-Range And Entry-Level

AMD’s Robert Hallock confirmed to Wccftech.com earlier this year that the GCN graphics architecture is compatible with both the HBM and GDDR5 memory standards, which is why this doesn’t come as a particular surprise to us. That is especially true considering the complexity of stacking high bandwidth memory dies and the additional cost of the interposer needed to connect the memory to the GPU die.

Robert Hallock, Technical Marketing lead at AMD

“AMD helped lead the development of HBM, was the first to bring HBM to market in GPUs, and plans to implement HBM/HBM2 in future graphics solutions.

At this time we have only publicly demonstrated a GDDR5 configuration of the Polaris architecture. It’s important to understand that HBM isn’t (currently) suitable for all GPU segments due to the current HBM cost structure. In the mainstream GPU segment, GDDR5 remains an extremely cost-effective, efficient and viable memory technology.


We have the flexibility to use HBM or GDDR5 as costs require. Certain market segments are cost sensitive, GDDR5 can be used there. Higher-end market segments where more cost can be afforded, HBM is viable as well.”

Fudzilla further reports that there’s no confirmation regarding whether AMD will be using standard GDDR5 memory or the faster GDDR5X for its mid-range and entry-level products.

We’ve already seen one upcoming Vega-based Radeon graphics card in action. The as-yet-unreleased card was demoed in a head-to-head comparison with Nvidia’s GTX 1080. The demo Vega card had 8GB of HBM2 and outperformed the GTX 1080 by 10% whilst running Doom in Vulkan at 4K.

The Vega Architecture – AMD’s Clever Next Generation Compute Unit

One big announcement AMD made at its recent press event, where Vega was demoed, is that the new architecture features what the company calls its NCU, short for Next-Generation Compute Unit. We had already detailed key parts of this new design in our exclusive piece about Vega 10 and Vega 11 a couple of months ago.

This new architecture holds several key advantages over its predecessor. Chief among them is that each SIMD inside a given Vega NCU is now capable of simultaneously processing variable-length wavefronts. To the average person that sounds like a bunch of meaningless technical jargon; I know it did to me when I first learned about it. However, once you scratch the surface and understand what it means, you quickly begin to realize how big of a deal this really is.

In AMD’s current GCN implementation, each compute unit has four 16-wide vector SIMD units, capable of executing four 16-wide wavefronts (groups of threads) over four cycles, in addition to one scalar unit capable of executing one instruction per cycle. The scalar unit is delegated time-critical tasks, where the four-cycle turnaround of the SIMD units is simply not good enough.

Unfortunately, these 16-wide SIMD units work exactly the same way no matter how small a wavefront they’re fed. A SIMD unit has to spend four cycles executing whatever threads are presented to it, no matter what. This means that executing a 4-wide wavefront, for example, takes just as long as executing a 16-wide wavefront, and leaves the other 12 ALUs inside the SIMD doing no useful work. Graphics workloads are inherently non-uniform, which means it’s effectively impossible to find a scenario where all the 16-wide SIMD units are fully occupied at any given time.
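
To put a number on that waste, here is a minimal sketch of the fixed-width model described above. The function names are made up for illustration and the model is deliberately simplified: it only counts how many of a SIMD’s 16 lanes carry a live thread and ignores everything else about real GCN scheduling.

```python
SIMD_WIDTH = 16  # lanes (ALUs) per SIMD unit, per the description above


def idle_lanes(active_threads, simd_width=SIMD_WIDTH):
    """Lanes left without a live thread while the wavefront executes."""
    if not 0 <= active_threads <= simd_width:
        raise ValueError("wavefront does not fit in one SIMD")
    return simd_width - active_threads


def utilization(active_threads, simd_width=SIMD_WIDTH):
    """Fraction of the SIMD's lanes doing useful work."""
    return (simd_width - idle_lanes(active_threads, simd_width)) / simd_width


if __name__ == "__main__":
    for width in (16, 8, 4):
        print(f"{width:2d}-wide wavefront: "
              f"{idle_lanes(width):2d} idle lanes, "
              f"{utilization(width):.0%} utilization")
```

In this simplified view, the 4-wide case leaves 12 lanes idle and uses only a quarter of the SIMD for the same four cycles.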

Variable Width Wavefront SIMDs, Getting More Performance Out Of Fewer Cycles

This is no longer the case in AMD’s new GCN implementation inside Vega. The V9 architecture includes clever new schedulers and coherency subsystems that allow several wavefronts of different widths to be executed simultaneously inside any compute unit that’s able to accommodate the workload. As a result, more ALUs are doing useful work at any given time instead of idling or executing predicated-off threads that produce no results.

This in effect allows each NCU to finish considerably more work in the same amount of time than a traditional CU, and it frees up valuable cache and memory resources for other compute units. Given how unpredictable and inherently fluctuating graphics workloads are, it’s very hard to predict how much of a difference this improvement in resource utilization and CU occupancy will make. Vega’s NCUs should therefore be not only faster but also more power efficient, although by how much exactly remains to be seen.
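
As a rough illustration of why packing matters, the toy model below compares how many four-cycle SIMD issue slots a batch of wavefronts needs when each wavefront occupies a whole 16-wide SIMD versus when wavefronts of different widths can share one. Everything here is an assumption for illustration: the function names, the sample widths, and the first-fit-decreasing packing heuristic are mine, and AMD has not disclosed how Vega’s schedulers actually make this decision.

```python
def fixed_width_slots(wavefront_widths):
    """Fixed-width model: every wavefront takes one full SIMD slot."""
    return len(wavefront_widths)


def packed_slots(wavefront_widths, simd_width=16):
    """Toy variable-width model: wavefronts share a slot if they fit.

    Uses first-fit-decreasing bin packing, which is an illustrative
    stand-in and NOT a description of AMD's real scheduler.
    """
    free_lanes = []  # remaining free lanes in each slot opened so far
    for width in sorted(wavefront_widths, reverse=True):
        if not 0 < width <= simd_width:
            raise ValueError("wavefront does not fit in one SIMD")
        for i, free in enumerate(free_lanes):
            if width <= free:
                free_lanes[i] -= width  # reuse an existing slot
                break
        else:
            free_lanes.append(simd_width - width)  # open a new slot
    return len(free_lanes)


if __name__ == "__main__":
    widths = [4, 8, 4, 16, 2, 6]  # hypothetical wavefront widths
    print("fixed-width slots:", fixed_width_slots(widths))
    print("packed slots:     ", packed_slots(widths))
```

For this made-up batch the fixed-width model needs six slots while the packed model needs three, which is the kind of occupancy gain the paragraph above is describing. Real workloads will of course pack less neatly.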

AMD Vega Lineup

| Graphics Card | Radeon R9 Fury X | Radeon RX 480 | Radeon RX Vega Frontier Edition | Radeon Vega Pro | Radeon RX Vega (Gaming) | Radeon RX Vega Pro Duo |
| --- | --- | --- | --- | --- | --- | --- |
| GPU | Fiji XT | Polaris 10 | Vega 10 | Vega 10 | Vega 10 | 2x Vega 10 |
| Process Node | 28nm | 14nm FinFET | FinFET | FinFET | FinFET | FinFET |
| Stream Processors | 4096 | 2304 | 4096 | 3584 | 4096 (?) | Up to 8192 |
| Performance | 8.6 TFLOPS; 8.6 TFLOPS (FP16) | 5.8 TFLOPS (FP16) | ~25 TFLOPS (FP16) | 22 TFLOPS (FP16) | >25 TFLOPS (FP16) | |
| Memory Bus | 4096-bit | 256-bit | 2048-bit | 2048-bit | 2048-bit | 4096-bit |
| Launch | 2015 | 2016 | June 2017 | June 2017 | July 2017 | TBA |
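
The stream processor and performance rows are tied together by a simple rule of thumb: peak throughput is roughly 2 operations (one fused multiply-add) per stream processor per clock. The sketch below works backwards from the table’s figures to the clock speed they imply. The function name is hypothetical, and the `fp16_rate` of 2 used for Vega is an assumption that the FP16 figures reflect double-rate packed math; the table itself lists no clocks.

```python
def implied_clock_mhz(tflops, stream_processors, fp16_rate=1):
    """Clock (MHz) implied by a peak-throughput figure.

    Assumes peak = 2 ops (one FMA) x stream processors x clock x rate,
    where fp16_rate is 2 if the figure counts double-rate FP16 math.
    """
    ops_per_clock = 2 * stream_processors * fp16_rate
    return tflops * 1e12 / ops_per_clock / 1e6


if __name__ == "__main__":
    # Figures taken from the table; the fp16_rate values are assumptions.
    fury_x = implied_clock_mhz(8.6, 4096)               # FP16 equals FP32 rate
    vega_pro = implied_clock_mhz(22, 3584, fp16_rate=2)  # assumed 2x FP16
    print(f"R9 Fury X implied clock:       ~{fury_x:.0f} MHz")
    print(f"Radeon Vega Pro implied clock: ~{vega_pro:.0f} MHz")
```

These are back-of-the-envelope peak numbers only; they say nothing about sustained clocks or real-world performance.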