
AMD Launching Full Range Of New Vega Radeon GPUs “Soon” – To Feature Both HBM2 & GDDR5/X Memory

Khalid Moammer
Posted Dec 22, 2016

AMD is reportedly preparing an entirely new top-to-bottom lineup of Radeon graphics cards based on its next generation Vega architecture. The new family, which I’ll refer to as the Radeon 500 series from here on out for the sake of simplicity, will feature second generation high bandwidth memory as well as GDDR5/X memory.

A New Top-To-Bottom Range Of Radeon Graphics Cards Based On The Vega Architecture

According to Fudzilla, AMD will be rolling out its next generation Vega architecture across the entire range of its 2017 Radeon graphics cards and it’ll do it “soon”. The new lineup will span a top-end 4K 60FPS triple A gaming Radeon graphics card, the very same one that was demoed last week, to mid-range and entry level offerings for 1440p and 1080p gaming. The highest end models will feature HBM2 whilst the mid-range and more budget oriented cards will feature GDDR5/X memory.

HBM2 At The High-End, GDDR5/X In The Mid-Range And Entry-Level

AMD’s Robert Hallock confirmed to Wccftech.com earlier this year that the GCN graphics architecture is compatible with both the HBM and GDDR5 memory standards, so this doesn’t come as a particular surprise to us, especially considering the complexity of stacking high bandwidth memory dies and the additional cost of the interposer required to connect the memory to the GPU die.

Robert Hallock, Technical Marketing lead at AMD

“AMD helped lead the development of HBM, was the first to bring HBM to market in GPUs, and plans to implement HBM/HBM2 in future graphics solutions.

At this time we have only publicly demonstrated a GDDR5 configuration of the Polaris architecture. It’s important to understand that HBM isn’t (currently) suitable for all GPU segments due to the current HBM cost structure. In the mainstream GPU segment, GDDR5 remains an extremely cost-effective, efficient and viable memory technology.

We have the flexibility to use HBM or GDDR5 as costs require. Certain market segments are cost sensitive, GDDR5 can be used there. Higher-end market segments where more cost can be afforded, HBM is viable as well.”

Fudzilla further reports that there’s no confirmation regarding whether AMD will be using standard GDDR5 memory or the faster GDDR5X for its mid-range and entry-level products.


We’ve already seen one upcoming Radeon graphics card based on Vega in action. The as-yet-unreleased card was demoed in a head-to-head comparison with Nvidia’s GTX 1080. The demo Vega card had 8GB of HBM2 and outperformed the GTX 1080 by 10% whilst running Doom in Vulkan at 4K.

The Vega Architecture – AMD’s Clever Next Generation Compute Unit

One big announcement that AMD made at its recent press event, where Vega was demoed, is that the new architecture features what the company calls its NCU, short for Next Compute Unit. We had already detailed key parts of this new design in our exclusive piece about Vega 10 and Vega 11 a couple of months ago.

This new architecture holds several key advantages over its predecessor. Chief among them is that each SIMD inside a given Vega NCU is now capable of simultaneously processing variable length wavefronts. To the average person that sounds like a bunch of meaningless technical jargon; I know it did to me when I first learned about it. However, once you scratch the surface and understand what it means, you quickly begin to realize how big a deal this really is.

In AMD’s current GCN implementation, each compute unit has four 16-wide vector SIMD units, capable of executing four 16-wide wavefronts (groups of threads) over four cycles, in addition to one scalar unit capable of executing one instruction per cycle. The scalar unit is delegated time-critical tasks, where the four-cycle turnaround of the SIMD units is simply not good enough.

Unfortunately, these 16-wide SIMD units work exactly the same way no matter how small a wavefront they’re fed. The SIMD unit has to spend four cycles executing whatever threads are presented to it, no matter what. This means that executing a 4-wide wavefront, for example, takes just as long as executing a 16-wide wavefront, leaving the other 12 ALUs inside the SIMD completely idle. Graphics workloads are inherently non-uniform, which means that it’s effectively impossible to find any scenario where all 16-wide SIMD units would be fully occupied at any given time.
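To put numbers on the idle-lane problem described above, here is a minimal sketch of a toy model. It assumes only the figures given in this article (16 lanes per SIMD, a fixed four-cycle cost per wavefront); the function names are hypothetical and this is not a description of AMD's actual hardware scheduling.

```python
# Toy model of fixed-width SIMD utilization, using the article's figures.
SIMD_WIDTH = 16        # ALU lanes per SIMD unit (from the article)
CYCLES_PER_ISSUE = 4   # cycles spent per wavefront, regardless of its width


def idle_lanes(active_threads: int, simd_width: int = SIMD_WIDTH) -> int:
    """Number of lanes doing no useful work for a wavefront of this width."""
    if not 0 <= active_threads <= simd_width:
        raise ValueError("wavefront does not fit in the SIMD")
    return simd_width - active_threads


def lane_utilization(active_threads: int, simd_width: int = SIMD_WIDTH) -> float:
    """Fraction of lanes doing useful work (1.0 means fully occupied)."""
    return (simd_width - idle_lanes(active_threads, simd_width)) / simd_width


for width in (16, 8, 4):
    print(
        f"{width:>2}-wide wavefront: {CYCLES_PER_ISSUE} cycles, "
        f"{idle_lanes(width)} idle lanes, "
        f"{lane_utilization(width):.0%} utilization"
    )
```

In this model the 4-wide case costs the same four cycles as the 16-wide case while using only a quarter of the lanes, which is the inefficiency the article describes.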


Variable Width Wavefront SIMDs, Getting More Performance Out Of Fewer Cycles

This is no longer the case in AMD’s new GCN implementation inside Vega. The V9 architecture includes clever new schedulers and coherency subsystems that allow several wavefronts of different widths to be executed simultaneously inside any compute unit that’s able to accommodate the workload. As a result, more ALUs are doing useful work at any given time instead of idling or executing predicated-off threads that produce no results.

This in effect allows each NCU to finish considerably more work in the same amount of time compared to a traditional CU, in addition to freeing up valuable cache and memory resources for other compute units. It’s very hard to predict how much of a difference this improvement in resource utilization and CU occupancy will make, given how unpredictable and inherently variable graphics workloads are. Vega’s Next Compute Units should therefore be not only faster but also more power efficient, although by how much exactly remains to be seen.
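The potential gain can be illustrated with a rough sketch that compares two toy models: one where every wavefront occupies a 16-lane SIMD slot on its own, and one where narrower wavefronts may share a slot. The packing strategy here (greedy first-fit, widest first) and the sample wavefront widths are assumptions made purely for illustration; AMD has not disclosed how Vega's schedulers actually work.

```python
# Toy comparison: one wavefront per SIMD slot vs. sharing a slot between
# several narrower wavefronts. First-fit-decreasing packing is an
# illustrative assumption, not AMD's disclosed scheduling algorithm.
SIMD_WIDTH = 16  # lanes per SIMD (from the article)


def slots_fixed(widths):
    """Fixed-width model: each wavefront takes a whole slot by itself."""
    return len(widths)


def pack_first_fit(widths, simd_width=SIMD_WIDTH):
    """Variable-width model: greedily pack wavefronts into shared slots."""
    slots = []
    for w in sorted(widths, reverse=True):
        if w > simd_width:
            raise ValueError("wavefront wider than the SIMD")
        for slot in slots:
            if sum(slot) + w <= simd_width:
                slot.append(w)
                break
        else:
            # No existing slot has room, so open a new one.
            slots.append([w])
    return slots


def utilization(widths, n_slots, simd_width=SIMD_WIDTH):
    """Useful lanes divided by total lanes across all slots used."""
    return sum(widths) / (n_slots * simd_width)


widths = [16, 4, 8, 4, 2, 12, 2]  # hypothetical mix of wavefront widths
packed = pack_first_fit(widths)
print("fixed:   ", slots_fixed(widths), "slots,",
      f"{utilization(widths, slots_fixed(widths)):.0%} utilization")
print("variable:", len(packed), "slots,",
      f"{utilization(widths, len(packed)):.0%} utilization")
```

For this particular hypothetical mix, the shared-slot model needs fewer slots for the same work, which is the kind of occupancy improvement the article attributes to the NCU. Real workloads will not pack this neatly.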

AMD Vega 10 & Vega 11 GPUs

Graphics Card | R9 Fury X | RX 480 | TBA | TBA
GPU | Fiji XT | Polaris 10 | Vega 11 | Vega 10
Process Node | 28nm | 14nm | 14nm | 14nm
Performance | 8.6 TFLOPS, 8.6 (FP16) TFLOPS | 5.8 TFLOPS, 5.8 (FP16) TFLOPS | TBA | 12.5 TFLOPS, 25 (FP16) TFLOPS
Memory | 4GB HBM | 8GB GDDR5 | TBA | 16GB/8GB HBM2
Memory Bus | 4096-bit | 256-bit | TBA | 2048-bit
Bandwidth | 512 GB/s | 256 GB/s | TBA | 512 GB/s
TDP | 275W | 150W | TBA | <300W
Launch | 2015 | 2016 | 2017 | 2017
Stream Processors | 4096 | 2304 | TBD | 4096
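
As a rough sanity check on the figures in the table, the sketch below back-calculates the per-pin memory data rate and the GPU clock speed that each row implies. It assumes the usual rules of thumb (bandwidth = bus width × per-pin rate ÷ 8, and peak FP32 throughput = 2 operations per stream processor per clock); the function names are hypothetical and the results are derived values, not specifications confirmed by AMD.

```python
# Back-of-the-envelope checks on the spec table. The formulas are common
# rules of thumb; the outputs are implied values, not confirmed specs.

def implied_pin_rate_gbps(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gbps) implied by total bandwidth and bus width."""
    return bandwidth_gb_s * 8 / bus_width_bits


def implied_clock_ghz(fp32_tflops: float, stream_processors: int) -> float:
    """Clock (GHz) implied by peak FP32 TFLOPS, assuming 2 ops per SP per clock."""
    return fp32_tflops * 1000 / (2 * stream_processors)


# (name, bandwidth GB/s, bus width bits, FP32 TFLOPS, stream processors)
cards = [
    ("R9 Fury X (Fiji XT)", 512, 4096, 8.6, 4096),
    ("RX 480 (Polaris 10)", 256, 256, 5.8, 2304),
    ("Vega 10", 512, 2048, 12.5, 4096),
]

for name, bw, bus, tflops, sps in cards:
    print(
        f"{name}: {implied_pin_rate_gbps(bw, bus):.1f} Gbps per pin, "
        f"~{implied_clock_ghz(tflops, sps):.2f} GHz implied clock"
    )
```

Read this way, Vega 10 would match the Fury X's bandwidth on half the bus width by doubling the per-pin rate, and its higher TFLOPS figure at the same stream processor count would come from a higher implied clock.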