
AMD Vega Features Leaked – 4x Efficiency, 2x Performance/Clock, 8x Capacity Per HBM Stack & Next Gen Compute Engine

Khalid Moammer • Jan 2, 2017 at 11:11am EST

The features of AMD's upcoming Radeon RX 500 Series Vega architecture have been discovered in the code of the just-launched ve.ga teaser site, and they're incredibly impressive. The company's next-generation Vega graphics architecture is due for a major preview at CES on Thursday, less than three days away.

However, thanks to our crafty friends over at 3DCenter, who have managed to dig up some major, as-yet-unreleased details regarding the brand new architecture, you don't have to wait one more minute. All the details have been pulled from within the code base of the Vega teaser website, ve.ga. That not only makes this the biggest Vega leak yet, it also makes it the most significant, because the details come directly from the teaser site's own code. So without any further delay, let's get to the juicy bits!

Vega, AMD's Most Advanced & Most Impressive Graphics Architecture To Date

Let's start off with a simple summary of Vega's key features. This should help paint a picture of how drastic a step forward the new architecture is compared to Polaris.

Vega Architecture

- 4x Power Efficiency
- 2x Peak Throughput/Performance Per Clock
- High Bandwidth Cache
- 2x Bandwidth per pin
- 8x Capacity Per Stack (2nd Generation High Bandwidth Memory)
- 512TB Virtual Address Space
- Next Generation Compute Engine
- Next Generation Pixel Engine
- Next Compute Unit Architecture
- Rapid Packed Math
- Draw Stream Binning Rasterizer
- Primitive Shaders

AMD Vega Lineup

Graphics Card	Radeon R9 Fury X	Radeon RX 480	Radeon RX Vega Frontier Edition	Radeon Vega Pro	Radeon RX Vega (Gaming)	Radeon RX Vega Pro Duo
GPU	Fiji XT	Polaris 10	Vega 10	Vega 10	Vega 10	2x Vega 10
Process Node	28nm	14nm FinFET	FinFET	FinFET	FinFET	FinFET
Stream Processors	4096	2304	4096	3584	4096 (?)	Up to 8192
Performance	8.6 TFLOPS (FP32), 8.6 TFLOPS (FP16)	5.8 TFLOPS (FP32), 5.8 TFLOPS (FP16)	~13 TFLOPS (FP32), ~25 TFLOPS (FP16)	11 TFLOPS (FP32), 22 TFLOPS (FP16)	>13 TFLOPS (FP32), >25 TFLOPS (FP16)	TBA
Memory	4GB HBM	8GB GDDR5	16GB HBM2	TBA	TBA	TBA
Memory Bus	4096-bit	256-bit	2048-bit	2048-bit	2048-bit	4096-bit
Bandwidth	512GB/s	256GB/s	480GB/s	400GB/s	TBA	TBA
TDP	275W	150W	TBA	TBA	TBA	TBA
Launch	2015	2016	June 2017	June 2017	July 2017	TBA

Vega's Next Compute Unit (NCU), 2x Peak Throughput per Clock And 4x The Power Efficiency

According to the newly dug-up data, Vega delivers four times the graphics performance at the same power compared to AMD's previous generation. There isn't much context to expand upon here. However, AMD is most likely referring to half-precision (FP16) compute, which would mean that Vega delivers double the single-precision (FP32) compute at the same power.

This is the most impressive figure of the bunch. Doubling the power efficiency of a graphics architecture whilst maintaining or boosting performance is an incredibly challenging engineering feat, one that's made even harder in the case of Vega considering that it is built on the same 14nm manufacturing process as Polaris. If it holds true, then AMD engineers will have pulled off nothing short of a miracle.

2x peak throughput/clock is another impressive figure that stands as a testament to how radically different Vega is compared to AMD's previous generation GCN architecture. It means that Vega should deliver up to double the peak throughput at any given clock speed compared to AMD's previous generation GCN-based GPUs.
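
As a rough sanity check on these figures, peak shader throughput is conventionally estimated as stream processors x operations per clock x clock speed, with a fused multiply-add counting as two operations. The sketch below applies that formula. The 1.05GHz Fiji clock is chosen because it reproduces the 8.6 TFLOPS figure in the lineup table; the 1.6GHz Vega clock and the doubling of FP16 operations per clock are assumptions for illustration, not confirmed specifications.

```python
def peak_tflops(stream_processors, clock_ghz, ops_per_clock=2):
    """Peak throughput in TFLOPS: SPs x ops per clock x clock (GHz) / 1000."""
    return stream_processors * ops_per_clock * clock_ghz / 1000.0

# Fury X (Fiji XT): 4096 SPs at 1.05 GHz, one FMA (2 ops) per SP per clock.
fiji_fp32 = peak_tflops(4096, 1.05)

# Hypothetical Vega 10: 4096 SPs at an ASSUMED 1.6 GHz clock.
vega_fp32 = peak_tflops(4096, 1.6)

# ASSUMPTION: packed half-precision math doubles operations per clock (4, not 2).
vega_fp16 = peak_tflops(4096, 1.6, ops_per_clock=4)

print(f"Fiji FP32: {fiji_fp32:.1f} TFLOPS")
print(f"Vega FP32: {vega_fp32:.1f} TFLOPS")
print(f"Vega FP16: {vega_fp16:.1f} TFLOPS")
```

Under these assumptions the per-clock doubling shows up in the FP16 number only, which is one plausible reading of the "2x peak throughput per clock" claim rather than a confirmed one.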

High Bandwidth Cache, 8x Capacity Per Stack, 2x Bandwidth Per Pin And 512TB Address Space

These specs and features are specific to Vega's second-generation High Bandwidth Memory technology. HBM2 offers 8x the capacity per stack compared to first-generation HBM and twice the bandwidth per stack thanks to a higher clock speed. First-generation HBM, found in AMD's Fury series of enthusiast graphics cards, features a maximum of 1GB capacity per stack and 128GB/s of bandwidth per stack.

Second-generation HBM comes in stacks of up to 8GB, each with 256GB/s of bandwidth. Interestingly, the Vega engineering sample that AMD demoed last month was an 8GB model with 512GB/s of bandwidth, which would indicate that it was equipped with two 4GB HBM2 stacks, each delivering 256GB/s of bandwidth, rather than a single 8GB stack. However, the Radeon Instinct MI25 deep-learning accelerator based on the same Vega GPU features 16GB of memory and 512GB/s of bandwidth, which means that AMD had to equip it with two 8GB stacks.

Each HBM stack connects to the GPU via a 1024-bit memory controller. HBM2 comes out of the factory clocked at double the frequency of first-generation HBM, which is how it delivers double the bandwidth per pin. The 512TB virtual address space feature is quite an interesting one and is likely achieved by quickly swapping data in and out of the HBM cache.
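
The per-stack numbers follow from simple arithmetic: bandwidth is the interface width times the data rate per pin, divided by eight bits per byte. The sketch below works through it. The 1Gbps and 2Gbps per-pin rates are the values implied by the 128GB/s and 256GB/s per-stack figures over a 1024-bit interface, and the address-width line assumes that "512TB" is counted in binary units.

```python
def stack_bandwidth_gbs(bus_width_bits, gbps_per_pin):
    """Bandwidth of one stack in GB/s: pins x Gbit/s per pin / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8

hbm1 = stack_bandwidth_gbs(1024, 1.0)  # first-generation HBM
hbm2 = stack_bandwidth_gbs(1024, 2.0)  # HBM2 at double the per-pin rate
two_stacks = 2 * hbm2                  # two HBM2 stacks, as on the demo card

print(hbm1, hbm2, two_stacks)

# ASSUMPTION: 512TB means 512 x 2**40 bytes, which is 2**49 bytes,
# so addressing it takes 49 bits.
address_bits = (512 * 2**40).bit_length() - 1
print(address_bits)
```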

Below you will find a quick recap of what we know about AMD's Vega architecture & the upcoming RX 500 series graphics cards.

A New Top-To-Bottom Range Of Radeon RX 500 Series Graphics Cards Based On The Vega Architecture

AMD will be rolling out its next-generation Vega architecture across the entire range of its 2017 Radeon graphics cards, and it'll do it "soon". The new lineup will span everything from a top-end 4K 60FPS AAA gaming Radeon graphics card, the very same one that was demoed last week, to mid-range and entry-level offerings for 1440p and 1080p gaming. The highest-end models will feature HBM2 whilst the mid-range and more budget-oriented cards will feature GDDR5/X memory.

We've already seen one upcoming Radeon graphics card based on Vega in action. The as-yet-unreleased graphics card was demoed in a head-to-head comparison with NVIDIA's GTX 1080. The demo Vega graphics card had 8GB of HBM2 and it outperformed the 1080 by 10% whilst running Doom in Vulkan at 4K.

The Vega Architecture - AMD's Next Generation Compute Unit

One big announcement that AMD made at its recent press event where Vega was demoed is that the new architecture features what the company calls its NCU, short for Next Compute Unit. We had already detailed key parts of this new design in our exclusive piece about Vega 10 and Vega 11 a couple of months ago.

This new architecture holds several key advantages over its predecessor, chief among which is that each SIMD inside a given Vega NCU is now capable of simultaneously processing variable-length wavefronts. To the average person that sounds like a bunch of meaningless technical jargon; I know it did to me when I first learned about it. However, once you scratch the surface and understand what it means, you quickly begin to realize how big a deal this really is.

In AMD's current GCN implementation, each compute unit has four 16-wide vector SIMD units. Each SIMD executes one wavefront (a group of 64 threads in GCN) over four cycles, 16 threads per cycle, so a compute unit works through four wavefronts in that time. In addition, there is one scalar unit capable of executing one instruction per cycle. This unit is delegated time-critical tasks, where the four-cycle turnaround of the SIMD units is simply not good enough.

Unfortunately, these 16-wide SIMD units work exactly the same no matter how small a wavefront they're fed. The SIMD unit has to spend four cycles executing whatever threads are presented to it, no matter what. That means executing a wavefront with only 4 active threads takes just as long as executing one with 16, leaving the other 12 ALUs inside the SIMD idle. Graphics workloads are inherently non-uniform, which means that it's rare for every 16-wide SIMD unit to be fully occupied at any given time.
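
To put a number on the waste, the sketch below computes lane utilization for a single wavefront instruction. It assumes the fixed model described above, in which a SIMD always offers 16 lanes for 4 cycles (64 lane-slots) regardless of how many threads are active. The function name is made up for illustration.

```python
LANES = 16   # ALUs per SIMD unit
CYCLES = 4   # cycles a SIMD spends on each wavefront instruction

def lane_utilization(active_threads):
    """Fraction of lane-slots doing useful work for one wavefront instruction."""
    slots = LANES * CYCLES
    if not 0 <= active_threads <= slots:
        raise ValueError(f"active_threads must be between 0 and {slots}")
    return active_threads / slots

# A full wavefront, a quarter-full one, and a nearly empty one.
for n in (64, 16, 4):
    print(f"{n:2d} active threads -> {lane_utilization(n):.2%} of lane-slots used")
```

In this model a wavefront with 4 active threads keeps the SIMD busy for the same four cycles as a full one while using only a small fraction of its lane-slots.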

Variable Width Wavefront SIMDs, Getting More Performance Out Of Fewer Cycles

This is no longer the case in AMD's new GCN implementation inside Vega. The V9 architecture includes clever new schedulers and coherency subsystems that allow several wavefronts, of different widths, to be executed simultaneously inside any compute unit that's able to accommodate the workload. As a result, more ALUs would be doing useful work at any given time instead of idling or executing predicated-off threads that produce no results.
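
A toy comparison can show why this matters. The sketch below is not AMD's scheduler: it contrasts the fixed model, where every wavefront occupies a SIMD for four cycles, with an idealized upper bound in which active threads are packed perfectly onto all 64 lanes of a compute unit. The workload list and both function names are hypothetical.

```python
import math

SIMDS_PER_CU = 4
LANES = 16
CYCLES_PER_WAVE = 4

def fixed_width_cycles(waves):
    """Each wavefront occupies one SIMD for four cycles regardless of width."""
    rounds = math.ceil(len(waves) / SIMDS_PER_CU)
    return rounds * CYCLES_PER_WAVE

def ideal_packed_cycles(waves):
    """Idealized bound: active threads packed perfectly onto all 64 lanes."""
    lanes_per_cycle = SIMDS_PER_CU * LANES
    return math.ceil(sum(waves) / lanes_per_cycle)

# Hypothetical active-thread counts for eight wavefronts.
waves = [64, 16, 4, 4, 32, 8, 8, 56]

fixed = fixed_width_cycles(waves)
packed = ideal_packed_cycles(waves)
print(f"fixed-width: {fixed} cycles, ideal packing: {packed} cycles")
```

Real hardware would land somewhere between the two numbers, since perfect packing ignores scheduling, register and memory constraints.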

This in effect allows each NCU to finish considerably more work in the same amount of time compared to a traditional CU, in addition to freeing up valuable cache and memory resources for other compute units. Vega's Next Compute Units should therefore be not only faster but also more power efficient. By how much exactly remains to be seen, since it's very hard to predict how much of a difference an improvement in resource utilization and CU occupancy will yield given how unpredictable and fluctuating graphics workloads are.

About the author: PC hardware & tech evangelist. Been building PCs for over a decade & following the industry for just as long. Also a doctor specializing in Preventive Medicine.
