AMD Unveils Radeon Pro Vega 64, Vega 56 & Vega Die Shot – 25-22 TFLOPS, 400GB/s, 16-8 GB HBM2 & 4096-3584 SPs

Jun 5

AMD today officially introduced the Radeon Pro Vega, a powerful next-generation professional graphics card that will power Apple’s new iMac Pro.

AMD Vega 10 Die Shot Detailed

Before we get back to AMD’s Radeon Pro Vega, let’s first discuss what sits at the heart of every Radeon Pro Vega graphics card: the Vega 10 GPU. This is the very same GPU that will power AMD’s upcoming Radeon Vega Frontier Edition and Radeon RX Vega graphics cards.

The Vega 10 GPU is significantly larger than the Polaris 10/20 chips that the RX 480 and RX 580 are based on. It features 256 texture mapping units and 64 next generation Vega compute units arranged in two islets, each housing two compute engines. Every compute engine includes two distinct compute clusters. Each of those clusters features 512 stream processors and 32 texture mapping units. The chip in its entirety has a total of 4096 stream processors and 256 texture mapping units.
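The unit counts above are easy to sanity-check. The short Python sketch below simply multiplies out the hierarchy as this article describes it (islets, compute engines, clusters); the variable names are ours, not AMD's.

```python
# Unit counts taken from the description above; names are illustrative only.
ISLETS = 2
ENGINES_PER_ISLET = 2
CLUSTERS_PER_ENGINE = 2
SPS_PER_CLUSTER = 512
TMUS_PER_CLUSTER = 32
COMPUTE_UNITS = 64

clusters = ISLETS * ENGINES_PER_ISLET * CLUSTERS_PER_ENGINE
total_sps = clusters * SPS_PER_CLUSTER
total_tmus = clusters * TMUS_PER_CLUSTER
sps_per_cu = total_sps // COMPUTE_UNITS

print(f"{clusters} clusters, {total_sps} stream processors, {total_tmus} TMUs")
print(f"{sps_per_cu} stream processors per compute unit")
```

The totals come out to 4096 stream processors and 256 texture mapping units, or 64 stream processors per compute unit.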

On the back-end side of things there are 64 render output units that make up 16 distinct render back-ends, which connect to the 2048-bit wide-IO HBM2 memory interface. The whole Vega 10 die sits on an interposer alongside two HBM2 stacks in a 2.5D arrangement. Each stack can be configured with up to 8GB of memory, for a total of 16GB across both stacks.

For a more detailed look at the Vega 10 GPU specs, make sure to check out our in-depth Vega 10 spec breakdown here.

AMD Vega 10 GPU Specifications

| GPU | Polaris 10 XT | Vega 10 XT |
| --- | --- | --- |
| Process Node | 14nm | 14nm |
| Shader Engines | 4 | 4 |
| Stream Processors | 2304 | 4096 |
| Performance | 5.8 TFLOPS, 5.8 (FP16) TFLOPS | 25 (FP16) TFLOPS |
| Render Output Units | 32 | 64 |
| Texture Mapping Units | 144 | 256 |
| Hardware Threads | 4 | 8 |
| Memory Interface | 256-bit | 2048-bit |
| Memory | 8GB GDDR5 | Up To 16GB HBM2 |

AMD Radeon Pro Vega 64 & Radeon Pro Vega 56

The Radeon Pro Vega graphics card features AMD’s latest and greatest Vega architecture and Vega 10 GPU. The card comes in two flavors, a full-fat version and a skimmed version. The latter is what will come standard with all iMac Pros, while the Vega 64 will be an option users can upgrade to. The Radeon Pro Vega 64 will feature the cream of the crop “Vega 10 XT” GPU configuration with all of its 64 compute units, hence the name, and 16GB of HBM2.

The Radeon Pro Vega 56 is based on a cut-back “Vega 10 Pro” GPU with only 56 compute units instead of the full 64. This leaves this variant with 3584 GCN stream processors, 512 short of both its bigger brother, the Radeon Pro Vega 64, and the Radeon Vega Frontier Edition that AMD is launching on the 27th of this month. Even this “skimmed” version delivers a whopping 22 TFLOPS of graphics horsepower and 400GB/s of memory bandwidth, and will come with 8 gigabytes of 2nd generation vertically stacked High Bandwidth Memory.
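To put those headline figures in context, here is a rough back-of-the-envelope sketch of what they imply. It rests on assumptions that AMD has not confirmed in this announcement: that the TFLOPS figures are peak FP16 numbers, and that each stream processor retires 4 FP16 operations per clock (one fused multiply-add, counted as 2 ops, on each of 2 packed values). The function names are ours.

```python
def implied_clock_ghz(fp16_tflops, stream_processors, ops_per_sp_per_clock=4):
    """Clock speed implied by a peak FP16 rating.

    ops_per_sp_per_clock=4 is an assumption: one fused multiply-add
    (2 ops) applied to 2 packed 16-bit values per stream processor.
    """
    ops_per_second = fp16_tflops * 1e12
    return ops_per_second / (stream_processors * ops_per_sp_per_clock) / 1e9


def per_pin_gbps(bandwidth_gb_s, bus_width_bits):
    """Data rate per memory pin implied by total bandwidth and bus width."""
    return bandwidth_gb_s * 8 / bus_width_bits


vega64_clock = implied_clock_ghz(25, 4096)
vega56_clock = implied_clock_ghz(22, 3584)
pin_rate = per_pin_gbps(400, 2048)

print(f"Pro Vega 64 implied clock: {vega64_clock:.2f} GHz")
print(f"Pro Vega 56 implied clock: {vega56_clock:.2f} GHz")
print(f"HBM2 data rate per pin: {pin_rate:.4f} Gbps")
```

Under these assumptions both cards land at roughly 1.5 GHz, and 400GB/s over a 2048-bit bus works out to a little under 1.6 Gbps per pin. Treat these as estimates rather than official clock speeds.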

AMD Radeon Vega Lineup

| Graphics Card | Radeon R9 Fury X | Radeon RX 480 | Radeon RX Vega Frontier Edition | Radeon RX Vega (Gaming) | Radeon Pro Vega 64 | Radeon Pro Vega 56 |
| --- | --- | --- | --- | --- | --- | --- |
| GPU | Fiji XT | Polaris 10 | Vega 10 | Vega 10 | Vega 10 | Vega 10 |
| Process Node | 28nm | 14nm FinFET | FinFET | FinFET | FinFET | FinFET |
| Compute Units | 64 | 36 | 64 | TBA | 64 | 56 |
| Stream Processors | 4096 | 2304 | 4096 | TBA | 4096 | 3584 |
| Performance | 8.6 TFLOPS, 8.6 (FP16) TFLOPS | 5.8 (FP16) TFLOPS | 25 (FP16) TFLOPS | ~25 (FP16) TFLOPS | 25 (FP16) TFLOPS | 22 (FP16) TFLOPS |
| Texture Mapping Units | 256 | 144 | 256 | 256 | 256 | 224 |
| Render Output Units | 64 | 32 | 64 | 64 | 64 | 64 |
| Memory Bus | 4096-bit | 256-bit | 2048-bit | 2048-bit | 2048-bit | 2048-bit |
| Launch | 2015 | 2016 | June 2017 | July 2017 | December 2017 | December 2017 |

The Vega Architecture

High Bandwidth Cache And Unique Memory Sub-System

With the Vega architecture AMD is introducing several new cutting edge technologies, chief among which is a brand new unique memory engine. In Vega 10 the HBM2 storage acts as a superfast cache thanks to a specialized processor that AMD dubs the High Bandwidth Cache Controller. The HBCC works to seamlessly stream data in and out of the memory, allowing Vega GPUs to have an insanely large address space of up to 512TB. This address space is only limited by the system’s overall storage space.
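For a sense of scale, the quick calculation below works out how many address bits a 512TB space needs and how it compares with the 16GB of on-package HBM2. It assumes binary units (1TB = 2^40 bytes), which the announcement does not spell out.

```python
# Assumes binary units: 1 TB = 2**40 bytes, 1 GB = 2**30 bytes.
TB = 2**40
GB = 2**30

address_space = 512 * TB
hbm2_capacity = 16 * GB

address_bits = address_space.bit_length() - 1
ratio = address_space // hbm2_capacity

print(f"512TB needs {address_bits}-bit addresses")
print(f"That is {ratio}x the capacity of 16GB of HBM2")
```

In other words, a 512TB space corresponds to 49-bit addressing and is 32,768 times larger than the biggest HBM2 configuration, which is why the on-package memory is treated as a cache rather than as the whole pool.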

Vega Next Generation Compute Engine

The next generation compute unit the company is debuting with Vega can execute half precision 16-bit floating point ops at twice the rate of FP32, which software can opportunistically take advantage of to increase throughput and reduce the thermal and power footprints of the GPU.

Next-Gen Compute Units (NCUs) provide super-charged pathways for doubling processing throughput when using 16-bit data types. In cases where a full 32 bits of precision is not necessary to obtain the desired result, they can pack twice as much data into each register and use it to execute two parallel operations. This is ideal for a wide range of computationally intensive applications including image/video processing, ray tracing, artificial intelligence, and game rendering.
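The packing idea can be illustrated in software. The sketch below is only an emulation in plain Python, not AMD's hardware or any real GPU API: it stores two half-precision values in one 32-bit word and then applies a single "packed add" to both lanes. All function names are hypothetical.

```python
import struct


def pack_half2(lo, hi):
    """Pack two FP16 values into one 32-bit word (lo in the low 16 bits)."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]


def unpack_half2(word):
    """Split a 32-bit word back into its two FP16 lanes."""
    return struct.unpack("<2e", struct.pack("<I", word))


def packed_add(a, b):
    """Add two packed words lane by lane, emulating one two-wide FP16 op."""
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)


x = pack_half2(1.5, -2.0)
y = pack_half2(0.25, 4.0)

print(hex(x))                          # 0xc0003e00
print(unpack_half2(packed_add(x, y)))  # (1.75, 2.0)
```

One register-sized word carries two values, and one operation updates both of them. That is the source of the doubled FP16 rate, at the cost of the reduced range and precision of 16-bit floats.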

Geometry Engine

Vega also features a new programmable geometry engine that delivers twice the performance per clock. In combination with the engine’s new primitive shader discard capability Vega is now significantly faster at tessellation and rendering of complex geometry and detail rich scenes.

The most challenging workloads for a GPU can present it with millions of geometry primitives per frame, all of which must be evaluated to determine their contribution to the final image. New primitive shader technology allows Radeon Pro Vega graphics to perform geometry culling at an accelerated rate, eliminating unnecessary work for the rest of the GPU. An advanced workload distribution mechanism then assigns processing tasks to the available pipelines in a way that maximizes their utilization and avoids idle time. The result is that Radeon Pro Vega graphics can render extremely complex 3D models and scenes smoothly in real time.

The geometry pipeline also includes a new Primitive Discard Accelerator that detects parts of the geometry that are obscured by other objects or sit outside the scene and discards them, saving power and improving performance. The PDA ensures only visible parts of the scene are rendered and no energy is wasted on rendering invisible geometry. The issue of wasting cycles on rendering the invisible has led to unnecessarily slow performance in numerous games including Crysis 2, where it would make GPUs wastefully tessellate entire oceans of invisible water hidden below the surface.
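AMD has not published how the Primitive Discard Accelerator works internally, but the general idea of throwing away triangles before they are rendered can be shown with textbook culling tests. The sketch below is a generic illustration, not AMD's algorithm: it drops triangles that are back-facing, have zero area, or lie entirely outside a viewport spanning -1 to 1 on both axes. It does not attempt occlusion by other objects.

```python
def signed_area(tri):
    """Signed 2D area: positive for counter-clockwise (front-facing) winding."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def outside_view(tri):
    """True if all vertices lie beyond the same edge of the [-1, 1] viewport."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    return max(xs) < -1 or min(xs) > 1 or max(ys) < -1 or min(ys) > 1


def cull(triangles):
    """Keep front-facing, non-degenerate triangles that may be on screen."""
    return [t for t in triangles
            if signed_area(t) > 0 and not outside_view(t)]


triangles = [
    ((0, 0), (1, 0), (0, 1)),      # front-facing and on screen: kept
    ((0, 0), (0, 1), (1, 0)),      # back-facing (clockwise): discarded
    ((2, 2), (3, 2), (2, 3)),      # entirely off screen: discarded
    ((0, 0), (0.5, 0.5), (1, 1)),  # zero area: discarded
]

visible = cull(triangles)
print(f"kept {len(visible)} of {len(triangles)} triangles")
```

Only one of the four triangles survives here. Every triangle rejected at this stage is work the later shading stages never have to do.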

Pixel Engine

Another key part of the Vega architecture is AMD’s brand new pixel engine, which is able to break work down into batches that can then enter the cache directly rather than reside in memory. This saves power and cycles, increases overall bandwidth and renders the scene faster.

Another clever technology debuting with the Vega architecture is shade-once, which works just like the Primitive Discard Accelerator but on the pixel scale. It analyses pixels early in the graphics pipe and discards any that are hidden behind other objects in the scene, again saving power and cycles and rendering the scene faster.

Another key advantage with the new Pixel engine is the fact that AMD has now linked it directly to the on-chip cache rather than the off-chip memory. This approach allows for some key optimization opportunities that developers are already familiar with on the gaming consoles.


Vega Architecture Key Features

– 4x Power Efficiency
– 2x Peak Throughput/Performance Per Clock
– High Bandwidth Cache
– 2x Bandwidth per pin
– 8x Capacity Per stack (2nd Generation High Bandwidth Memory)
– 512TB Virtual Address Space
– Next Generation Compute Engine
– Next Generation Pixel Engine
– Next Generation Compute Unit optimized for higher clock speeds
– Rapid Packed Math
– Draw Stream Binning Rasterizer
– Primitive Shaders

You can read about the Vega architecture in full detail here.