Next-Gen Compute Units (NCUs) provide super-charged pathways for doubling processing throughput when using 16-bit data types. In cases where a full 32 bits of precision is not necessary to obtain the desired result, they can pack twice as much data into each register and use it to execute two parallel operations. This is ideal for a wide range of computationally intensive applications including image/video processing, ray tracing, artificial intelligence, and game rendering.
AMD Unveils Radeon Pro Vega 64, Vega 56 & Vega Die Shot – 25-22 TFLOPS, 400GB/s, 16-8 GB HBM2 & 4096-3584 SPs
AMD today officially introduced the Radeon Pro Vega, a powerful next-generation professional graphics card that will power Apple's new iMac Pro.
AMD Vega 10 Die Shot Detailed
Before we get back to AMD's Radeon Pro Vega, let's first discuss what sits at the heart of every Radeon Pro Vega graphics card: the Vega 10 GPU. This is the very same GPU that will power AMD's upcoming Radeon Vega Frontier Edition and Radeon RX Vega graphics cards.
The Vega 10 GPU is significantly larger than the Polaris 10/20 chips that the RX 480 and RX 580 are based on. It features 256 texture mapping units and 64 next generation Vega compute units arranged in two islets, each housing two compute engines. Every compute engine includes two distinct compute clusters. Each of those clusters features 512 stream processors and 32 texture mapping units. The chip in its entirety has a total of 4096 stream processors and 256 texture mapping units.
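The hierarchy described above multiplies out as follows. This is only a quick arithmetic check using the figures quoted in the paragraph; the constant names are illustrative and not AMD terminology.

```python
# Layout figures taken from the paragraph above.
ISLETS = 2
ENGINES_PER_ISLET = 2
CLUSTERS_PER_ENGINE = 2
SPS_PER_CLUSTER = 512
TMUS_PER_CLUSTER = 32
COMPUTE_UNITS = 64

clusters = ISLETS * ENGINES_PER_ISLET * CLUSTERS_PER_ENGINE
stream_processors = clusters * SPS_PER_CLUSTER
tmus = clusters * TMUS_PER_CLUSTER

# Per-compute-unit figures follow from dividing by the 64 compute units.
sps_per_cu = stream_processors // COMPUTE_UNITS
tmus_per_cu = tmus // COMPUTE_UNITS

print(clusters, stream_processors, tmus, sps_per_cu, tmus_per_cu)
# 8 4096 256 64 4
```

Each compute unit therefore works out to 64 stream processors and 4 texture mapping units.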
On the back-end side of things there are 64 render output units that make up 16 distinct render back-ends, which connect to the 2048-bit wide-IO HBM2 memory interface. The whole Vega 10 die sits on an interposer alongside two HBM2 stacks in a 2.5D arrangement. Each stack can be configured with up to 8GB of memory, for a total of 16GB across both stacks.
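As a rough sanity check on the memory figures, the sketch below derives the total capacity, the bus width per stack, and the per-pin data rate implied by the bandwidth numbers quoted elsewhere in this article (400GB/s and 484GB/s). It assumes decimal gigabytes and a fully utilized 2048-bit bus; the function name is made up for illustration.

```python
BUS_WIDTH_BITS = 2048      # total HBM2 interface width quoted above
STACKS = 2
MAX_GB_PER_STACK = 8

def pin_rate_gbps(bandwidth_gb_s, bus_width_bits=BUS_WIDTH_BITS):
    """Per-pin data rate in Gbit/s implied by a total bandwidth in GB/s."""
    return bandwidth_gb_s * 8 / bus_width_bits

print(STACKS * MAX_GB_PER_STACK)   # 16   (GB total capacity)
print(BUS_WIDTH_BITS // STACKS)    # 1024 (bits of bus width per stack)
print(pin_rate_gbps(400))          # 1.5625
print(pin_rate_gbps(484))          # 1.890625
```

In other words, the quoted bandwidths correspond to roughly 1.6 and 1.9 Gbit/s per pin across the 2048-bit interface.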
AMD Vega 10 GPU Specifications
GPU | Polaris 10 XT | Vega 10 XT |
---|---|---|
Process Node | 14nm | 14nm |
Shader Engines | 4 | 4 |
Stream Processors | 2304 | 4096 |
Performance | 5.8 TFLOPS (FP32), 5.8 TFLOPS (FP16) | 12.5 TFLOPS (FP32), 25 TFLOPS (FP16) |
Render Output Units | 32 | 64 |
Texture Mapping Units | 144 | 256 |
Hardware Threads | 4 | 8 |
Memory Interface | 256-bit | 2048-bit |
Memory | 8GB GDDR5 | Up To 16GB HBM2 |
AMD Radeon Pro Vega 64 & Radeon Pro Vega 56
The Radeon Pro Vega graphics card features AMD's latest and greatest Vega architecture and Vega 10 GPU. The card comes in two flavors, a full-fat version and a skimmed version. The latter is what will come standard with all iMac Pros, while the Vega 64 will be an option users can upgrade to. The Radeon Pro Vega 64 will feature the cream of the crop "Vega 10 XT" GPU configuration with all of its 64 compute units, hence the name, and 16GB of HBM2.
The Radeon Pro Vega 56 is based on a cut-back "Vega 10 Pro" GPU with only 56 compute units instead of the full 64. This leaves the variant with 3584 GCN stream processors, 512 fewer than both the Radeon Vega Frontier Edition, which AMD is launching on the 27th of this month, and its bigger brother, the Radeon Pro Vega 64. Even this "skimmed" version delivers a whopping 22 TFLOPS of half-precision (FP16) compute and 400GB/s of memory bandwidth, and will come with 8GB of second-generation, vertically stacked High Bandwidth Memory.
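The TFLOPS figures can be related to the stream processor counts with the usual peak-throughput convention, in which each stream processor retires one fused multiply-add (counted as two floating-point operations) per clock. The sketch below assumes that convention and works backwards to the clock speed the quoted numbers would imply; the result is an implied value, not an official specification, and the function names are illustrative.

```python
FMA_OPS = 2  # a fused multiply-add is conventionally counted as two floating-point ops

def peak_tflops(stream_processors, clock_ghz, packed_fp16=False):
    """Theoretical peak throughput; packed FP16 doubles the work done per lane."""
    lanes = 2 if packed_fp16 else 1
    return stream_processors * FMA_OPS * lanes * clock_ghz / 1000

def implied_clock_ghz(stream_processors, fp32_tflops):
    """Clock speed that a quoted FP32 figure would require."""
    return fp32_tflops * 1000 / (stream_processors * FMA_OPS)

clock = implied_clock_ghz(3584, 11)   # Radeon Pro Vega 56 figures from this article
print(round(clock, 3))                                       # 1.535
print(round(peak_tflops(3584, clock), 1))                    # 11.0
print(round(peak_tflops(3584, clock, packed_fp16=True), 1))  # 22.0
```

The same calculation for the full 4096-processor part at 12.5 TFLOPS gives an implied clock of roughly 1.53 GHz as well.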
AMD Radeon Vega Lineup:
Graphics Card | Radeon R9 Fury X | Radeon RX 480 | Radeon RX Vega Frontier Edition | Radeon RX Vega 64 | Radeon RX Vega 56 | Radeon Pro Vega 64 | Radeon Pro Vega 56 |
---|---|---|---|---|---|---|---|
GPU | Fiji XT | Polaris 10 | Vega 10 | Vega 10 XTX/XT | Vega 10 XL | Vega 10 | Vega 10 |
Process Node | 28nm | 14nm FinFET | FinFET | FinFET | FinFET | FinFET | FinFET |
Compute Units | 64 | 36 | 64 | 64 | 56 | 64 | 56 |
Stream Processors | 4096 | 2304 | 4096 | 4096 | 3584 | 4096 | 3584 |
Performance | 8.6 TFLOPS (FP32), 8.6 TFLOPS (FP16) | 5.8 TFLOPS (FP32), 5.8 TFLOPS (FP16) | 13 TFLOPS (FP32), 26 TFLOPS (FP16) | Up to 13+ TFLOPS (FP32), 26+ TFLOPS (FP16) | TBA | ~13 TFLOPS (FP32), ~25 TFLOPS (FP16) | 11 TFLOPS (FP32), 22 TFLOPS (FP16) |
Texture Mapping Units | 256 | 144 | 256 | 256 | TBA | 256 | 224 |
Render Output Units | 64 | 32 | 64 | 64 | TBA | 64 | 64 |
Memory | 4GB HBM | 8GB GDDR5 | 16GB HBM2 | TBA | TBA | 16GB HBM2 | 8GB HBM2 |
Memory Bus | 4096-bit | 256-bit | 2048-bit | 2048-bit | 2048-bit | 2048-bit | 2048-bit |
Bandwidth | 512GB/s | 256GB/s | 484GB/s | TBA | TBA | TBA | 400GB/s |
TDP | 275W | 150W | 300-375W | TBA | TBA | TBA | TBA |
Launch | 2015 | 2016 | June 2017 | July 2017 | July 2017 | December 2017 | December 2017 |
Price | $649 US | $199 (4 GB) $229 (8 GB) | $999 (Reference) $1499 (Liquid) | $499 (Reference) $549 (Limited Air) $599 (Liquid) $649 (Liquid LE) | $399 | TBD | TBD |
The Vega Architecture
High Bandwidth Cache And Unique Memory Sub-System
With the Vega architecture AMD is introducing several new cutting-edge technologies, chief among them a brand new memory engine. In Vega 10 the HBM2 storage acts as a superfast cache thanks to a specialized processor that AMD dubs the High Bandwidth Cache Controller. The HBCC works to seamlessly stream data in and out of the memory, giving Vega GPUs an insanely large virtual address space of up to 512TB. In practice, this address space is limited only by the system's overall storage space.
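To make the idea of memory acting as a cache more concrete, here is a toy model. It first checks how many address bits a 512TB space needs (treating 1TB as 2^40 bytes), then simulates a small, fast pool of pages in front of a larger backing store. The page size, the least-recently-used eviction policy, and the `PageCache` class are all assumptions made for illustration; the article does not describe how the HBCC actually chooses what to keep resident.

```python
from collections import OrderedDict

ADDRESS_SPACE_BYTES = 512 * 2**40              # 512TB as quoted above
print(ADDRESS_SPACE_BYTES.bit_length() - 1)    # 49 address bits

class PageCache:
    """Toy LRU cache: a small fast pool standing in for HBM2, backed by a larger store."""

    def __init__(self, capacity_pages, page_size=4096):
        self.capacity = capacity_pages
        self.page_size = page_size
        self.resident = OrderedDict()   # page number -> True, least recently used first
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Touch an address; return True on a hit, False if the page had to be streamed in."""
        page = address // self.page_size
        if page in self.resident:
            self.resident.move_to_end(page)     # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)   # evict the least recently used page
        self.resident[page] = True
        return False

cache = PageCache(capacity_pages=2)
for address in [0, 4096, 0, 8192, 4096]:
    cache.access(address)
print(cache.hits, cache.misses)   # 1 4
```

The point of the sketch is only that software sees one large address space while a controller decides which pages live in the fast pool at any given moment.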
Vega Next Generation Compute Engine
The next generation compute unit the company is debuting with Vega can execute half precision 16-bit floating point ops at twice the rate of FP32, which software can opportunistically take advantage of to increase throughput and reduce the thermal and power footprints of the GPU.
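The packing idea can be illustrated in software: two 16-bit floats fit in one 32-bit word, and a single "packed" operation then works on both halves at once. The sketch below uses Python's `struct` module, whose `e` format is IEEE 754 half precision. The helper names are invented for this example, and the lane-by-lane add is only a model of the concept, not of the hardware instruction.

```python
import struct

def pack_half2(lo, hi):
    """Pack two FP16 values into one 32-bit word (lo occupies bits 0-15)."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]

def unpack_half2(word):
    """Split a 32-bit word back into its two FP16 lanes."""
    return struct.unpack("<2e", struct.pack("<I", word))

def packed_add(a, b):
    """Add two packed words lane by lane; one call stands in for one packed instruction."""
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)

x = pack_half2(1.5, 2.0)
y = pack_half2(0.25, 4.0)
print(hex(x))                            # 0x40003e00
print(unpack_half2(packed_add(x, y)))    # (1.75, 6.0)
```

One 32-bit register thus carries two operands, which is where the doubled FP16 rate comes from when full FP32 precision is not needed.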
Geometry Engine
Vega also features a new programmable geometry engine that delivers twice the peak throughput per clock. In combination with the engine's new primitive shader discard capability, Vega is now significantly faster at tessellation and at rendering complex geometry and detail-rich scenes.
The most challenging workloads for a GPU can present it with millions of geometry primitives per frame, all of which must be evaluated to determine their contribution to the final image. New primitive shader technology allows Radeon Pro Vega graphics to perform geometry culling at an accelerated rate, eliminating unnecessary work for the rest of the GPU. An advanced workload distribution mechanism then assigns processing tasks to the available pipelines in a way that maximizes their utilization and avoids idle time. As a result, Radeon Pro Vega graphics can render extremely complex 3D models and scenes smoothly in real time.
The geometry pipeline also includes a new Primitive Discard Accelerator that detects parts of the geometry that are obscured by other objects or sit outside the scene and discards them, saving power and improving performance. The PDA ensures that only visible parts of the scene are rendered and no energy is wasted on rendering invisible geometry. Wasting cycles on rendering the invisible has led to unnecessarily slow performance in numerous games, including Crysis 2, where GPUs would wastefully tessellate entire oceans of invisible water hidden below the surface.
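The kind of early rejection described above can be sketched with two classic screen-space tests: discard triangles that face away from the camera or have zero area, and discard triangles whose bounding box lies entirely outside the viewport. This is a generic culling illustration under those assumptions, not a description of AMD's hardware, and it does not cover occlusion by other objects. The function names and the counter-clockwise front-face convention are choices made for the example.

```python
def signed_area(tri):
    """Signed area of a 2D triangle; positive means counter-clockwise winding."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))

def outside_viewport(tri, width, height):
    """True if the triangle's bounding box does not touch the viewport at all."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    return max(xs) < 0 or min(xs) > width or max(ys) < 0 or min(ys) > height

def cull(triangles, width, height):
    """Keep only front-facing, non-degenerate triangles that may be visible."""
    return [t for t in triangles
            if signed_area(t) > 0 and not outside_viewport(t, width, height)]

triangles = [
    ((0, 0), (10, 0), (0, 10)),           # front-facing and on screen: kept
    ((0, 0), (0, 10), (10, 0)),           # clockwise winding (back-facing): discarded
    ((200, 200), (210, 200), (200, 210)), # entirely off screen: discarded
    ((5, 5), (6, 6), (7, 7)),             # zero area: discarded
]
print(len(cull(triangles, width=100, height=100)))   # 1
```

Every triangle rejected at this stage is work the rasterizer and pixel shaders never have to do.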
Pixel Engine
Another key part of the Vega architecture is AMD's brand new pixel engine, which is able to break work down into batches that can then enter the cache directly rather than reside in memory. This saves power and cycles, increases effective bandwidth, and renders the scene faster.
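One common way to form such batches is to bin primitives by the screen tiles they touch, so that everything needed for one tile can be processed together while its data stays in on-chip cache. The sketch below shows generic bounding-box binning; the tile size, the `bin_triangles` helper, and the assumption of non-negative screen coordinates are all illustrative and are not taken from AMD's design.

```python
from collections import defaultdict

def bin_triangles(triangles, tile_size):
    """Map each screen tile (tx, ty) to the indices of triangles whose bounding box touches it."""
    bins = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        tx0, tx1 = int(min(xs)) // tile_size, int(max(xs)) // tile_size
        ty0, ty1 = int(min(ys)) // tile_size, int(max(ys)) // tile_size
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return dict(bins)

triangles = [
    ((1, 1), (5, 1), (1, 5)),          # fits inside tile (0, 0)
    ((30, 2), (40, 2), (35, 10)),      # spans tiles (1, 0) and (2, 0)
    ((10, 10), (20, 10), (10, 20)),    # spans four tiles around (16, 16)
]
bins = bin_triangles(triangles, tile_size=16)
print(bins[(0, 0)])   # [0, 2]
print(bins[(1, 0)])   # [1, 2]
```

Processing one bin at a time keeps the working set small, which is what lets the batch live in cache instead of making round trips to memory.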
Another clever technology debuting with the Vega architecture is shade-once, which works just like the Primitive Discard Accelerator but at the pixel scale. It analyzes pixels early in the graphics pipe and discards any that are hidden behind other objects in the scene, again saving power and cycles and rendering the scene faster.
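The principle can be shown with a small depth-resolve sketch: first find the nearest fragment at each pixel, then run the expensive shading step only for those winners. This is a conceptual model under the assumption that smaller depth values are closer to the camera; the `shade_once` function and the fragment tuple layout are invented for the example and say nothing about how the hardware implements it.

```python
def shade_once(fragments, shade):
    """fragments: iterable of (x, y, depth, material); smaller depth is closer.

    Returns {(x, y): shaded_value}, calling `shade` once per covered pixel.
    """
    nearest = {}
    for x, y, depth, material in fragments:
        key = (x, y)
        if key not in nearest or depth < nearest[key][0]:
            nearest[key] = (depth, material)
    return {key: shade(material) for key, (depth, material) in nearest.items()}

calls = []

def expensive_shade(material):
    calls.append(material)        # record how often shading actually runs
    return material.upper()

fragments = [
    (0, 0, 0.9, "wall"),
    (0, 0, 0.2, "crate"),         # closest fragment at pixel (0, 0)
    (1, 0, 0.5, "floor"),
    (0, 0, 0.6, "fog"),
]
image = shade_once(fragments, expensive_shade)
print(image)        # {(0, 0): 'CRATE', (1, 0): 'FLOOR'}
print(len(calls))   # 2
```

Four fragments arrive, but only two are shaded, one per visible pixel.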
Another key advantage of the new pixel engine is that AMD has now linked it directly to the on-chip cache rather than to the off-chip memory. This approach opens up some key optimization opportunities that developers are already familiar with from the gaming consoles.
Vega Architecture Key Features
– 4x Power Efficiency
– 2x Peak Throughput/Performance Per Clock
– High Bandwidth Cache
– 2x Bandwidth per pin
– 8x Capacity Per stack (2nd Generation High Bandwidth Memory)
– 512TB Virtual Address Space
– Next Generation Compute Engine
– Next Generation Pixel Engine
– Next Generation Compute Unit optimized for higher clock speeds
– Rapid Packed Math
– Draw Stream Binning Rasterizer
– Primitive Shaders