Full AMD RX 480 Slide Deck Leaked, Polaris 10 GCN 4.0 GPU Architecture Detailed – Enhanced Async, Improved Shaders and 2.8x GPU Efficiency

Jun 28, 2016

The full slide deck for AMD’s next-generation, Polaris-based Radeon RX 480 graphics card has leaked ahead of launch. Videocardz has published several slides, including architectural block diagrams, in an exclusive post that we dissect below. The Radeon RX 480 itself launches today, so expect reviews to go live within a few hours.

The AMD Radeon RX 480 launches in less than 24 hours at a starting price of $199 US.


AMD Radeon RX 480 Slide Deck Unveils Polaris 10 Architecture Details – Async Compute, Improved Shaders, Enhanced Geometry Inside RTG’s Most Efficient GPU To Date

The slide deck is full of juicy details on the Polaris 10 GPU architecture, which will power the RX 480 and RX 470 graphics cards. We have already given a technical overview of these graphics cards, so here we focus on the Polaris 10 GPU itself.

AMD Polaris 10 GPU Block Diagram Detailed – 2304 Shaders And A Lot More

The Polaris 10 GPU is built on the 14nm FinFET process from GlobalFoundries. The chip features a single Graphics Command Processor paired with 4 ACEs (Asynchronous Compute Engines) and 2 Hardware Schedulers (HWS). The complete Polaris 10 (XT) chip features 36 compute units, each with 64 stream processors, which adds up to 2304 stream processors on the flagship Polaris graphics processor.

AMD Radeon RX 480 Polaris 10 GPU block diagram has been leaked. (Image Credits: Videocardz)


Each array of 9 compute units sits within a shader engine that has a single Geometry Engine, and there are 4 such shader engines in total. There are 144 texture units overall, which means each CU contains 4 TMUs. The Polaris 10 GPU also has 32 ROPs and a larger 2 MB L2 cache. Eight 32-bit memory controllers on the full chip add up to a 256-bit wide bus interface. Aside from the graphics block, the Polaris 10 chip also houses CrossFire, DisplayPort 1.4-HDR, HDMI 2.0b, video encode/decode acceleration, a DMA engine and several other multimedia accelerators.
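The totals quoted for the block diagram follow directly from the per-unit counts. As a quick sanity check, here is a short Python sketch that reproduces them. The constant names are our own, chosen for illustration, and do not come from the slides.

```python
# Per-unit counts as described for the leaked Polaris 10 block diagram.
# All names here are illustrative, not AMD terminology.
SHADER_ENGINES = 4
CUS_PER_SHADER_ENGINE = 9
STREAM_PROCESSORS_PER_CU = 64
TMUS_PER_CU = 4
MEMORY_CONTROLLERS = 8
BITS_PER_CONTROLLER = 32

compute_units = SHADER_ENGINES * CUS_PER_SHADER_ENGINE
stream_processors = compute_units * STREAM_PROCESSORS_PER_CU
texture_units = compute_units * TMUS_PER_CU
bus_width_bits = MEMORY_CONTROLLERS * BITS_PER_CONTROLLER

print(f"Compute units:     {compute_units}")
print(f"Stream processors: {stream_processors}")
print(f"Texture units:     {texture_units}")
print(f"Memory bus width:  {bus_width_bits}-bit")
```

The results match the 36 CUs, 2304 stream processors, 144 TMUs and 256-bit bus listed in the article.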

AMD Radeon RX 480 – Polaris 10 With Enhanced Geometry Engines

AMD has deployed a new technology known as the Primitive Discard Accelerator, which improves geometry performance by culling zero-area triangles early in the pipeline. The GPU also adds a new index cache for small instanced geometry. This reduces data movement through the geometry engine, so the GPU can free up internal bandwidth and maintain higher primitive throughput while geometry is being processed.

The result is a large increase in geometry performance, especially for tessellation combined with high levels of anti-aliasing. With primitive discard enabled, the Polaris 10 GPU can achieve up to a 3.5x increase in geometry processing workloads such as tessellation.
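To make the idea of discarding zero-area triangles concrete, here is a minimal software sketch in Python. It only illustrates the concept: the slides do not describe how the fixed-function hardware implements the test, and the function names and integer screen-space coordinates below are our own assumptions.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Triangle = Tuple[Point, Point, Point]


def doubled_signed_area(tri: Triangle) -> int:
    """Return twice the signed area of a triangle (2D cross product)."""
    (ax, ay), (bx, by), (cx, cy) = tri
    return (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)


def discard_zero_area(triangles: List[Triangle]) -> List[Triangle]:
    """Keep only triangles that cover a non-zero area.

    Hypothetical helper: a conceptual stand-in for early primitive
    discard, not AMD's actual hardware logic.
    """
    return [tri for tri in triangles if doubled_signed_area(tri) != 0]


triangles = [
    ((0, 0), (4, 0), (0, 3)),  # ordinary triangle
    ((0, 0), (2, 2), (4, 4)),  # collinear vertices, zero area
    ((1, 1), (1, 1), (5, 2)),  # two identical vertices, zero area
]

survivors = discard_zero_area(triangles)
print(f"{len(survivors)} of {len(triangles)} triangles continue down the pipeline")
```

Degenerate triangles like the last two can never produce a pixel, so dropping them before rasterization saves work in later stages.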

AMD Radeon RX 480 – Polaris 10 With Improved Shader Efficiency, FP16 Support

Polaris also features improved shader efficiency, providing better performance per shader core. This is achieved through instruction prefetch, which reduces pipeline stalls and makes instruction caching more efficient; a larger per-wave instruction buffer, which allows higher single-threaded performance; and a tuned L2 cache.

The Polaris GPUs also offer native FP16 and Int16 support. This lets floating-point (FP) work run at half the precision of single precision (FP32), which is a better fit for the graphics, computer vision and machine learning markets. Using FP16 results in lower power compared to FP32 compute, along with reduced memory and register usage.
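The memory saving from FP16 is easy to demonstrate on the CPU. The sketch below uses Python's standard `struct` module, whose `e` format character packs IEEE 754 half-precision values. It shows storage size and rounding only; it says nothing about GPU register allocation or power, and the sample values are arbitrary.

```python
import struct

# Arbitrary sample values, chosen only for illustration.
values = [0.1, 1.5, 1000.25, 3.14159]

# Pack the same values as 32-bit floats ("f") and 16-bit floats ("e").
fp32_bytes = struct.pack(f"<{len(values)}f", *values)
fp16_bytes = struct.pack(f"<{len(values)}e", *values)

print(f"FP32 storage: {len(fp32_bytes)} bytes")
print(f"FP16 storage: {len(fp16_bytes)} bytes")

# Unpack the FP16 data to see what precision survived the round trip.
fp16_roundtrip = struct.unpack(f"<{len(values)}e", fp16_bytes)
for original, half in zip(values, fp16_roundtrip):
    print(f"{original!r:>10} stored as FP16 reads back as {half!r}")
```

FP16 halves the storage, but values that are not exactly representable in 16 bits come back slightly rounded. That is why it suits workloads that can tolerate lower precision.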

AMD Radeon RX 480 – Polaris 10 With Advanced Asynchronous Compute Capabilities

Asynchronous Compute is one of AMD’s strongest features on GCN-based graphics cards, and it returns with more advanced capabilities on GCN 4.0. AMD’s Async Compute is flexible in that it lets graphics and compute work execute asynchronously.

Polaris can also pre-empt running work and offer a quick response queue to graphics and compute tasks as needed. The graphics chip picks whichever of these approaches best fits the workload and decides which resources need to be managed. The Polaris 10 GPU houses four Asynchronous Compute Engines and two Hardware Schedulers for this purpose, which will help a lot in DirectX 12 titles that support the feature.

AMD Radeon RX 480 – Polaris 10 With Memory / Color Compression

On the memory side, AMD has updated the Polaris 10 GPU with a new memory controller that supports 8 Gbps GDDR5 memory, providing up to 256 GB/s of bandwidth across a 256-bit bus interface. Color compression has also been vastly improved over the generations: the Polaris GPU can compress up to 40% of data, compared to just 20% on the Fury X (Fiji GPU) and 0% on the R9 290X (Hawaii GPU). Delta color compression raises the effective peak bandwidth, which adds to efficiency.
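The slides do not explain how AMD's color compression works internally, so the following Python sketch is only a toy illustration of the general delta-encoding idea: store one base value per block plus small differences. The block layout, bit accounting and function names are all our own assumptions and should not be read as AMD's algorithm.

```python
from typing import List, Tuple


def delta_encode(block: List[int]) -> Tuple[int, List[int]]:
    """Split a block into its first value and per-value differences from it."""
    base = block[0]
    return base, [value - base for value in block[1:]]


def delta_decode(base: int, deltas: List[int]) -> List[int]:
    """Losslessly rebuild the original block."""
    return [base] + [base + delta for delta in deltas]


def encoded_bits(base_bits: int, deltas: List[int]) -> int:
    """Bits needed if every delta uses the same signed width (toy model)."""
    largest = max((abs(delta) for delta in deltas), default=0)
    delta_bits = 0 if largest == 0 else largest.bit_length() + 1  # +1 sign bit
    return base_bits + delta_bits * len(deltas)


# Eight neighbouring 8-bit channel values from a smooth gradient (made up).
block = [200, 201, 199, 200, 202, 200, 201, 200]

base, deltas = delta_encode(block)
raw_bits = 8 * len(block)
packed_bits = encoded_bits(8, deltas)

print(f"Raw: {raw_bits} bits, delta-encoded: {packed_bits} bits")
print(f"Round trip is lossless: {delta_decode(base, deltas) == block}")
```

Neighbouring pixels often have similar colors, so the differences need fewer bits than the raw values. Less data then has to cross the memory bus for the same image.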

AMD Radeon RX 480 Graphics Card Specifications – The Flagship Polaris 10 GPU With 1266 MHz Core Clock

The Radeon RX 480 is based on the full Polaris 10 GPU, which features 2304 unified shaders built on the GCN 4.0 architecture. The Radeon RX 480 comes with 144 Texture Mapping Units and 32 ROPs, which deliver a texture fill rate of 182.3 GTexel/s and a pixel fill rate of 40.5 GPixel/s at a clock speed of 1266 MHz. The Radeon RX 480 will be available in reference form at launch, with custom models following a week or two later.
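The fill rate figures can be reproduced from the unit counts and the clock speed. The Python sketch below assumes one texel per TMU per clock, one pixel per ROP per clock, and two floating-point operations per stream processor per clock (a fused multiply-add counted as two). These are the usual conventions for peak figures like these, but they are our assumptions rather than something stated in the slides.

```python
# Figures quoted for the Radeon RX 480.
CLOCK_MHZ = 1266.0
TMUS = 144
ROPS = 32
STREAM_PROCESSORS = 2304

# Assumption: one fused multiply-add per clock, counted as two operations.
FLOPS_PER_SP_PER_CLOCK = 2

texture_fill_gtexels = TMUS * CLOCK_MHZ / 1000
pixel_fill_gpixels = ROPS * CLOCK_MHZ / 1000
compute_tflops = STREAM_PROCESSORS * FLOPS_PER_SP_PER_CLOCK * CLOCK_MHZ / 1_000_000

print(f"Texture fill rate: {texture_fill_gtexels:.1f} GTexel/s")
print(f"Pixel fill rate:   {pixel_fill_gpixels:.1f} GPixel/s")
print(f"Peak compute:      {compute_tflops:.1f} TFLOPs")
```

Under these assumptions the script gives 182.3 GTexel/s, 40.5 GPixel/s and 5.8 TFLOPs, in line with the quoted specifications.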

AMD Radeon RX 480 Graphics Card:

The RX 480 graphics card will be available in two variants: a 4 GB model priced at $199 US and an 8 GB model priced at $239 US. The GDDR5 memory on both variants will run at 8 Gbps (2.0 GHz clock), delivering a cumulative bandwidth of 256 GB/s across a 256-bit interface. Based on 14nm FinFET technology, the graphics card features a 150W TDP and is powered by a single 6-pin connector. Display outputs for the RX 480 include three DisplayPort 1.3/1.4-HDR ports and a single HDMI 2.0b port. This allows the graphics card to drive anything from 1080p at 240Hz (HDR) to 4K at 120Hz SDR / 96Hz HDR, or even 5K at 60Hz (SDR).
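The 256 GB/s figure comes from multiplying the per-pin data rate by the bus width and converting bits to bytes. Here is a minimal Python sketch, with a helper name of our own choosing, that also checks the other Polaris memory configurations listed in the lineup table:

```python
def memory_bandwidth_gb_per_s(data_rate_gbps_per_pin: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: per-pin rate (Gbps) x bus width (bits) / 8 bits per byte."""
    return data_rate_gbps_per_pin * bus_width_bits / 8


# (effective data rate in Gbps per pin, bus width in bits)
configurations = {
    "RX 480": (8.0, 256),
    "RX 470": (6.6, 256),
    "RX 460": (7.0, 128),
}

for card, (rate, width) in configurations.items():
    bandwidth = memory_bandwidth_gb_per_s(rate, width)
    print(f"{card}: {bandwidth:.0f} GB/s")
```

This is theoretical peak bandwidth. Real-world throughput will be lower and depends on the workload.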

AMD Polaris GCN 4.0 GPU Lineup:

| Graphics Card Name | AMD Radeon RX 480 | AMD Radeon RX 470 | AMD Radeon RX 470D | AMD Radeon RX 460 1024 SPs | AMD Radeon RX 460 |
| --- | --- | --- | --- | --- | --- |
| Graphics Core | Polaris 10 | Polaris 10 | Polaris 10 | Polaris 11 | Polaris 11 |
| Process Node | 14nm FinFET | 14nm FinFET | 14nm FinFET | 14nm FinFET | 14nm FinFET |
| Die Size | 232 mm² | 232 mm² | 232 mm² | 123 mm² | 123 mm² |
| Transistors | 5.7 Billion | 5.7 Billion | 5.7 Billion | 3.0 Billion | 3.0 Billion |
| Stream Processors | 2304 SPs | 2048 SPs | 1792 SPs | 1024 SPs | 896 SPs |
| Clock Frequency | 1266 MHz | 1206 MHz | 1206 MHz | 1200 MHz | 1200 MHz |
| Compute Performance | 5.8 TFLOPs | 4.9 TFLOPs | 4.3 TFLOPs | 2.56 TFLOPs | 2.2 TFLOPs |
| Bus Interface | 256-bit | 256-bit | 256-bit | 128-bit | 128-bit |
| Memory Speed | 8 GHz | 6.6 GHz | 6.6 GHz | 7 GHz | 7 GHz |
| Memory Bandwidth | 256 GB/s | 211 GB/s | 211 GB/s | 112 GB/s | 112 GB/s |
| Launch Date | 29th June | 4th August | 20th October | TBD | 8th August |
| Launch Price | $199 US (4 GB), $239 US (8 GB) | $179 US (4 GB) | $149 US (4 GB) | TBD | $99 US (2 GB), $119 US (4 GB) |
| New Price | $199 US (4 GB), $239 US (8 GB) | $169 US (4 GB) | $149 US (4 GB) | TBD | $99 US (4 GB), $89 US (2 GB) |

AMD’s Polaris 10 “RX 480” Adaptive Clocking and Power Conservation Techniques

AMD’s Polaris 10 GPU deploys several power-saving techniques to improve efficiency. One of them is adaptive clocking, which is similar to the voltage control AMD uses on its Bristol Ridge APUs. The GPU has defined voltage states, and adaptive clocking recovers wasted power for a 25% power reduction.

AMD is also featuring the next chapter of its TrueAudio tech, codenamed TrueAudio Next, in Polaris GPUs. It will deliver real-time GPU audio physics processing using ray tracing and convolution. NVIDIA already offers VRWorks Audio, which uses NVIDIA’s PhysX capabilities to process real-time VR audio, and AMD’s tech is very similar to that implementation. TrueAudio Next will use Asynchronous Compute to execute the graphics and compute tasks concurrently.

AMD Radeon WattMan Utility Gets A Detailed Look – Sleek New Design For The Radeon Community

The Radeon WattMan software was detailed yesterday and is a very sleek tool for overclocking the card without the need to install additional software. The utility will be part of the Radeon drivers, allowing you to adjust a range of settings such as GPU voltage/core configuration, memory voltage/clock configuration, and fan/temperature targets. The Radeon RX 480 will be the first Polaris 10 graphics card, and it launches in less than 24 hours.