AMD Radeon RX 6900 XT Graphics Card Review – The Cherry On Top
AMD Radeon RX 6900 XT, 12/08/2020
AMD Big Navi GPU - RDNA 2 Compute Units, Geometry/GCP Processors, Infinity Cache Deep Dive
To understand AMD's RDNA 2 architecture, we have to take a deep dive into the design itself. For starters, the AMD Big Navi GPU is known internally as Navi 21 and comes in three SKUs that power the RX 6900 and RX 6800 series: the Navi 21 XTX, the Navi 21 XT, and the Navi 21 XL.
The Navi 21 GPU is based on the 7nm process node from TSMC, measuring 519.8 mm² with a total transistor count of 26.8 billion. That's a transistor density of about 51.56 million transistors per mm². Within the die are several blocks, with the primary block being the Compute Unit. The Compute Units are part of the main Shader Engine, and there are four Shader Engines in total on the Navi 21 GPU. Each Shader Engine houses 10 dual compute units, for a total of 20 compute units per Shader Engine.
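The density figure follows directly from the transistor count and die area quoted above. As a quick sanity check, here is a minimal sketch; the function and constant names are purely illustrative and not anything AMD publishes:

```python
# Figures quoted in the text for the Navi 21 die.
TRANSISTORS = 26.8e9    # total transistor count
DIE_AREA_MM2 = 519.8    # die area in square millimetres


def density_millions_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


density = density_millions_per_mm2(TRANSISTORS, DIE_AREA_MM2)
print(f"{density:.2f} million transistors per mm^2")
```

The result lands within rounding of the density quoted in the paragraph above.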
AMD describes the new Compute Unit as an enhanced version of the RDNA 1 design, featuring 30% higher throughput at the same power. Each compute unit packs a total of 64 stream processors, 16 texture mapping units, four texture filter units, and a single Ray Accelerator unit that handles the raytracing capabilities of the GPU. Each CU also consumes up to 50% less power at the same frequency.
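The per-CU figures can be multiplied out to the full-chip totals quoted later in this section (80 Compute Units, 5120 cores, 320 texture units, 80 ray accelerators). The sketch below is only a back-of-the-envelope check with illustrative names; it assumes that the "320 texture units" figure counts the four texture filter units per CU, which the text does not state explicitly:

```python
# Hierarchy described in the text for a full Navi 21 die.
SHADER_ENGINES = 4
DUAL_CUS_PER_ENGINE = 10
CUS_PER_DUAL_CU = 2

# Per-CU resources quoted in the text.
STREAM_PROCESSORS_PER_CU = 64
TEXTURE_FILTER_UNITS_PER_CU = 4   # assumption: these are the "texture units" counted chip-wide
RAY_ACCELERATORS_PER_CU = 1


def navi21_totals() -> dict:
    """Multiply per-CU resources out to full-chip totals."""
    cus = SHADER_ENGINES * DUAL_CUS_PER_ENGINE * CUS_PER_DUAL_CU
    return {
        "compute_units": cus,
        "stream_processors": cus * STREAM_PROCESSORS_PER_CU,
        "texture_units": cus * TEXTURE_FILTER_UNITS_PER_CU,
        "ray_accelerators": cus * RAY_ACCELERATORS_PER_CU,
    }


totals = navi21_totals()
for name, value in totals.items():
    print(f"{name}: {value}")
```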
Each CU also comes with its own 16 KB L0 vector cache (32 KB for the dual CU design), along with a 32 KB instruction cache and a 16 KB K-Cache. The L0 cache communicates with the L1 cache through a 32B channel and back with a 128B channel. In addition to the L0 cache, there's also a 128 KB L1 cache, a 4 MB L2 cache, and the new 128 MB Infinity Cache. Each Shader Engine has two L1 caches for a total of 256 KB of L1 per Shader Engine and 1 MB of L1 cache in total across the four Shader Engines.
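The L0 and L1 totals above are simple multiplications of the per-block sizes. A minimal sketch of that arithmetic, using illustrative constant names and the four Shader Engines mentioned earlier in the article:

```python
# Cache sizes quoted in the text, in KB.
L0_VECTOR_KB_PER_CU = 16
CUS_PER_DUAL_CU = 2
L1_KB_PER_CACHE = 128
L1_CACHES_PER_SHADER_ENGINE = 2
SHADER_ENGINES = 4  # from the die description earlier in the article

# L0 vector cache visible to one dual CU.
l0_kb_per_dual_cu = L0_VECTOR_KB_PER_CU * CUS_PER_DUAL_CU

# L1 cache per Shader Engine and across the whole GPU.
l1_kb_per_engine = L1_KB_PER_CACHE * L1_CACHES_PER_SHADER_ENGINE
l1_kb_total = l1_kb_per_engine * SHADER_ENGINES

print(f"L0 per dual CU: {l0_kb_per_dual_cu} KB")
print(f"L1 per Shader Engine: {l1_kb_per_engine} KB")
print(f"L1 total: {l1_kb_total} KB ({l1_kb_total // 1024} MB)")
```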
The L0 cache is localized to each CU, while the L1 is a private cache available to each Shader Engine with exclusive L2 access. The L2 cache shares data between the Shader Engines and the command processor. Finally, there are four 64-bit memory controllers that offer up to 448 GB/s of bandwidth at 14 Gbps memory speeds. However, AMD acknowledges that this solution alone would have left the RDNA 2 GPUs bandwidth starved, and that's where Infinity Cache comes in. (You can read more about Infinity Cache below.)
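The 448 GB/s figure can be reproduced from the bus width and the per-pin data rate: total bus width in bits, times the data rate in Gbps, divided by 8 bits per byte. A minimal sketch with a hypothetical helper name:

```python
def gddr6_bandwidth_gb_s(controllers: int, bits_per_controller: int,
                         gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s for a given bus configuration."""
    bus_width_bits = controllers * bits_per_controller
    # Each pin moves gbps_per_pin gigabits per second; divide by 8 for bytes.
    return bus_width_bits * gbps_per_pin / 8


# Configuration quoted in the text: four 64-bit controllers at 14 Gbps.
bandwidth = gddr6_bandwidth_gb_s(controllers=4, bits_per_controller=64,
                                 gbps_per_pin=14)
print(f"{bandwidth:.0f} GB/s")
```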
According to AMD, the RDNA 2 compute unit offers increased frequency at lower power, mixed-precision operations for tensor math, sampler feedback streaming, texture space shading, and, most importantly, ray accelerators capable of 4 box or 1 triangle intersections per cycle. Compute models for RDNA 2 CUs are listed in the table below:
Coming back to the Big Navi 'Navi 21' GPU itself, the GPU also features a single Geometry processor with 8 Pre-Cull Prims/Cycle and 4 Post-Cull Prims/Cycle. There's also a new GCP which houses a new graphics engine and a total of 4 Asynchronous Compute Units. The redesigned Render Back-End is now referred to as 'RB+' and natively doubles the 32bpp color rate by processing eight 32-bit pixels per cycle.
Overall, the Big Navi GPU offers a total of 80 Compute Units for 5120 cores, 320 texture units, and 80 ray accelerator units. This full configuration is used by the Radeon RX 6900 XT graphics card. AMD says that all of these changes contribute to performance-per-watt gains of up to 54% for Big Navi, which are broken down as follows:
16% Through Design Frequency Increase:
- Leverage CPU high-frequency expertise
- High-speed performance libraries
- Streamlined micro-architecture and design
- Aggressive re-pipelined logic for speed
17% Through CAC and Power Optimizations:
- Pervasive fine-grain clock gating
- Clock tree splitting and gating
- Redesigned for minimal data movement
- Aggressive pipeline rebalancing
21% Through Performance per Clock Enhancement:
- Infinity Cache amplified low-latency, low-power bandwidth
- TLB streamlined for latency reductions
- Redesigned 32b pixel pipe with a new HDR format
- Optimized geometry distribution and tessellation
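The three contributions in the breakdown above add up to the headline 54% figure. The sketch below simply sums them, treating the parts as additive the way the breakdown presents them; the dictionary keys are paraphrased labels, not AMD terminology:

```python
# Performance-per-watt contributions quoted in the breakdown, in percent.
PERF_PER_WATT_GAINS = {
    "design frequency increase": 16,
    "CAC and power optimizations": 17,
    "performance per clock enhancement": 21,
}

# Treat the parts as additive, as the breakdown itself does.
total_gain = sum(PERF_PER_WATT_GAINS.values())
print(f"Total performance-per-watt gain: {total_gain}%")
```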
AMD Infinity Cache, Bringing SRAM To GPUs!
As explained above, Infinity Cache is not only a new feature but an essential one for making the Big Navi GPUs work as intended. Without it, the RDNA 2 GPUs would be severely bandwidth limited due to their reduced standard bandwidth design. AMD found the solution by looking at its own Ryzen and EPYC lines of processors, which utilize density-optimized cache subsystems.
The Infinity Cache is a 16x64b channel subsystem with a peak rated speed of 1.94 GHz, delivering up to 4x the peak bandwidth of the standard 256-bit GDDR6 solution that is integrated on Big Navi GPUs. While power scales linearly with GDDR6 bus widths, an Infinity Cache solution can provide up to 2.4x higher bandwidth per watt. This allows better scaling with the higher frequencies that the RDNA 2 architecture aims to provide. A larger Infinity Cache also results in lower latency. Compared to the RX 5700 with GDDR6 memory, the RX 6800 delivers 34% lower latency on average.
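The channel count, channel width, and clock quoted above can be multiplied into a peak bandwidth figure. The sketch below assumes that "64b" means 64 bytes per channel per clock, which the text does not spell out; under that assumption the result lands just under 2 TB/s. The helper name is illustrative:

```python
def infinity_cache_peak_gb_s(channels: int, bytes_per_channel_per_clock: int,
                             clock_ghz: float) -> float:
    """Peak cache bandwidth in GB/s: channels x bytes per clock x clock rate."""
    return channels * bytes_per_channel_per_clock * clock_ghz


# Figures quoted in the text; reading "64b" as 64 bytes is an assumption.
peak = infinity_cache_peak_gb_s(channels=16, bytes_per_channel_per_clock=64,
                                clock_ghz=1.94)
print(f"{peak:.2f} GB/s (about {peak / 1000:.2f} TB/s)")
```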
That's not all: the Infinity Cache can be configured for even higher bandwidth, with its boosted configuration lifting total bandwidth by 550 GB/s and delivering almost 2 TB/s of effective bandwidth to the GPU, in addition to the GDDR6 memory that's already on board the graphics card. AMD allows Infinity Cache to scale with power management and tuning options via its Radeon Software.