MSI GeForce RTX 4090 SUPRIM Liquid X / MSI GeForce RTX 4090 SUPRIM X
12th October, 2022
Price: $1749.99 US / $1699.99 US
NVIDIA Ada GPU - Ada Streaming Multiprocessor, Ada GPC & Ada GPUs Deep Dive
Let's take a trip down the road to Ada. In 2016, NVIDIA announced its Pascal GPUs, which would soon be featured in its top-to-bottom GeForce 10 series lineup. After the launch of Maxwell, NVIDIA had gained a lot of experience in the efficiency department, which had been a focus since its Kepler GPUs.
Four years ago, NVIDIA, rather than offering another standard leap in the rasterization performance of its GPUs, took a different approach and introduced two key technologies in its Turing line of consumer GPUs: AI-assisted acceleration with the Tensor Cores, and hardware-level acceleration for ray tracing with its brand new RT cores.
Then came Ampere on a brand new Samsung 8nm fabrication process, with which NVIDIA added even more to its gaming graphics lineup. In the Ampere GPU architecture, NVIDIA provided its latest Ampere SM along with next-gen FP32, INT32, Tensor Cores, and RT cores. The focus was to boost both rasterization and ray tracing capabilities to new heights.
Now enter Ada, a brand new architecture that aims to take everything from the first two RTX generations and perfect it. The graphics architecture is designed for speed, and speed is where it excels. So let's see the architecture in detail. Following are a few main highlights of the Ada Lovelace GPU architecture:
- Revolutionary New Architecture: NVIDIA Ada architecture GPUs deliver outstanding performance for graphics, AI, and compute workloads with exceptional architectural and power efficiency. After the baseline design for the Ada SM was established, the chip was scaled up to shatter records. Manufacturing innovations and materials research enabled NVIDIA engineers to craft a GPU with 76.3 billion transistors and 18,432 CUDA Cores capable of running at clocks over 2.5 GHz while maintaining the same 450W TGP as the prior generation flagship GeForce RTX 3090 Ti GPU. The result is the world’s fastest GPU with the power, acoustics, and temperature characteristics expected of a high-end graphics card.
- New Ada RT Core for Faster Ray Tracing: For decades, rendering ray-traced scenes with physically correct lighting in real-time has been considered the holy grail of graphics. At the same time, the geometric complexity of environments and objects continues to increase as 3D games and graphics continually strive to provide the most accurate representations of the real world. The Ada RT Core has been enhanced to deliver 2x faster ray-triangle intersection testing and includes two important new hardware units. An Opacity Micromap Engine speeds up ray tracing of alpha-tested geometry by a factor of 2x, and a Displaced Micro-Mesh Engine generates Displaced Micro-Triangles on-the-fly to create additional geometry. The Micro-Mesh Engine provides the benefit of increased geometric complexity without the traditional performance and storage costs of complex geometries.
- Shader Execution Reordering: NVIDIA Ada GPUs support Shader Execution Reordering, which dynamically organizes & reorders shading workloads to improve RT shading efficiency. This improves performance by up to 44% in Cyberpunk 2077 with Ray Tracing Overdrive Mode.
- NVIDIA DLSS 3: The Ada architecture features an all-new Optical Flow Accelerator and AI frame generation that boosts DLSS 3’s frame rates up to 2x over the previous DLSS 2.0 while maintaining or exceeding native image quality. Compared to traditional brute-force graphics rendering, DLSS 3 is ultimately up to 4x faster while providing low system latency.
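To see why reordering shading work can help, here is a toy sketch in Python. It is not NVIDIA's implementation: the 1024-ray workload, the 8 hypothetical material shaders, and the `shader_passes` cost model (a warp runs once per distinct shader it contains) are all assumptions made for illustration.

```python
import random

WARP_SIZE = 32  # threads that execute in lockstep (assumed cost model)

def shader_passes(hit_shaders):
    """Count warp-level passes: each warp runs once per distinct shader it holds."""
    passes = 0
    for start in range(0, len(hit_shaders), WARP_SIZE):
        warp = hit_shaders[start:start + WARP_SIZE]
        passes += len(set(warp))
    return passes

random.seed(0)
# Hypothetical workload: 1024 rays, each hitting one of 8 material shaders.
hits = [random.randrange(8) for _ in range(1024)]

unordered = shader_passes(hits)
# Grouping rays that need the same shader stands in for the reordering step.
reordered = shader_passes(sorted(hits))

print("passes without reordering:", unordered)
print("passes with reordering:   ", reordered)
```

In this model, the sorted workload needs far fewer passes because most warps end up with a single shader, which is the intuition behind the efficiency gain described above.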
The NVIDIA Ada Lovelace AD102 GPU features up to 12 GPCs (Graphics Processing Clusters). That is 5 more GPCs than the Ampere GA102 GPU. Each GPC will consist of 6 TPCs with 2 SMs each, which is the same configuration as the existing chip. Each SM (Streaming Multiprocessor) will house four sub-cores, which is also the same as the GA102 GPU. What's changed is the FP32 & INT32 core configuration. Each SM will include 64 FP32 units, but the combined FP32+INT32 unit count goes up to 128. This is because half of the units don't share the same datapath as the INT32 units: the 64 FP32 cores are separate from the 64 INT32 cores.
So in total, each sub-core will consist of 16 FP32 plus 16 INT32 units for a total of 32 units. Each SM will have a total of 64 FP32 units plus 64 INT32 units for a total of 128 units. And since there are a total of 144 SM units (12 per GPC), we are looking at a total of 18,432 cores. Each SM will also include two Warp Schedulers (32 threads/clk) for 64 warps per SM & their own L0 i-cache. This is a 33% increase in warps/threads vs the GA102 GPU. The register file size is 16,384 x 32-bit per sub-core. Each SM also carries its own 128 KB of L1 data cache and shared memory, so that's 18 MB of L1 cache across the full die.
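The totals above follow from simple multiplication. A minimal sketch of the arithmetic, using only the per-unit figures quoted in the text (the variable names are ours):

```python
# Full AD102 configuration as described in the text.
GPCS = 12
TPCS_PER_GPC = 6
SMS_PER_TPC = 2
SUB_CORES_PER_SM = 4
FP32_PER_SUB_CORE = 16
INT32_PER_SUB_CORE = 16
L1_KB_PER_SM = 128

sms = GPCS * TPCS_PER_GPC * SMS_PER_TPC                       # SMs on the full die
units_per_sm = SUB_CORES_PER_SM * (FP32_PER_SUB_CORE + INT32_PER_SUB_CORE)
total_cores = sms * units_per_sm                              # cores on the full die
l1_total_mb = sms * L1_KB_PER_SM / 1024                       # combined L1 in MB

print(f"SMs: {sms}, units per SM: {units_per_sm}")
print(f"Total cores: {total_cores}, total L1: {l1_total_mb} MB")
```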
Moving over to the cache, this is another segment where NVIDIA has given a big boost over the existing Ampere GPUs. The L2 cache will be increased to 96 MB as mentioned in the leaks. This is a 16x increase over the Ampere GPU that hosts just 6 MB of L2 cache. The cache will be shared across the GPU. The GPU will also feature up to 192 ROPs for the full-die.
There are also going to be the latest 4th Generation Tensor and 3rd Generation RT (Raytracing) cores infused on the Ada Lovelace GPUs which will help boost DLSS & Raytracing performance to the next level. Overall, the Ada Lovelace AD102 GPU will offer:
- 71% More GPCs (Versus Ampere)
- 71% More Cores (Versus Ampere)
- 71% More L1 Cache (Versus Ampere)
- 16x More L2 Cache (Versus Ampere)
- 71% More ROPs (Versus Ampere)
- 4th Gen Tensor & 3rd Gen RT Cores
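The uplift figures in the list above can be checked with a short calculation. The AD102 numbers come from the text; the full GA102 core count (10,752) and ROP count (112) are not stated in this article and are filled in here as assumptions from NVIDIA's published Ampere specifications.

```python
# Full-die figures. AD102 values are from the article; GA102 core and ROP
# counts are assumptions taken from NVIDIA's public Ampere specifications.
AD102 = {"GPCs": 12, "CUDA cores": 18432, "L2 cache (MB)": 96, "ROPs": 192}
GA102 = {"GPCs": 7, "CUDA cores": 10752, "L2 cache (MB)": 6, "ROPs": 112}

def uplift_percent(new, old):
    """Percentage increase of `new` over `old`."""
    return (new / old - 1) * 100

for name in AD102:
    ratio = AD102[name] / GA102[name]
    pct = uplift_percent(AD102[name], GA102[name])
    print(f"{name}: {ratio:.2f}x ({pct:.0f}% more)")
```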
The full die has not been featured on any GPU so far, not even the L40 which has 2 SMs disabled. It is likely that as yields progress, we will eventually see a gaming and workstation product using the full-fat AD102. Till then, the RTX 4090 is the top gaming graphics card while the RTX 6000 Ada is the top workstation solution.
NVIDIA AD102 'Ada Lovelace' Gaming GPU Block Diagram:
NVIDIA AD102 'Ada Lovelace' Gaming GPU 'SM' Block Diagram:
NVIDIA GeForce RTX 4090
- 82.6 TFLOPS of peak single-precision (FP32) performance
- 165.2 TFLOPS of peak half-precision (FP16) performance
- 660.6 Tensor TFLOPS
- 1321.2 Tensor TFLOPS with sparsity
- 191 RT-TFLOPS
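As a rough sanity check, the peak FP32 figure can be reproduced from the usual formula of 2 operations (one fused multiply-add) per core per clock. The RTX 4090's 16,384 enabled CUDA cores and 2520 MHz boost clock are not given in this section; they are assumptions taken from NVIDIA's public specifications. The multipliers for the other figures are simply the ratios observed between the listed numbers, not something the article states.

```python
# Assumed RTX 4090 specs (not stated in this section of the article).
CUDA_CORES = 16384
BOOST_CLOCK_MHZ = 2520

def peak_tflops(cores, clock_mhz, ops_per_clock=2):
    """Peak throughput in TFLOPS: cores x ops per clock x clock rate."""
    return cores * ops_per_clock * clock_mhz * 1e6 / 1e12

fp32 = peak_tflops(CUDA_CORES, BOOST_CLOCK_MHZ)

# Ratios observed between the listed figures (illustrative only).
fp16 = 2 * fp32
tensor = 8 * fp32
tensor_sparse = 2 * tensor

print(f"FP32: {fp32:.1f} TFLOPS")
print(f"FP16: {fp16:.1f} TFLOPS")
print(f"Tensor: {tensor:.1f} TFLOPS, with sparsity: {tensor_sparse:.1f} TFLOPS")
```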
At the heart of the NVIDIA GeForce RTX 4090 graphics card lies the Ada Lovelace AD102 GPU. The GPU measures 608.4 mm2 and will utilize the TSMC 4N process node, which is an optimized version of TSMC's 5nm (N5) node designed for the green team. The GPU features an insane 76.3 Billion transistors.
NVIDIA Ampere "GeForce RTX 30" GPUs Full Breakdown:
|Graphics Card||NVIDIA GeForce RTX 2070 SUPER||NVIDIA GeForce RTX 3070||NVIDIA GeForce RTX 2080||NVIDIA GeForce RTX 3080||NVIDIA Titan RTX||NVIDIA GeForce RTX 3090|
|GPU Architecture||NVIDIA Turing||NVIDIA Ampere||NVIDIA Turing||NVIDIA Ampere||NVIDIA Turing||NVIDIA Ampere|
|GPCs||5 or 6||6||6||6||6||7|
|CUDA Cores / SM||64||128||64||128||64||128|
|CUDA Cores / GPU||2560||5888||2944||8704||4608||10496|
|Tensor Cores / SM||8 (2nd Gen)||4 (3rd Gen)||8 (2nd Gen)||4 (3rd Gen)||8 (2nd Gen)||4 (3rd Gen)|
|Tensor Cores / GPU||320 (2nd Gen)||184 (3rd Gen)||368 (2nd Gen)||272 (3rd Gen)||576 (2nd Gen)||328 (3rd Gen)|
|RT Cores||40 (1st Gen)||46 (2nd Gen)||46 (1st Gen)||68 (2nd Gen)||72 (1st Gen)||82 (2nd Gen)|
|GPU Boost Clock (MHz)||1770||1725||1800||1710||1770||1695|
|Peak FP32 TFLOPS (non-Tensor)||9.1||20.3||10.6||29.8||16.3||35.6|
|Peak FP16 TFLOPS (non-Tensor)||18.1||20.3||21.2||29.8||32.6||35.6|
|Peak BF16 TFLOPS (non-Tensor)||NA||20.3||NA||29.8||NA||35.6|
|Peak INT32 TOPS (non-Tensor)||9.1||10.2||10.6||14.9||16.3||17.8|
|Peak FP16 Tensor TFLOPS with FP16 Accumulate|
|Peak FP16 Tensor TFLOPS with FP32 Accumulate|
|Peak BF16 Tensor TFLOPS with FP32 Accumulate|
|Peak TF32 Tensor TFLOPS||NA||20.3/40.6||NA||29.8/59.5||NA||35.6/71|
|Peak INT8 Tensor TOPS||145||162.6/325.2||169.6||238/476||261||284/568|
|Peak INT4 Tensor TOPS||290||325.2/650.4||339.1||476/952||522||568/1136|
|Frame Buffer Memory Size and Type||8 GB GDDR6||8 GB GDDR6||8 GB GDDR6||10 GB GDDR6X||24 GB GDDR6||24 GB GDDR6X|
|Memory Clock (Data Rate)||14 Gbps||14 Gbps||14 Gbps||19 Gbps||14 Gbps||19.5 Gbps|
|Memory Bandwidth||448 GB/sec||448 GB/sec||448 GB/sec||760 GB/sec||672 GB/sec||936 GB/sec|
|Pixel Fill-rate (Gigapixels/sec)||113.3||165.6||115.2||164.2||169.9||193|
|Texel Fill-rate (Gigatexels/sec)||283.2||317.4||331.2||465||509.8||566|
|L1 Data Cache/Shared Memory||3840 KB||5888 KB||4416 KB||8704 KB||6912 KB||10496 KB|
|L2 Cache Size||4096 KB||4096 KB||4096 KB||5120 KB||6144 KB||6144 KB|
|Register File Size||10240 KB||11776 KB||11776 KB||17408 KB||18432 KB||20992 KB|
|TGP (Total Graphics Power)||215W||220W||225W||320W||280W||350W|
|Transistor Count||13.6 Billion||17.4 Billion||13.6 Billion||28.3 Billion||18.6 Billion||28.3 Billion|
|Die Size||545 mm2||392.5 mm2||545 mm2||628.4 mm2||754 mm2||628.4 mm2|
|Manufacturing Process||TSMC 12 nm FFN||Samsung 8 nm 8N NVIDIA||TSMC 12 nm FFN||Samsung 8 nm 8N NVIDIA||TSMC 12 nm FFN||Samsung 8 nm 8N NVIDIA|
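Several rows in the table above can be derived from others. The sketch below recomputes peak FP32 TFLOPS from the core counts and boost clocks listed in the table, and memory bandwidth from the data rate. The memory bus widths are not in the table; they are assumptions taken from each card's public specifications.

```python
# (name, CUDA cores, boost MHz, data rate Gbps, bus width bits)
# Cores, clocks and data rates are from the table; bus widths are assumed.
CARDS = [
    ("RTX 2070 SUPER", 2560, 1770, 14.0, 256),
    ("RTX 3070", 5888, 1725, 14.0, 256),
    ("RTX 2080", 2944, 1800, 14.0, 256),
    ("RTX 3080", 8704, 1710, 19.0, 320),
    ("Titan RTX", 4608, 1770, 14.0, 384),
    ("RTX 3090", 10496, 1695, 19.5, 384),
]

def fp32_tflops(cores, boost_mhz):
    """Peak FP32: 2 operations (one FMA) per core per clock."""
    return 2 * cores * boost_mhz * 1e6 / 1e12

def bandwidth_gbs(data_rate_gbps, bus_width_bits):
    """Memory bandwidth in GB/s: per-pin data rate x bus width / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8

results = {
    name: (round(fp32_tflops(cores, mhz), 1), bandwidth_gbs(rate, width))
    for name, cores, mhz, rate, width in CARDS
}

for name, (tflops, bandwidth) in results.items():
    print(f"{name}: {tflops} TFLOPS FP32, {bandwidth:.0f} GB/s")
```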
NVIDIA Ada GPUs - AD102, AD103, AD104 For The First Wave of Gaming Cards
NVIDIA is first introducing three brand new Ada GPUs which include the AD102, AD103 & AD104. The AD102 GPU is going to be featured on the GeForce RTX 4090, the AD103 is going to be used by the GeForce RTX 4080 16 GB graphics cards and the AD104 GPU is going to be featured on the GeForce RTX 4080 12 GB graphics cards.
The Ada GPUs are based on the TSMC 4N process node which is a custom process designed exclusively for NVIDIA. It is essentially an optimized version of the N5 (5nm) process, offering drastic increases in transistors, cores, and frequency. The top AD102 GPU packs 70% more cores and also offers 76.3 Billion transistors while offering over 2x the performance per watt.
NVIDIA Ada AD102 GPU
The full AD102 GPU is made up of 12 graphics processing clusters with 12 SM units on each cluster. That makes up 144 SM units for a total of 18,432 cores, 144 RT cores, 576 Tensor Cores, 576 Texture Units, and a 384-bit bus interface in a 76.3 billion transistor package measuring 608.5 mm2.