During its press tech talk, NVIDIA discussed several technologies surrounding the upcoming GeForce RTX 40 graphics cards based on the Ada Lovelace GPUs. The highlights included the Ada Lovelace GPU itself, the latest DLSS 3 technology, and the coolers featured on the brand new Founders Edition models.
NVIDIA Further Details Ada Lovelace GPUs, DLSS 3, GeForce RTX 40 Graphics Cards & More
NVIDIA will be launching its first GeForce RTX 40 series graphics card, the RTX 4090, on the 12th of October, followed by the RTX 4080 series in November. There's a lot to talk about, so let's get started.
NVIDIA's AD102 'Ada Lovelace' GPU - The Next-Gen Powerhouse
At the heart of the NVIDIA GeForce RTX 4090 graphics card lies the Ada Lovelace AD102 GPU. The GPU measures 608.4 mm² and utilizes the TSMC 4N process node, which is an optimized version of TSMC's 5nm (N5) node designed for the green team. The GPU features an insane 76.3 billion transistors.
The NVIDIA Ada Lovelace AD102 GPU features up to 12 GPCs (Graphics Processing Clusters). That is 5 more GPCs than the Ampere GA102 GPU. Each GPC will consist of 6 TPCs with 2 SMs each, which is the same configuration as the existing chip. Each SM (Streaming Multiprocessor) will house four sub-cores, which is also the same as the GA102 GPU. What's changed is the FP32 & INT32 core configuration. Each SM will include 64 FP32 units, but the combined FP32+INT32 units will go up to 128. This is because half of the FP32 units don't share the same sub-core as the INT32 units. The 64 FP32 cores are separate from the 64 INT32 cores.
So in total, each sub-core will consist of 16 FP32 plus 16 INT32 units for a total of 32 units. Each SM will have a total of 64 FP32 units plus 64 INT32 units for a total of 128 units. And since there are a total of 144 SMs (12 per GPC), we are looking at a total of 18,432 cores. Each SM will also include two Warp Schedulers (32 threads/clk) for 64 warps per SM & their own L0 i-cache. This is a 33% increase in warps/threads vs the GA102 GPU. The register file size is 16,384 x 32-bit. Each SM also carries its own 128 KB of L1 data cache and shared memory, so that's 18 MB of L1 cache across the full die.
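The unit counts above follow from simple multiplication, which the short Python sketch below reproduces. It uses only the per-GPC, per-SM, and per-sub-core figures quoted in the text; the constant and function names are made up for illustration.

```python
# Figures quoted in the article for the full AD102 die.
GPCS = 12                 # Graphics Processing Clusters
TPCS_PER_GPC = 6
SMS_PER_TPC = 2
SUB_CORES_PER_SM = 4
FP32_PER_SUB_CORE = 16
INT32_PER_SUB_CORE = 16
L1_KB_PER_SM = 128


def ad102_totals():
    """Derive die-wide totals from the per-block figures above."""
    sms = GPCS * TPCS_PER_GPC * SMS_PER_TPC
    units_per_sm = SUB_CORES_PER_SM * (FP32_PER_SUB_CORE + INT32_PER_SUB_CORE)
    return {
        "sms": sms,
        "units_per_sm": units_per_sm,
        "cores": sms * units_per_sm,
        "l1_mb": sms * L1_KB_PER_SM / 1024,  # KB -> MB
    }


if __name__ == "__main__":
    for name, value in ad102_totals().items():
        print(f"{name}: {value}")
```

Running it gives 144 SMs, 128 units per SM, 18,432 cores, and 18 MB of L1 cache, matching the totals in the paragraph.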
Moving over to the cache, this is another segment where NVIDIA has given a big boost over the existing Ampere GPUs. The L2 cache will be increased to 96 MB as mentioned in the leaks. This is a 16x increase over the Ampere GPU that hosts just 6 MB of L2 cache. The cache will be shared across the GPU. The GPU will also feature up to 192 ROPs for the full-die.
There are also going to be the latest 4th Generation Tensor and 3rd Generation RT (Raytracing) cores infused on the Ada Lovelace GPUs which will help boost DLSS & Raytracing performance to the next level. Overall, the Ada Lovelace AD102 GPU will offer:
- 71% More GPCs (Versus Ampere)
- 71% More Cores (Versus Ampere)
- 50% More L1 Cache (Versus Ampere)
- 16x More L2 Cache (Versus Ampere)
- 71% More ROPs (Versus Ampere)
- 4th Gen Tensor & 3rd Gen RT Cores
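As a rough sanity check on the percentages in this list, the sketch below compares the full AD102 figures from the text against the full GA102 die. The GA102 values of 7 GPCs and 6 MB of L2 follow from the article; the 10,752 cores and 112 ROPs are assumptions taken from the commonly cited full GA102 configuration. L1 cache is left out because the article gives no Ampere total for it.

```python
def percent_more(new, old):
    """Return how much larger `new` is than `old`, in percent."""
    return (new / old - 1) * 100


# Full-die figures. Ada values come from the article; the Ampere core and
# ROP counts are assumptions (full GA102), not stated in the text.
ADA = {"GPCs": 12, "Cores": 18432, "ROPs": 192, "L2 (MB)": 96}
AMPERE = {"GPCs": 7, "Cores": 10752, "ROPs": 112, "L2 (MB)": 6}

if __name__ == "__main__":
    for key in ADA:
        gain = percent_more(ADA[key], AMPERE[key])
        ratio = ADA[key] / AMPERE[key]
        print(f"{key}: +{gain:.0f}% ({ratio:.2f}x)")
```

Under these assumptions the GPC, core, and ROP counts each come out about 71% higher, and the L2 cache is 16x larger.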
The full die has not been featured on any GPU so far, not even the L40 which has 2 SMs disabled. It is likely that as yields progress, we will eventually see a gaming and workstation product using the full-fat AD102. Till then, the RTX 4090 is the top gaming graphics card while the RTX 6000 Ada is the top workstation solution.
NVIDIA AD102 'Ada Lovelace' Gaming GPU Block Diagram:
NVIDIA AD102 'Ada Lovelace' Gaming GPU 'SM' Block Diagram:
NVIDIA Founders Edition Designed To Utilize Up To 600W of Power For Higher Overclocking
As for its brand new Founders Edition cards, the GeForce RTX 4090 24 GB and RTX 4080 16 GB, NVIDIA has produced a compact PCB similar to the ones we saw on the previous generation. Designing a PCB like this helps improve airflow and cooling performance.
NVIDIA says that they have further optimized the Dual Axial Flow Through system, increasing fan sizes and fin volume by 10%, offering 20% higher air-flow, and upgrading to a 23-phase power supply (20+3 Phase for RTX 4090). Memory temperatures are reduced, and the new, substantially more powerful Ada GPUs are kept cool in ventilated cases, giving gamers excellent overclocking headroom. NVIDIA went through a rigorous testing procedure and is said to have evaluated as many as 50 fan designs before finalizing the one we are getting on the new cards. The cooler is used to dissipate heat from the heatsink assembly that comprises a vapor chamber, a big jump from the previous design too.
The NVIDIA GeForce RTX 4080 also uses the same cooler as the RTX 4090 Founders Edition and since it has a lower TDP, it should deliver even better thermal performance.
Each GeForce RTX 40 Series Founders Edition graphics card reduces cable clutter by leveraging the new standard GPU power input of next-gen ATX 3.0 power supplies, the PCIe Gen-5 16-pin Connector. This enables you to power GeForce RTX 40 Series graphics cards with just a single cable, improving the aesthetics of your build. If you are using a previous-gen power supply, an adapter cable is included in the box, allowing you to plug in three 8-pin power connectors, with an optional fourth connector for more overclocking headroom. ATX 3.0 power supplies will be available in October from ASUS, Cooler Master, FSP, Gigabyte, iBuyPower, MSI, and Thermaltake, with more models to come.
One advantage that comes with the new 16-pin connector is that while the Founders Edition cards are designed at 450W & 320W, respectively, they can utilize the extra headroom provided through the new connector for extreme overclocking, with the RTX 4090 going for that full 600W mark. The new power delivery also gives the RTX 40 series a 10x improvement in response time for power transient management compared to the previous generation.
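To put the power figures in perspective, here is a small sketch of the arithmetic. The 450W design power and 600W ceiling for the RTX 4090 come from the article; the 150W rating per 8-pin PCIe connector is an assumption based on the usual PCIe limit, and power drawn from the slot is ignored. The function names are illustrative only.

```python
# Assumption: each legacy 8-pin PCIe connector is rated for 150 W.
PCIE_8PIN_WATTS = 150


def adapter_capacity(num_8pin):
    """Total power (W) the adapter can carry with the given number of 8-pin plugs."""
    return num_8pin * PCIE_8PIN_WATTS


def headroom_percent(design_watts, max_watts):
    """Extra power available above the design power, in percent."""
    return (max_watts / design_watts - 1) * 100


if __name__ == "__main__":
    print("3 x 8-pin:", adapter_capacity(3), "W")
    print("4 x 8-pin:", adapter_capacity(4), "W")
    print(f"RTX 4090 headroom: {headroom_percent(450, 600):.1f}%")
```

With three 8-pin plugs the adapter covers the 450W design power, and the optional fourth plug lines up with the 600W mark, which is roughly a third more power than stock.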
The new cards also feature DP 1.4a (4K 12-bit HDR @ 240Hz) and HDMI 2.1 (4K 120Hz HDR / 8K 60Hz HDR). All cards are compliant with the PCIe Gen 4 interface on existing motherboards and also feature full compliance with the Resizable-BAR technologies.
NVIDIA GeForce RTX 4090 Founders Edition PCB:
Next-Gen Micron GDDR6X Dies Run 10°C Cooler Thanks To New Process Node
NVIDIA has also leveraged Micron's latest GDDR6X memory chips for its GeForce RTX 40 graphics cards. These run 10°C cooler and are more power efficient, and since they are all 16Gb DRAM dies, they can be placed on one side of the PCB, where they are cooled better than dual-sided memory.
NVIDIA DLSS 3: Compatibility, Feature Set, Gaming Performance & More
Now, let's dive into the technological advancements that allow these incredible achievements. To begin with, NVIDIA engineers started with DLSS Super Resolution and added something called Optical Multi Frame Generation based on Ada's Optical Flow Accelerator. This accelerator analyzes two sequential frames from a particular game, capturing pixel details such as particles, reflections, lighting, and shadows.
On top of that, NVIDIA DLSS 3 also takes into account conventional game engine information such as motion vectors. The DLSS Frame Generation AI convolutional autoencoder network will then decide how to use each of the four inputs (current and prior frames, optical flow field, and motion vectors) to recreate intermediate frames in the best possible way.
NVIDIA DLSS 3 is said to reconstruct 3/4 of the first frame with DLSS Super Resolution and the full second frame with the help of the aforementioned DLSS Frame Generation. Overall, NVIDIA DLSS 3 reconstructs 7/8 of the two total frames displayed, which explains the massive performance uplift.
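The 7/8 figure is just the average of the two per-frame shares, as the sketch below shows with exact fractions. The example resolutions are an assumption: a 1080p internal render upscaled to 4K is one case where Super Resolution reconstructs 3/4 of the pixels, but the article does not say which mode NVIDIA had in mind.

```python
from fractions import Fraction


def reconstructed_share(per_frame_shares):
    """Average share of pixels produced by DLSS across the given frames."""
    return sum(per_frame_shares, Fraction(0)) / len(per_frame_shares)


def upscaled_share(render_res, output_res):
    """Share of output pixels that are not natively rendered."""
    rendered = render_res[0] * render_res[1]
    output = output_res[0] * output_res[1]
    return 1 - Fraction(rendered, output)


if __name__ == "__main__":
    # Assumed example: 1920x1080 internal render, 3840x2160 output.
    first = upscaled_share((1920, 1080), (3840, 2160))  # Super Resolution frame
    second = Fraction(1)                                # fully generated frame
    print(first)                                  # 3/4
    print(reconstructed_share([first, second]))   # 7/8
```

Put differently, only 1/8 of the displayed pixels across the two frames are rendered the traditional way.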
Additionally, the new version of the Deep Learning Super Sampling image reconstruction technique also includes the latency-lowering NVIDIA Reflex technology.
As for DLSS GPU support, full DLSS 3 with Frame Generation will be available across all RTX 40 series GPUs. On the older RTX 20 & RTX 30 series, the technology will be available as the DLSS Super Resolution suite (also on RTX 40). Lastly, NVIDIA Reflex will be supported by GeForce 900 series and above.
Cyberpunk 2077 has been shown running NVIDIA DLSS 3, the brand new Ray Tracing Overdrive, and NVIDIA Reflex with up to 4x improved performance and up to 2x reduced latency. That's not all, as NVIDIA is even promising benefits for CPU-bound games, which generally didn't run much faster with DLSS 2.0. For example, the notoriously CPU-heavy Microsoft Flight Simulator gets up to 2x improved performance with the new DLSS. Overall, NVIDIA said that over 35 games and apps have already pledged support for NVIDIA DLSS 3.
The NVIDIA GeForce RTX 4080 16 GB and RTX 4080 12 GB graphics cards will be launching in November and be priced at $1199 US and $899 US, respectively.
NVIDIA GeForce RTX 40 Series Official Specs:
| Graphics Card Name | NVIDIA GeForce RTX 4090 | NVIDIA GeForce RTX 4080 16G | NVIDIA GeForce RTX 4080 12G | NVIDIA GeForce RTX 3090 Ti |
|---|---|---|---|---|
| GPU Name | Ada Lovelace AD102-300 | Ada Lovelace AD103-300 | Ada Lovelace AD104-400 | Ampere GA102-225 |
| Process Node | TSMC 4N | TSMC 4N | TSMC 4N | Samsung 8nm |
| Transistors | 76 Billion | 45.9 Billion | 35.8 Billion | 28 Billion |
| TMUs / ROPs | 512 / 176 | 320 / 112 | 240 / 80 | 320 / 112 |
| Tensor / RT Cores | 512 / 128 | 304 / 76 | 240 / 60 | 320 / 80 |
| Base Clock | 2230 MHz | 2210 MHz | 2310 MHz | 1365 MHz |
| Boost Clock | 2520 MHz | 2510 MHz | 2610 MHz | 1665 MHz |
| FP32 Compute | 83 TFLOPs | 49 TFLOPs | 40 TFLOPs | 40 TFLOPs |
| RT TFLOPs | 191 TFLOPs | 113 TFLOPs | 82 TFLOPs | 78 TFLOPs |
| Tensor-TOPs | 1321 TOPs | 780 TOPs | 641 TOPs | 320 TOPs |
| Memory Capacity | 24 GB GDDR6X | 16 GB GDDR6X | 12 GB GDDR6X | 12 GB GDDR6X |
| Memory Speed | 21.0 Gbps | 23.0 Gbps | 21.0 Gbps | 19 Gbps |
| Bandwidth | 1008 GB/s | 736 GB/s | 504 GB/s | 912 GB/s |
| Price (MSRP / FE) | $1199 US | $1199 US | $899 US | $1199 US |
| Launch (Availability) | October 2022 | November 2022 | November 2022 | 3rd June 2021 |
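
The bandwidth and FP32 rows can be roughly reproduced from the other figures, as sketched below. Memory speeds and boost clocks are taken from the table; the memory bus widths and CUDA core counts are not listed in it and are assumptions based on the announced specs. The formulas are the usual ones: bandwidth is per-pin data rate times bus width divided by 8, and FP32 throughput is two operations (one fused multiply-add) per core per clock.

```python
def bandwidth_gb_s(speed_gbps, bus_width_bits):
    """Peak memory bandwidth in GB/s from per-pin speed and bus width."""
    return speed_gbps * bus_width_bits / 8


def fp32_tflops(cuda_cores, boost_mhz):
    """Peak FP32 throughput in TFLOPs, counting an FMA as two operations."""
    return 2 * cuda_cores * boost_mhz / 1e6


# name: (memory speed Gbps, bus width bits, CUDA cores, boost clock MHz)
# Bus widths and core counts are assumptions, not taken from the table.
CARDS = {
    "RTX 4090": (21.0, 384, 16384, 2520),
    "RTX 4080 16G": (23.0, 256, 9728, 2510),
    "RTX 4080 12G": (21.0, 192, 7680, 2610),
}

if __name__ == "__main__":
    for name, (speed, bus, cores, boost) in CARDS.items():
        print(
            f"{name}: {bandwidth_gb_s(speed, bus):.0f} GB/s, "
            f"{fp32_tflops(cores, boost):.1f} TFLOPs"
        )
```

With these inputs the results land on the table's 1008, 736, and 504 GB/s and round to its 83, 49, and 40 TFLOPs.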