Hardware Rumor

NVIDIA Ada Lovelace ‘GeForce RTX 40’ Gaming GPU Detailed: Double The ROPs, Huge L2 Cache & 50% More FP32 Units Than Ampere, 4th Gen Tensor & 3rd Gen RT Cores

Hassan Mujtaba • May 14, 2022 at 12:00am EDT

Details regarding the NVIDIA Ada Lovelace gaming GPU, which will power the GeForce RTX 40 series graphics cards, have been revealed. The new information comes from Kopite7kimi & covers the block diagram of the next-gen architecture.

NVIDIA GeForce Ada Lovelace GPU SM Block Diagram Detailed: Bigger & Better Than Ever For Gamers!

The NVIDIA Ada Lovelace GPU architecture is no mystery anymore. We have learned the specific configurations that will power the next-gen AD10* series SKUs for GeForce RTX 40 series graphics cards, and we have also seen leaked specifications of the lineup. Now, it's time to talk purely about the next-generation graphics chip itself.

NVIDIA AD102 'Ada Lovelace' Gaming GPU 'SM' Block Diagram (Image Credits: Kopite7kimi):

NVIDIA GA102 'Ampere' Gaming GPU 'SM' Block Diagram:

Starting with the GPU configuration, Kopite7kimi compares the top AD102 GPU to various other GPUs from the green team. These include the gaming-focused Ampere GA102 and Turing TU102, while the HPC-focused Hopper GH100 and Ampere GA100 are also added to the list. I'll only compare the AD102 to its gaming predecessors since the HPC-focused designs are vastly different from consumer-centric offerings.

The NVIDIA Ada Lovelace AD102 GPU will feature up to 12 GPCs (Graphics Processing Clusters). This is an increase of roughly 71% versus GA102, which features only 7 GPCs. Each GPC will consist of 6 TPCs and each TPC of 2 SMs, which is the same configuration as the existing chip. Each SM (Streaming Multiprocessor) will house four sub-cores, which is also the same as the GA102 GPU. What's changed is the FP32 & the INT32 core configuration. Each SM will include 128 FP32 units, but the combined FP32+INT32 count will go up to 192. This is because the FP32 units don't share the same datapath as the INT32 units. The 128 FP32 cores are separate from the 64 INT32 cores.

So in total, each SM will consist of 128 FP32 plus 64 INT32 units for a total of 192 units, which works out to 32 FP32 plus 16 INT32 units per sub-core. And since there are a total of 144 SMs (12 GPCs x 6 TPCs x 2 SMs), we are looking at 18,432 FP32 units and 9,216 INT32 units for a total of 27,648 units. The 18,432 FP32 figure matches the CUDA core count listed for AD102 in the table further down. Each SM will also include two warp schedulers (32 threads/clk) for 64 warps per SM. This is a 50% increase in cores (FP32+INT32) and a 33% increase in warps/threads per SM vs the GA102 GPU.
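The unit totals follow directly from multiplying down the hierarchy. The sketch below is only an illustration: it takes the rumored per-level counts from the tables in this article (12 GPCs, 6 TPCs per GPC, 2 SMs per TPC, 128 FP32 and 64 INT32 units per SM, 4 sub-cores per SM) as assumptions, and the constant and function names are made up for this example.

```python
# Rumored AD102 hierarchy, taken from the tables in this article (not confirmed specs).
GPCS_PER_GPU = 12
TPCS_PER_GPC = 6
SMS_PER_TPC = 2
SUBCORES_PER_SM = 4
FP32_PER_SM = 128
INT32_PER_SM = 64


def ad102_totals():
    """Multiply the per-level counts down the hierarchy to get chip-wide totals."""
    tpcs = GPCS_PER_GPU * TPCS_PER_GPC
    sms = tpcs * SMS_PER_TPC
    return {
        "tpcs": tpcs,
        "sms": sms,
        "fp32": sms * FP32_PER_SM,
        "int32": sms * INT32_PER_SM,
        "fp32_plus_int32": sms * (FP32_PER_SM + INT32_PER_SM),
        "fp32_per_subcore": FP32_PER_SM // SUBCORES_PER_SM,
        "int32_per_subcore": INT32_PER_SM // SUBCORES_PER_SM,
    }


if __name__ == "__main__":
    for name, value in ad102_totals().items():
        print(f"{name}: {value}")
```

With these inputs the SM count comes out to 144 and the FP32 count to 18,432, which lines up with the SM and CUDA core rows of the rumored spec table.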

NVIDIA Ada Lovelace 'AD103' GPU Specs 'Preliminary':

GPU Name	AD103	GA102	GA103	TU102
GPC	7 (Per GPU)	Same	1.16x	1.16x
TPC	6 (Per GPC)	Same	1.20x	Same
SM	2 (Per TPC)	Same	Same	Same
Sub-Core	4 (Per SM)	Same	Same	Same
FP32	128 (Per SM)	Same	Same	2x
FP32+INT32	192 (Per SM)	1.5x	1.5x	1.5x
Warps	64 (Per SM)	1.33x	1.33x	2x
Threads	2048 (Per SM)	1.33x	1.33x	2x
L1 Cache	192 KB (Per SM)	1.5x	1.5x	2x
L2 Cache	64 MB (Per GPU)	10.6x	16x	10.6x
ROPs	32 (Per GPC)	2x	2x	2x
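The comparison columns in the table above are multipliers relative to the AD103 value, so the implied figure for an older chip is simply the AD103 value divided by the multiplier. Here is a minimal sketch of that reading, assuming "Same" means a multiplier of 1.0; the dictionary layout and the helper name `implied_baseline` are hypothetical, and the results are rounded because the table's multipliers are themselves rounded.

```python
# AD103 values and GA102 multipliers copied from the table above (rumored figures).
AD103 = {
    "Warps per SM": 64,
    "Threads per SM": 2048,
    "L1 KB per SM": 192,
    "L2 MB per GPU": 64,
    "ROPs per GPC": 32,
}
GA102_MULTIPLIER = {
    "Warps per SM": 1.33,
    "Threads per SM": 1.33,
    "L1 KB per SM": 1.5,
    "L2 MB per GPU": 10.6,
    "ROPs per GPC": 2.0,
}


def implied_baseline(new_value, multiplier):
    """Value the older GPU must have if new_value is `multiplier` times larger."""
    return new_value / multiplier


if __name__ == "__main__":
    for row, value in AD103.items():
        baseline = implied_baseline(value, GA102_MULTIPLIER[row])
        print(f"{row}: AD103 {value} -> GA102 ~{round(baseline)}")
```

The same helper works for the GA103 and TU102 columns; for example, a 16x L2 multiplier against 64 MB implies a 4 MB cache on GA103.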

Moving over to the cache, this is another segment where NVIDIA has given a big boost over the existing Ampere GPUs. The Ada Lovelace GPUs will pack 192 KB of L1 cache per SM, an increase of 50% over Ampere. Across the 144 SMs of the top AD102 GPU, that's a total of 27 MB of L1 cache. The L2 cache will be increased to 96 MB as mentioned in the leaks. This is a 16x increase over the Ampere GA102 GPU that hosts just 6 MB of L2 cache. The L2 cache will be shared across the GPU.
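As a rough check on the cache figures, the chip-wide L1 total is the per-SM size times the SM count, and the L2 uplift is a plain ratio. The sketch below assumes the 144-SM count from the rumored AD102 spec table and the 84-SM, 128 KB-per-SM configuration of GA102; the function names are made up for this example.

```python
def l1_total_mb(sm_count, l1_kb_per_sm):
    """Aggregate L1 cache across all SMs, in MB (1 MB = 1024 KB)."""
    return sm_count * l1_kb_per_sm / 1024


def uplift(new_value, old_value):
    """How many times larger new_value is than old_value."""
    return new_value / old_value


if __name__ == "__main__":
    ad102_l1 = l1_total_mb(144, 192)  # rumored AD102: 144 SMs, 192 KB each
    ga102_l1 = l1_total_mb(84, 128)   # GA102: 84 SMs, 128 KB each
    print(f"AD102 L1 total: {ad102_l1} MB")
    print(f"GA102 L1 total: {ga102_l1} MB")
    print(f"L1 per-SM uplift: {uplift(192, 128)}x")
    print(f"L2 uplift: {uplift(96, 6)}x")
```

Note that the 50% figure applies per SM; because AD102 is also rumored to have more SMs, the chip-wide L1 total grows by more than that.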

Finally, we have the ROPs which are also increased to 32 per GPC, an increase of 2x over Ampere. You are looking at up to 384 ROPs on the next-gen flagship versus just 112 on the fastest Ampere GPU, the RTX 3090 Ti. There are also going to be the latest 4th Generation Tensor and 3rd Generation RT (Raytracing) cores infused on the Ada Lovelace GPUs which will help boost DLSS & Raytracing performance to the next level. Overall, the Ada Lovelace AD102 GPU will offer:

1.7x GPCs (12 Versus 7 On Ampere)
50% More Cores (Versus Ampere)
50% More L1 Cache (Versus Ampere)
16x More L2 Cache (Versus Ampere)
Double The ROPs (Versus Ampere)
4th Gen Tensor & 3rd Gen RT Cores
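The GPC and ROP items in the list above mix per-cluster and whole-chip comparisons, so it can help to compute both. This is a small sketch using only the counts quoted in this article (12 GPCs with 32 ROPs each for AD102, 7 GPCs with 16 ROPs each for GA102); the function name is hypothetical.

```python
def total_rops(gpc_count, rops_per_gpc):
    """Chip-wide ROP count from the per-GPC figure."""
    return gpc_count * rops_per_gpc


if __name__ == "__main__":
    ad102_rops = total_rops(12, 32)  # rumored AD102
    ga102_rops = total_rops(7, 16)   # GA102
    print(f"GPC ratio: {12 / 7:.2f}x")
    print(f"ROPs per GPC ratio: {32 / 16:.2f}x")
    print(f"Total ROPs: {ad102_rops} vs {ga102_rops} ({ad102_rops / ga102_rops:.2f}x)")
```

The doubling applies to ROPs per GPC; with the extra GPCs factored in, the whole-chip ROP count rises by roughly 3.4x.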

Do note that clock speeds, which are said to be in the 2-3 GHz range, aren't taken into the equation here, so they will also play a major role in improving the per-core performance versus Ampere. The NVIDIA GeForce RTX 40 series graphics cards featuring the next-gen Ada Lovelace gaming GPUs are expected to launch in the second half of 2022 & are said to utilize the same TSMC 4N process node as the Hopper H100 GPU.

NVIDIA CUDA GPU (RUMORED) Preliminary:

GPU	TU102	GA102	AD102
Flagship SKU	RTX 2080 Ti	RTX 3090 Ti	RTX 4090?
Architecture	Turing	Ampere	Ada Lovelace
Process	TSMC 12nm NFF	Samsung 8nm	TSMC 4N?
Die Size	754mm2	628mm2	~600mm2
Graphics Processing Clusters (GPC)	6	7	12
Texture Processing Clusters (TPC)	36	42	72
Streaming Multiprocessors (SM)	72	84	144
CUDA Cores	4608	10752	18432
L2 Cache	6 MB	6 MB	96 MB
Theoretical TFLOPs	16 TFLOPs	40 TFLOPs	~90 TFLOPs?
Memory Type	GDDR6	GDDR6X	GDDR6X
Memory Capacity	11 GB (2080 Ti)	24 GB (3090 Ti)	24 GB (4090?)
Memory Speed	14 Gbps	21 Gbps	24 Gbps?
Memory Bandwidth	616 GB/s	1008 GB/s	1152 GB/s?
Memory Bus	384-bit	384-bit	384-bit
PCIe Interface	PCIe Gen 3.0	PCIe Gen 4.0	PCIe Gen 4.0
TGP	250W	350W	600W?
Release	Sep. 2018	Sep. 2020	2H 2022 (TBC)
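Two rows in the table above can be derived from the others with standard formulas: memory bandwidth is the per-pin data rate times the bus width divided by 8 bits per byte, and peak FP32 throughput is 2 operations (one fused multiply-add) per core per clock. The sketch below uses the table's rumored figures; the clock speeds are assumptions drawn from the 2-3 GHz range mentioned earlier, and the function names are made up for this example.

```python
def memory_bandwidth_gbs(speed_gbps, bus_width_bits):
    """Peak memory bandwidth in GB/s: per-pin rate (Gbps) x bus width / 8."""
    return speed_gbps * bus_width_bits / 8


def fp32_tflops(cuda_cores, clock_ghz):
    """Peak FP32 TFLOPs, counting one FMA (2 FLOPs) per core per clock."""
    return 2 * cuda_cores * clock_ghz / 1000


def clock_for_tflops(cuda_cores, target_tflops):
    """Clock in GHz needed to reach target_tflops with the given core count."""
    return target_tflops * 1000 / (2 * cuda_cores)


if __name__ == "__main__":
    print(f"GA102 bandwidth: {memory_bandwidth_gbs(21, 384)} GB/s")
    print(f"AD102 bandwidth: {memory_bandwidth_gbs(24, 384)} GB/s")
    for clock in (2.0, 2.5, 3.0):  # assumed clocks within the rumored range
        print(f"AD102 at {clock} GHz: {fp32_tflops(18432, clock):.1f} TFLOPs")
    print(f"Clock for 90 TFLOPs: {clock_for_tflops(18432, 90):.2f} GHz")
```

Under these assumptions, the ~90 TFLOPs estimate corresponds to a clock of about 2.44 GHz on the full 18,432-core chip, which sits inside the rumored range.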

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
