
MSI GeForce RTX 4090 SUPRIM X & SUPRIM Liquid X Review – When Air & Water Collide!

Hassan Mujtaba • Oct 12, 2022 at 09:00am EDT

Product Info

MSI GeForce RTX 4090 SUPRIM Liquid X / MSI GeForce RTX 4090 SUPRIM X

12th October, 2022

Type

Graphics Cards

Price

$1749.99 US / $1699.99 US

NVIDIA Ada GPU - Ada Streaming Multiprocessor, Ada GPC & Ada GPUs Deep Dive

Let's trace the road to Ada. In 2016, NVIDIA announced its Pascal GPUs, which soon featured across the top-to-bottom GeForce 10 series lineup. By then, Maxwell had given NVIDIA a lot of experience in power efficiency, an area it had focused on since its Kepler GPUs.

Four years ago, rather than offering another standard leap in rasterization performance, NVIDIA took a different approach and introduced two key technologies in its Turing line of consumer GPUs: AI-assisted acceleration with the Tensor Cores, and hardware-level acceleration for ray tracing with its brand new RT cores.

Then came Ampere on its brand new Samsung 8nm fabrication process, with which NVIDIA added even more to its gaming graphics lineup. The Ampere GPU architecture brought a new SM along with next-gen FP32, INT32, Tensor Cores, and RT cores. The focus was to boost both rasterization and ray tracing capabilities to new heights.

Now enter Ada, a brand new architecture that aims to take everything from the first two RTX generations and perfect it. The graphics architecture is designed for speed, and that is where it excels. So let's look at the architecture in detail. The following are the main highlights of the Ada Lovelace GPU architecture:

Revolutionary New Architecture: NVIDIA Ada architecture GPUs deliver outstanding performance for graphics, AI, and compute workloads with exceptional architectural and power efficiency. After the baseline design for the Ada SM was established, the chip was scaled up to shatter records. Manufacturing innovations and materials research enabled NVIDIA engineers to craft a GPU with 76.3 billion transistors and 18,432 CUDA Cores capable of running at clocks over 2.5 GHz while maintaining the same 450W TGP as the prior generation flagship GeForce RTX 3090 Ti GPU. The result is the world’s fastest GPU with the power, acoustics, and temperature characteristics expected of a high-end graphics card.

New Ada RT Core for Faster Ray Tracing: For decades, rendering ray-traced scenes with physically correct lighting in real-time has been considered the holy grail of graphics. At the same time, the geometric complexity of environments and objects continues to increase as 3D games and graphics continually strive to provide the most accurate representations of the real world. The Ada RT Core has been enhanced to deliver 2x faster ray-triangle intersection testing and includes two important new hardware units. An Opacity Micromap Engine speeds up ray tracing of alpha-tested geometry by a factor of 2x, and a Displaced Micro-Mesh Engine generates Displaced Micro-Triangles on-the-fly to create additional geometry. The Micro-Mesh Engine provides the benefit of increased geometric complexity without the traditional performance and storage costs of complex geometries.

Shader Execution Reordering: NVIDIA Ada GPUs support Shader Execution Reordering, which dynamically organizes & reorders shading workloads to improve RT shading efficiency. This improves performance by up to 44% in Cyberpunk 2077 with Ray Tracing Overdrive Mode.

NVIDIA DLSS 3: The Ada architecture features an all-new Optical Flow Accelerator and AI frame generation that boosts DLSS 3’s frame rates up to 2x over the previous DLSS 2.0 while maintaining or exceeding native image quality. Compared to traditional brute-force graphics rendering, DLSS 3 is ultimately up to 4x faster while providing low system latency.
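
To give a rough feel for why the Shader Execution Reordering highlight above helps, here is a toy model in Python. It is only a sketch of the general idea (grouping threads that run the same shader), not NVIDIA's hardware scheduling. The workload, the `shader_passes` helper, and the cost model of one pass per distinct shader in a warp are all assumptions made for illustration.

```python
WARP_SIZE = 32  # threads that execute in lockstep


def shader_passes(shader_ids, warp_size=WARP_SIZE):
    """Count passes, assuming a warp runs once per distinct shader it contains."""
    passes = 0
    for start in range(0, len(shader_ids), warp_size):
        warp = shader_ids[start:start + warp_size]
        passes += len(set(warp))
    return passes


# Hypothetical workload: 128 ray hits cycling through 4 material shaders.
hits = [i % 4 for i in range(128)]

unordered = shader_passes(hits)          # every warp mixes all 4 shaders
reordered = shader_passes(sorted(hits))  # each warp holds a single shader

print(f"passes without reordering: {unordered}")
print(f"passes with reordering:    {reordered}")
```

In this toy case the four warps need 16 passes when the shaders are interleaved and 4 passes once the hits are grouped by shader. Real gains depend on the scene and on the cost of the reordering itself.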

The NVIDIA Ada Lovelace AD102 GPU features up to 12 GPCs (Graphics Processing Clusters), 5 more than the Ampere GA102 GPU. Each GPC consists of 6 TPCs with 2 SMs each, which is the same configuration as the existing chip. Each SM (Streaming Multiprocessor) houses four sub-cores, which is also the same as the GA102 GPU. What's changed is the FP32 & INT32 core configuration. Each SM includes 64 dedicated FP32 units, but the combined FP32+INT32 unit count goes up to 128. This is because half of the FP32 units don't share a datapath with the INT32 units: the 64 dedicated FP32 units are separate from the other 64 units that handle INT32.

So in total, each sub-core consists of 16 FP32 plus 16 INT32 units for a total of 32 units. Each SM has 64 FP32 units plus 64 INT32 units for a total of 128 units. And since there are 144 SMs in total (12 per GPC), we are looking at a total of 18,432 cores. Each SM also includes two warp schedulers (32 threads/clk) for 64 warps per SM, along with their own L0 i-cache. This is a 33% increase in warps/threads vs the GA102 GPU. The register file holds 16,384 32-bit registers. Each SM also carries its own 128 KB of L1 data cache and shared memory, so that's 18 MB of L1 cache across the full die.
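
The unit counts above follow directly from the GPC, TPC, SM, and sub-core hierarchy. The short Python sketch below just multiplies out the figures quoted in the text; the variable names are ours and carry no official meaning.

```python
# Hierarchy figures as quoted in the text for the full AD102 die.
GPCS = 12
TPCS_PER_GPC = 6
SMS_PER_TPC = 2
SUBCORES_PER_SM = 4
FP32_PER_SUBCORE = 16
INT32_PER_SUBCORE = 16  # counted the way the article counts them
L1_KB_PER_SM = 128

sms = GPCS * TPCS_PER_GPC * SMS_PER_TPC
units_per_sm = SUBCORES_PER_SM * (FP32_PER_SUBCORE + INT32_PER_SUBCORE)
total_cores = sms * units_per_sm
l1_total_mb = sms * L1_KB_PER_SM / 1024

print(f"SMs per GPC:  {TPCS_PER_GPC * SMS_PER_TPC}")
print(f"Total SMs:    {sms}")
print(f"Units per SM: {units_per_sm}")
print(f"Total cores:  {total_cores}")
print(f"Total L1:     {l1_total_mb:.0f} MB")
```

This reproduces the 144 SMs, 18,432 cores, and 18 MB of L1 cache mentioned above.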

Moving over to the cache, this is another segment where NVIDIA has given a big boost over the existing Ampere GPUs. The L2 cache increases to 96 MB, as mentioned in the leaks. This is a 16x increase over the Ampere GPU, which hosts just 6 MB of L2 cache. The cache is shared across the GPU. The full die also features up to 192 ROPs.

There are also going to be the latest 4th Generation Tensor and 3rd Generation RT (Raytracing) cores infused on the Ada Lovelace GPUs which will help boost DLSS & Raytracing performance to the next level. Overall, the Ada Lovelace AD102 GPU will offer:

71% More GPCs (Versus Ampere)
71% More Cores (Versus Ampere)
71% More L1 Cache (Versus Ampere)
16x More L2 Cache (Versus Ampere)
71% More ROPs (Versus Ampere)
4th Gen Tensor & 3rd Gen RT Cores
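
Most of the ratios in the list above can be checked with a few lines of Python. The AD102 figures come from the text. The GA102 GPC, ROP, and L2 figures match the RTX 3090 column and the 6 MB quoted in this article, while the 10,752-core count for a full GA102 die (84 SMs x 128) is our assumption and does not appear in the article.

```python
# Full-die AD102 figures from the text.
ad102 = {"GPCs": 12, "CUDA cores": 18432, "ROPs": 192, "L2 cache (MB)": 96}

# Full-die GA102 figures; the core count (84 SMs x 128) is an assumption
# not stated in the article.
ga102 = {"GPCs": 7, "CUDA cores": 10752, "ROPs": 112, "L2 cache (MB)": 6}


def percent_more(new, old):
    """Relative increase of new over old, in percent."""
    return (new / old - 1) * 100


for name in ("GPCs", "CUDA cores", "ROPs"):
    print(f"{name}: {percent_more(ad102[name], ga102[name]):.0f}% more")

l2_ratio = ad102["L2 cache (MB)"] / ga102["L2 cache (MB)"]
print(f"L2 cache: {l2_ratio:.0f}x more")
```

GPCs, cores, and ROPs all scale by the same 12:7 ratio, which is where the repeated 71% figure comes from.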

The full die has not been featured on any GPU so far, not even the L40 which has 2 SMs disabled. It is likely that as yields progress, we will eventually see a gaming and workstation product using the full-fat AD102. Till then, the RTX 4090 is the top gaming graphics card while the RTX 6000 Ada is the top workstation solution.

NVIDIA AD102 'Ada Lovelace' Gaming GPU Block Diagram:

NVIDIA AD102 'Ada Lovelace' Gaming GPU 'SM' Block Diagram:

NVIDIA GeForce RTX 4090

82.6 TFLOPS of peak single-precision (FP32) performance
165.2 TFLOPS of peak half-precision (FP16) performance
660.6 Tensor TFLOPS
1321.2 Tensor TFLOPS with sparsity
191 RT-TFLOPS
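
For reference, the peak FP32 figure above is the usual back-of-the-envelope product of core count, boost clock, and two floating-point operations per clock (one fused multiply-add). The sketch below assumes 16,384 CUDA cores and a 2520 MHz boost clock for the RTX 4090, which are reference specifications not quoted in this article, and cross-checks the formula against the RTX 3090 row of the breakdown table in this article.

```python
def peak_fp32_tflops(cuda_cores, boost_mhz):
    """Peak FP32 rate, assuming 2 FLOPs (one FMA) per core per clock."""
    return 2 * cuda_cores * boost_mhz * 1e6 / 1e12


# Assumed RTX 4090 reference specs (not stated in the article).
rtx_4090 = peak_fp32_tflops(cuda_cores=16384, boost_mhz=2520)

# RTX 3090 figures taken from the breakdown table in this article.
rtx_3090 = peak_fp32_tflops(cuda_cores=10496, boost_mhz=1695)

# The sparsity figure in the list is simply double the dense Tensor figure.
tensor_sparse = 660.6 * 2

print(f"RTX 4090 peak FP32: {rtx_4090:.1f} TFLOPS")
print(f"RTX 3090 peak FP32: {rtx_3090:.1f} TFLOPS")
print(f"Tensor with sparsity: {tensor_sparse:.1f} TFLOPS")
```

These are theoretical peaks at the rated boost clock. Cards such as the SUPRIM models often boost higher in practice, so measured throughput can differ.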

At the heart of the NVIDIA GeForce RTX 4090 graphics card lies the Ada Lovelace AD102 GPU. The GPU measures 608.5 mm2 and uses the TSMC 4N process node, an optimized version of TSMC's 5nm (N5) node designed for the green team. The GPU features an insane 76.3 billion transistors.

NVIDIA Ampere "GeForce RTX 30" GPUs Full Breakdown:

Graphics Card	NVIDIA GeForce RTX 2070 SUPER	NVIDIA GeForce RTX 3070	NVIDIA GeForce RTX 2080	NVIDIA GeForce RTX 3080	NVIDIA Titan RTX	NVIDIA GeForce RTX 3090
GPU Codename	TU106	GA104	TU104	GA102	TU102	GA102
GPU Architecture	NVIDIA Turing	NVIDIA Ampere	NVIDIA Turing	NVIDIA Ampere	NVIDIA Turing	NVIDIA Ampere
GPCs	5 or 6	6	6	6	6	7
TPCs	20	23	23	34	36	41
SMs	40	46	46	68	72	82
CUDA Cores / SM	64	128	64	128	64	128
CUDA Cores / GPU	2560	5888	2944	8704	4608	10496
Tensor Cores / SM	8 (2nd Gen)	4 (3rd Gen)	8 (2nd Gen)	4 (3rd Gen)	8 (2nd Gen)	4 (3rd Gen)
Tensor Cores / GPU	320 (2nd Gen)	184 (3rd Gen)	368 (2nd Gen)	272 (3rd Gen)	576 (2nd Gen)	328 (3rd Gen)
RT Cores	40 (1st Gen)	46 (2nd Gen)	46 (1st Gen)	68 (2nd Gen)	72 (1st Gen)	82 (2nd Gen)
GPU Boost Clock (MHz)	1770	1725	1800	1710	1770	1695
Peak FP32 TFLOPS (non-Tensor)	9.1	20.3	10.6	29.8	16.3	35.6
Peak FP16 TFLOPS (non-Tensor)	18.1	20.3	21.2	29.8	32.6	35.6
Peak BF16 TFLOPS (non-Tensor)	NA	20.3	NA	29.8	NA	35.6
Peak INT32 TOPS (non-Tensor)	9.1	10.2	10.6	14.9	16.3	17.8
Peak FP16 Tensor TFLOPS with FP16 Accumulate	72.5	81.3/162.6	84.8	119/238	130.5	142/284
Peak FP16 Tensor TFLOPS with FP32 Accumulate	36.3	40.6/81.3	42.4	59.5/119	65.2	71/142
Peak BF16 Tensor TFLOPS with FP32 Accumulate	NA	40.6/81.3	NA	59.5/119	NA	71/142
Peak TF32 Tensor TFLOPS	NA	20.3/40.6	NA	29.8/59.5	NA	35.6/71
Peak INT8 Tensor TOPS	145	162.6/325.2	169.6	238/476	261	284/568
Peak INT4 Tensor TOPS	290	325.2/650.4	339.1	476/952	522	568/1136
Frame Buffer Memory Size and Type	8 GB GDDR6	8 GB GDDR6	8 GB GDDR6	10 GB GDDR6X	24 GB GDDR6	24 GB GDDR6X
Memory Interface	256-bit	256-bit	256-bit	320-bit	384-bit	384-bit
Memory Clock (Data Rate)	14 Gbps	14 Gbps	14 Gbps	19 Gbps	14 Gbps	19.5 Gbps
Memory Bandwidth	448 GB/sec	448 GB/sec	448 GB/sec	760 GB/sec	672 GB/sec	936 GB/sec
ROPs	64	96	64	96	96	112
Pixel Fill-rate (Gigapixels/sec)	113.3	165.6	115.2	164.2	169.9	193
Texture Units	160	184	184	272	288	328
Texel Fill-rate (Gigatexels/sec)	283.2	317.4	331.2	465	509.8	566
L1 Data Cache/Shared Memory	3840 KB	5888 KB	4416 KB	8704 KB	6912 KB	10496 KB
L2 Cache Size	4096 KB	4096 KB	4096 KB	5120 KB	6144 KB	6144 KB
Register File Size	10240 KB	11776 KB	11776 KB	17408 KB	18432 KB	20992 KB
TGP (Total Graphics Power)	215W	220W	225W	320W	280W	350W
Transistor Count	13.6 Billion	17.4 Billion	13.6 Billion	28.3 Billion	18.6 Billion	28.3 Billion
Die Size	545 mm2	392.5 mm2	545 mm2	628.4 mm2	754 mm2	628.4 mm2
Manufacturing Process	TSMC 12 nm FFN (FinFET NVIDIA)	Samsung 8 nm 8N NVIDIA Custom Process	TSMC 12 nm FFN (FinFET NVIDIA)	Samsung 8 nm 8N NVIDIA Custom Process	TSMC 12 nm FFN (FinFET NVIDIA)	Samsung 8 nm 8N NVIDIA Custom Process
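
The memory bandwidth row in the table above can be derived from the interface width and data rate rows: bus width in bytes multiplied by the per-pin data rate. The following Python sketch checks that relationship using values copied from the table; the function name is ours.

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Bandwidth in GB/s = (bus width in bytes) x (per-pin data rate in Gbps)."""
    return bus_width_bits / 8 * data_rate_gbps


# (bus width in bits, data rate in Gbps, bandwidth listed in the table)
cards = {
    "RTX 3070": (256, 14, 448),
    "RTX 3080": (320, 19, 760),
    "Titan RTX": (384, 14, 672),
    "RTX 3090": (384, 19.5, 936),
}

for name, (bus, rate, listed) in cards.items():
    computed = memory_bandwidth_gbs(bus, rate)
    print(f"{name}: computed {computed:.0f} GB/s, table lists {listed} GB/s")
```

All four computed values agree with the table.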

NVIDIA Ada GPUs - AD102, AD103, AD104 For The First Wave of Gaming Cards

NVIDIA is first introducing three brand new Ada GPUs: the AD102, AD103 & AD104. The AD102 GPU is featured on the GeForce RTX 4090, the AD103 is used by the GeForce RTX 4080 16 GB graphics cards, and the AD104 is featured on the GeForce RTX 4080 12 GB graphics cards.

The Ada GPUs are based on the TSMC 4N process node, which is a custom process designed exclusively for NVIDIA. It is essentially an optimized version of the N5 (5nm) process, offering drastic increases in transistors, cores, and frequency. The top AD102 GPU packs 70% more cores than its Ampere predecessor and 76.3 billion transistors, while offering over 2x the performance per watt.

NVIDIA Ada AD102 GPU

The full AD102 GPU is made up of 12 graphics processing clusters with 12 SM units on each cluster. That makes up 144 SM units for a total of 18,432 cores, 144 RT cores, 576 Tensor Cores, 576 Texture Units, and a 384-bit bus interface in a 76.3 billion transistor package measuring 608.5 mm2.
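
Dividing the totals above by the SM count gives the per-SM makeup they imply. This is a small Python sketch using only the numbers in the paragraph; the dictionary layout is ours.

```python
GPCS = 12
SMS_PER_GPC = 12
sm_count = GPCS * SMS_PER_GPC

# Full-die totals quoted in the paragraph above.
totals = {
    "CUDA cores": 18432,
    "RT cores": 144,
    "Tensor Cores": 576,
    "Texture Units": 576,
}

# Per-SM counts implied by those totals.
per_sm = {}
for name, total in totals.items():
    assert total % sm_count == 0, f"{name} does not divide evenly across SMs"
    per_sm[name] = total // sm_count

print(f"SMs: {sm_count}")
for name, count in per_sm.items():
    print(f"{name} per SM: {count}")
```

Each SM therefore works out to 128 CUDA cores, 1 RT core, 4 Tensor Cores, and 4 texture units.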


About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
