MSI GeForce RTX 4090 SUPRIM Liquid X Package
Keeping alive its tradition of launching a new graphics architecture every two years, NVIDIA this year introduces its Ada Lovelace GPU. The Ada GPU is built upon the foundation set by Turing. NVIDIA describes the Ada Lovelace GPUs as a quantum leap over Ampere, and the GeForce RTX 4090 Founders Edition, based on the NVIDIA Ada GPU, excels at everything versus the previous gen.
The Ada GPU architecture has a lot to be talked about in this review, but so does the new RTX lineup. The Ada lineup offers faster shader performance, faster ray tracing performance, and faster AI performance. Built on a brand new process node and featuring an architecture designed from the ground up, Ada is a killer product with lots of numbers to talk about.
The fundamental goal of Ada was to take everything NVIDIA learned with its Turing & Ampere architectures and not only refine it but use its DNA to form a product in a completely new performance category. NVIDIA made tall claims when it introduced the Ada lineup last month, including up to 4x performance gains, & we will be finding out whether NVIDIA ticked all the boxes with its Ada architecture. This review will be your guide to what makes up Ada and how it performs against its predecessors.
Today, we will be taking a look at the MSI GeForce RTX 4090 SUPRIM X & RTX 4090 SUPRIM Liquid X. These cards were provided by MSI for the sole purpose of this review & we will be taking a look at their technology, design, and performance metrics in detail.
NVIDIA GeForce RTX 40 Series Gaming Graphics Cards - The Biggest GPU Performance Leap in Recent History
Turing wasn't just any graphics core, it was the graphics core that was to become the foundation of future GPUs. That future is realized now, with next-generation consoles going deep on ray tracing and AI-assisted super-sampling techniques. NVIDIA had a head start with Turing & Ampere, and its Ada generation will only do things many times better.
The Ada GPU does many traditional things which we would expect from a GPU, but at the same time, also breaks the barrier when it comes to untraditional GPU operations. Just to sum up some features:
- New Streaming Multiprocessor (SM)
- New 4th Gen Tensor Cores
- New Real-Time Ray Tracing Acceleration
- New Shading Enhancements
- New Deep Learning Features For Graphics & Inference
- New GDDR6X High-Performance Memory Subsystem
- New HDMI 2.1 Display Engine & Next-Gen NVENC/NVDEC
The technologies mentioned above are some of the main building blocks of the Ada GPU, but there's more within the graphics core itself which we will talk about in detail so let's get started.
Let's take a trip down the road to Ada. In 2016, NVIDIA announced its Pascal GPUs, which would soon be featured in its top-to-bottom GeForce 10 series lineup. With the launch of Maxwell, NVIDIA had gained a lot of experience in the efficiency department, which it had focused on since its Kepler GPUs.
Four years ago, NVIDIA, rather than offering another standard leap in the rasterization performance of its GPUs took a different approach & introduced two key technologies in its Turing line of consumer GPUs, one being AI-assisted acceleration with the Tensor Cores and the second being hardware-level acceleration for Ray Tracing with its brand new RT cores.
Then came Ampere on its brand new Samsung 8nm fabrication process, with which NVIDIA added even more to its gaming graphics lineup. In the Ampere GPU architecture, NVIDIA provided its latest Ampere SM along with next-gen FP32, INT32, Tensor Cores, and RT cores. The focus was to boost both rasterization and ray tracing capabilities to new heights.
Now enter Ada, a brand new architecture that aims to take everything from the first two RTX GPUs and perfect it. The graphics architecture is designed for speed, and speed is what it excels at. So let's see the architecture in detail. The following are a few of the main highlights of the Ada Lovelace GPU architecture:
- Revolutionary New Architecture: NVIDIA Ada architecture GPUs deliver outstanding performance for graphics, AI, and compute workloads with exceptional architectural and power efficiency. After the baseline design for the Ada SM was established, the chip was scaled up to shatter records. Manufacturing innovations and materials research enabled NVIDIA engineers to craft a GPU with 76.3 billion transistors and 18,432 CUDA Cores capable of running at clocks over 2.5 GHz while maintaining the same 450W TGP as the prior generation flagship GeForce RTX 3090 Ti GPU. The result is the world’s fastest GPU with the power, acoustics, and temperature characteristics expected of a high-end graphics card.
- New Ada RT Core for Faster Ray Tracing: For decades, rendering ray-traced scenes with physically correct lighting in real-time has been considered the holy grail of graphics. At the same time, the geometric complexity of environments and objects continues to increase as 3D games and graphics continually strive to provide the most accurate representations of the real world. The Ada RT Core has been enhanced to deliver 2x faster ray-triangle intersection testing and includes two important new hardware units. An Opacity Micromap Engine speeds up ray tracing of alpha-tested geometry by a factor of 2x, and a Displaced Micro-Mesh Engine generates Displaced Micro-Triangles on-the-fly to create additional geometry. The Micro-Mesh Engine provides the benefit of increased geometric complexity without the traditional performance and storage costs of complex geometries.
- Shader Execution Reordering: NVIDIA Ada GPUs support Shader Execution Reordering which dynamically organizes & reorders shading workloads to improve RT shading efficiency. This improves performance by up to 44% in Cyberpunk 2077 with Ray Tracing Overdrive Mode.
- NVIDIA DLSS 3: The Ada architecture features an all-new Optical Flow Accelerator and AI frame generation that boosts DLSS 3’s frame rates up to 2x over the previous DLSS 2.0 while maintaining or exceeding native image quality. Compared to traditional brute-force graphics rendering, DLSS 3 is ultimately up to 4x faster while providing low system latency.
The NVIDIA Ada Lovelace AD102 GPU features up to 12 GPCs (Graphics Processing Clusters). That is 5 more GPCs than the Ampere GA102 GPU. Each GPC consists of 6 TPCs with 2 SMs each, which is the same configuration as the existing chip. Each SM (Streaming Multiprocessor) houses four sub-cores, which is also the same as the GA102 GPU. What's changed is the FP32 & INT32 core configuration. Each SM includes 64 FP32 units, but the combined FP32+INT32 units go up to 128. This is because half of the FP32 units don't share the same sub-core as the INT32 units. The 64 FP32 cores are separate from the 64 INT32 cores.
So in total, each sub-core consists of 16 FP32 plus 16 INT32 units for a total of 32 units. Each SM has a total of 64 FP32 units plus 64 INT32 units for a total of 128 units. And since there are a total of 144 SM units (12 per GPC), we are looking at a total of 18,432 cores. Each SM also includes two Warp Schedulers (32 threads/clk) for 64 warps per SM & their own L0 i-cache. This is a 33% increase in warps/threads vs the GA102 GPU. The register file holds 16,384 32-bit registers per sub-core. Each SM also carries its own 128 KB of L1 data cache and shared memory, so that's 18 MB of L1 cache across the full die.
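The unit counts above multiply out as follows. This is a minimal sketch that only uses the per-level figures quoted in the text; the constant and function names are our own.

```python
# AD102 full-die unit counts, using the figures quoted in the text.
GPCS = 12                 # graphics processing clusters
TPCS_PER_GPC = 6
SMS_PER_TPC = 2
SUBCORES_PER_SM = 4
FP32_PER_SUBCORE = 16
INT32_PER_SUBCORE = 16    # counted together with FP32 as "cores" in the text
L1_KB_PER_SM = 128


def ad102_totals():
    """Return the full-die totals implied by the per-level counts."""
    sms = GPCS * TPCS_PER_GPC * SMS_PER_TPC
    cores_per_sm = SUBCORES_PER_SM * (FP32_PER_SUBCORE + INT32_PER_SUBCORE)
    return {
        "sms": sms,
        "cores_per_sm": cores_per_sm,
        "cuda_cores": sms * cores_per_sm,
        "l1_mb": sms * L1_KB_PER_SM / 1024,
    }


if __name__ == "__main__":
    print(ad102_totals())
```

Running it reproduces the 144 SMs, 18,432 cores, and 18 MB of L1 cache mentioned above.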
Moving over to the cache, this is another segment where NVIDIA has given a big boost over the existing Ampere GPUs. The L2 cache will be increased to 96 MB as mentioned in the leaks. This is a 16x increase over the Ampere GPU that hosts just 6 MB of L2 cache. The cache will be shared across the GPU. The GPU will also feature up to 192 ROPs for the full-die.
There are also going to be the latest 4th Generation Tensor and 3rd Generation RT (Raytracing) cores infused on the Ada Lovelace GPUs which will help boost DLSS & Raytracing performance to the next level. Overall, the Ada Lovelace AD102 GPU will offer:
- 71% More GPCs (Versus Ampere)
- 71% More Cores (Versus Ampere)
- 71% More L1 Cache (Versus Ampere)
- 16x More L2 Cache (Versus Ampere)
- 71% More ROPs (Versus Ampere)
- 4th Gen Tensor & 3rd Gen RT Cores
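For readers who want to check the percentages in the list above, here is a rough sketch of the arithmetic. The AD102 figures come from the text; the full-die GA102 figures (7 GPCs, 10,752 cores, 112 ROPs, 6 MB L2) are our assumption for the comparison, since the table in this article lists the cut-down RTX 3090 rather than the full chip.

```python
# AD102 values are taken from the article text.
AD102 = {"gpcs": 12, "cores": 18432, "rops": 192, "l2_mb": 96}
# Full GA102 values are assumed here; they are not stated in the article.
GA102 = {"gpcs": 7, "cores": 10752, "rops": 112, "l2_mb": 6}


def uplift_percent(new, old):
    """Percentage increase of `new` over `old`."""
    return (new / old - 1.0) * 100.0


def summary():
    """Rounded uplift for every unit type shared by both dictionaries."""
    return {key: round(uplift_percent(AD102[key], GA102[key])) for key in AD102}


if __name__ == "__main__":
    for key, pct in summary().items():
        print(f"{key}: +{pct}%")
```

Under these assumptions GPCs, cores, and ROPs all land at roughly 71%, while the L2 cache grows by 1500%, which is the same thing as the 16x quoted above.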
The full die has not been featured on any GPU so far, not even the L40 which has 2 SMs disabled. It is likely that as yields progress, we will eventually see a gaming and workstation product using the full-fat AD102. Till then, the RTX 4090 is the top gaming graphics card while the RTX 6000 Ada is the top workstation solution.
NVIDIA AD102 'Ada Lovelace' Gaming GPU Block Diagram:
NVIDIA AD102 'Ada Lovelace' Gaming GPU 'SM' Block Diagram:
NVIDIA GeForce RTX 4090
- 82.6 TFLOPS of peak single-precision (FP32) performance
- 165.2 TFLOPS of peak half-precision (FP16) performance
- 660.6 Tensor TFLOPS
- 1321.2 Tensor TFLOPs with sparsity
- 191 RT-TFLOPs
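The FP32 figure in the list above follows from the usual peak-throughput formula of two floating-point operations (one fused multiply-add) per core per clock. The sketch below assumes 16,384 CUDA cores and a 2520 MHz boost clock for the RTX 4090, neither of which is stated in this section, and uses the RTX 3090 row from the comparison table in this article as a sanity check.

```python
def peak_fp32_tflops(cuda_cores, boost_mhz):
    """Peak FP32 rate, assuming one FMA (2 FLOPs) per core per clock."""
    return cuda_cores * 2 * boost_mhz * 1e6 / 1e12


# RTX 3090: core count and boost clock from the comparison table.
rtx_3090 = peak_fp32_tflops(10496, 1695)
# RTX 4090: 16,384 cores at 2520 MHz is an assumption, not from the text.
rtx_4090 = peak_fp32_tflops(16384, 2520)

if __name__ == "__main__":
    print(f"RTX 3090: {rtx_3090:.1f} TFLOPS")
    print(f"RTX 4090: {rtx_4090:.1f} TFLOPS")
```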
At the heart of the NVIDIA GeForce RTX 4090 graphics card lies the Ada Lovelace AD102 GPU. The GPU measures 608.5 mm2 and utilizes the TSMC 4N process node, which is an optimized version of TSMC's 5nm (N5) node designed for the green team. The GPU features an insane 76.3 billion transistors.
NVIDIA Ampere "GeForce RTX 30" GPUs Full Breakdown:
| Graphics Card | NVIDIA GeForce RTX 2070 SUPER | NVIDIA GeForce RTX 3070 | NVIDIA GeForce RTX 2080 | NVIDIA GeForce RTX 3080 | NVIDIA Titan RTX | NVIDIA GeForce RTX 3090 |
|---|---|---|---|---|---|---|
| GPU Codename | TU104 | GA104 | TU104 | GA102 | TU102 | GA102 |
| GPU Architecture | NVIDIA Turing | NVIDIA Ampere | NVIDIA Turing | NVIDIA Ampere | NVIDIA Turing | NVIDIA Ampere |
| GPCs | 5 or 6 | 6 | 6 | 6 | 6 | 7 |
| TPCs | 20 | 23 | 23 | 34 | 36 | 41 |
| SMs | 40 | 46 | 46 | 68 | 72 | 82 |
| CUDA Cores / SM | 64 | 128 | 64 | 128 | 64 | 128 |
| CUDA Cores / GPU | 2560 | 5888 | 2944 | 8704 | 4608 | 10496 |
| Tensor Cores / SM | 8 (2nd Gen) | 4 (3rd Gen) | 8 (2nd Gen) | 4 (3rd Gen) | 8 (2nd Gen) | 4 (3rd Gen) |
| Tensor Cores / GPU | 320 (2nd Gen) | 184 (3rd Gen) | 368 (2nd Gen) | 272 (3rd Gen) | 576 (2nd Gen) | 328 (3rd Gen) |
| RT Cores | 40 (1st Gen) | 46 (2nd Gen) | 46 (1st Gen) | 68 (2nd Gen) | 72 (1st Gen) | 82 (2nd Gen) |
| GPU Boost Clock (MHz) | 1770 | 1725 | 1800 | 1710 | 1770 | 1695 |
| Peak FP32 TFLOPS (non-Tensor) | 9.1 | 20.3 | 10.6 | 29.8 | 16.3 | 35.6 |
| Peak FP16 TFLOPS (non-Tensor) | 18.1 | 20.3 | 21.2 | 29.8 | 32.6 | 35.6 |
| Peak BF16 TFLOPS (non-Tensor) | NA | 20.3 | NA | 29.8 | NA | 35.6 |
| Peak INT32 TOPS (non-Tensor) | 9.1 | 10.2 | 10.6 | 14.9 | 16.3 | 17.8 |
| Peak FP16 Tensor TFLOPS with FP16 Accumulate | 72.5 | 81.3/162.6 | 84.8 | 119/238 | 130.5 | 142/284 |
| Peak FP16 Tensor TFLOPS with FP32 Accumulate | 36.3 | 40.6/81.3 | 42.4 | 59.5/119 | 65.2 | 71/142 |
| Peak BF16 Tensor TFLOPS with FP32 Accumulate | NA | 40.6/81.3 | NA | 59.5/119 | NA | 71/142 |
| Peak TF32 Tensor TFLOPS | NA | 20.3/40.6 | NA | 29.8/59.5 | NA | 35.6/71 |
| Peak INT8 Tensor TOPS | 145 | 162.6/325.2 | 169.6 | 238/476 | 261 | 284/568 |
| Peak INT4 Tensor TOPS | 290 | 325.2/650.4 | 339.1 | 476/952 | 522 | 568/1136 |
| Frame Buffer Memory Size and Type | 8 GB GDDR6 | 8 GB GDDR6 | 8 GB GDDR6 | 10 GB GDDR6X | 24 GB GDDR6 | 24 GB GDDR6X |
| Memory Interface | 256-bit | 256-bit | 256-bit | 320-bit | 384-bit | 384-bit |
| Memory Clock (Data Rate) | 14 Gbps | 14 Gbps | 14 Gbps | 19 Gbps | 14 Gbps | 19.5 Gbps |
| Memory Bandwidth | 448 GB/sec | 448 GB/sec | 448 GB/sec | 760 GB/sec | 672 GB/sec | 936 GB/sec |
| ROPs | 64 | 96 | 64 | 96 | 96 | 112 |
| Pixel Fill-rate (Gigapixels/sec) | 113.3 | 165.6 | 115.2 | 164.2 | 169.9 | 193 |
| Texture Units | 160 | 184 | 184 | 272 | 288 | 328 |
| Texel Fill-rate (Gigatexels/sec) | 283.2 | 317.4 | 331.2 | 465 | 509.8 | 566 |
| L1 Data Cache/Shared Memory | 3840 KB | 5888 KB | 4416 KB | 8704 KB | 6912 KB | 10496 KB |
| L2 Cache Size | 4096 KB | 4096 KB | 4096 KB | 5120 KB | 6144 KB | 6144 KB |
| Register File Size | 10240 KB | 11776 KB | 11776 KB | 17408 KB | 18432 KB | 20992 KB |
| TGP (Total Graphics Power) | 215W | 220W | 225W | 320W | 280W | 350W |
| Transistor Count | 13.6 Billion | 17.4 Billion | 13.6 Billion | 28.3 Billion | 18.6 Billion | 28.3 Billion |
| Die Size | 545 mm2 | 392.5 mm2 | 545 mm2 | 628.4 mm2 | 754 mm2 | 628.4 mm2 |
| Manufacturing Process | TSMC 12 nm FFN (FinFET NVIDIA) | Samsung 8 nm 8N NVIDIA Custom Process | TSMC 12 nm FFN (FinFET NVIDIA) | Samsung 8 nm 8N NVIDIA Custom Process | TSMC 12 nm FFN (FinFET NVIDIA) | Samsung 8 nm 8N NVIDIA Custom Process |
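Most per-GPU rows in the table above are simply the per-SM figure multiplied by the SM count. The following sketch recomputes three of them as a consistency check. The 256 KB register file per SM is our inference from the table (for example 20992 KB / 82 SMs), not a figure stated in the text.

```python
# (SMs, CUDA cores per SM, Tensor cores per SM), copied from the table rows.
GPUS = {
    "RTX 2070 SUPER": (40, 64, 8),
    "RTX 3070": (46, 128, 4),
    "RTX 2080": (46, 64, 8),
    "RTX 3080": (68, 128, 4),
    "Titan RTX": (72, 64, 8),
    "RTX 3090": (82, 128, 4),
}

# Assumption inferred from the table, identical for Turing and Ampere here.
REGISTER_FILE_KB_PER_SM = 256


def per_gpu_totals(name):
    """Derive per-GPU totals from the per-SM figures for one card."""
    sms, cores_per_sm, tensor_per_sm = GPUS[name]
    return {
        "cuda_cores": sms * cores_per_sm,
        "tensor_cores": sms * tensor_per_sm,
        "register_file_kb": sms * REGISTER_FILE_KB_PER_SM,
    }


if __name__ == "__main__":
    for name in GPUS:
        print(name, per_gpu_totals(name))
```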
NVIDIA Ada GPUs - AD102, AD103, AD104 For The First Wave of Gaming Cards
NVIDIA is first introducing three brand new Ada GPUs which include the AD102, AD103 & AD104. The AD102 GPU is going to be featured on the GeForce RTX 4090, the AD103 is going to be used by the GeForce RTX 4080 16 GB graphics cards and the AD104 GPU is going to be featured on the GeForce RTX 4080 12 GB graphics cards.
The Ada GPUs are based on the TSMC 4N process node which is a custom process designed exclusively for NVIDIA. It is essentially an optimized version of the N5 (5nm) process, offering drastic increases in transistors, cores, and frequency. The top AD102 GPU packs 70% more cores and also offers 76.3 Billion transistors while offering over 2x the performance per watt.
NVIDIA Ada AD102 GPU
The full AD102 GPU is made up of 12 graphics processing clusters with 12 SM units on each cluster. That makes up 144 SM units for a total of 18432 cores, 144 RT cores, 576 Tensor Cores, 576 Texture Units, and a 384-bit bus interface in a 76.3 billion transistor package measuring 608.5 mm2.
NVIDIA has also introduced its 4th Generation Tensor core architecture and 3rd Generation RT cores on Ada GPUs. Tensor cores have been available since Volta, and consumers got a taste of them with the Turing & Ampere GPUs. One of the key areas where Tensor Cores are put to use in AAA games is DLSS. There's a whole software stack that leverages the Tensor cores, known as NVIDIA NGX. These software-based technologies help enhance graphics fidelity with features such as Deep Learning Super Sampling (DLSS), AI InPainting, AI Super Rez, RTX Voice, and AI Slow-Mo.
While its initial debut was a bit flawed, DLSS in its 2nd iteration (DLSS 2.x) has done wonders to not only improve gaming performance but also image quality.
Let's dive into the technological advancements that allow these incredible achievements. To begin with, NVIDIA engineers started with DLSS Super Resolution and added something called Optical Multi Frame Generation based on Ada's Optical Flow Accelerator.
This accelerator analyzes two sequential frames from a particular game, capturing pixel details such as particles, reflections, lighting, and shadows.
On top of that, NVIDIA DLSS 3 also takes into account conventional game engine information such as motion vectors. The DLSS Frame Generation AI convolutional autoencoder network will then decide how to use each of the four inputs (current and prior frames, optical flow field, and motion vectors) to recreate intermediate frames in the best possible way.
NVIDIA DLSS 3 is said to reconstruct 3/4 of the first frame with DLSS Super Resolution and the full second frame with the help of the aforementioned DLSS Frame Generation. Overall, NVIDIA DLSS 3 reconstructs 7/8 of the two total frames displayed, which explains the massive performance uplift.
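The 7/8 figure is just the average of the two reconstructed shares. A minimal sketch of that arithmetic follows, with function and parameter names of our own choosing. The 3/4 share would correspond to rendering one quarter of the output pixels natively, which is our reading of the claim rather than something the text spells out.

```python
from fractions import Fraction


def reconstructed_share(super_res_share=Fraction(3, 4),
                        generated_share=Fraction(1)):
    """Average AI-reconstructed share over one upscaled and one generated frame."""
    return (super_res_share + generated_share) / 2


if __name__ == "__main__":
    share = reconstructed_share()
    print(f"{share} of displayed pixels are reconstructed")
    print(f"{1 - share} are rendered traditionally")
```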
Additionally, the new version of the Deep Learning Super Sampling image reconstruction technique also includes the latency-lowering NVIDIA Reflex technology.
Cyberpunk 2077 has been shown running NVIDIA DLSS 3, the brand new Ray Tracing Overdrive, and NVIDIA Reflex with up to 4x improved performance and up to 2x reduced latency. That's not all, as NVIDIA is even promising benefits for CPU-bound games, which generally didn't run much faster with DLSS 2.0. For example, the notoriously CPU-heavy Microsoft Flight Simulator gets up to 2x improved performance with the new DLSS.
Overall, NVIDIA said that over 35 games and apps have already pledged support for NVIDIA DLSS 3.
The green company also released a performance chart on some of those games running on NVIDIA DLSS 3; check it out below.
3rd Gen RT Cores, RTX, and Real-Time Ray Tracing Dissected
Next up, we have the RT Cores, which are what will power Real-Time Raytracing. NVIDIA isn't going to distance itself from traditional rasterization-based rendering but will instead follow a hybrid rendering model. The new 3rd Generation RT cores offer increased performance and double the ray/triangle intersection testing rate of the Ampere RT cores.
The Third-Generation RT Core found in Ada GPUs includes dedicated units known as the Opacity Micromap Engine and the Displaced Micro-Mesh Engine. The Opacity Micromap Engine evaluates Opacity Micromaps (represented by the triangle with foliage on the bottom left), which are used to accelerate alpha traversal. The Displaced Micro-Mesh Engine generates meshes of micro-triangles that are known as Displaced Micro-Meshes (represented by the triangle on the bottom right in the diagram below). Displaced Micro-Meshes allow the Ada RT Core to ray trace geometrically complex objects and environments with significantly less BVH build time and storage costs. Finally, ray-triangle intersection testing is 2x faster in Ada’s Third-Generation RT Core compared to the Ampere GPU generation.
NVIDIA engineers have developed three new features in the Ada RT Core to enable high-performance ray tracing of highly complex geometry:
- First, Ada’s Third-Generation RT Core features 2x Faster Ray-Triangle Intersection Throughput relative to Ampere; this enables developers to add more detail to their virtual worlds.
- Second, Ada’s RT Core has 2x Faster Alpha Traversal; the RT Core features a new Opacity Micromap Engine to directly alpha-test geometry and significantly reduce shader-based alpha computations. With this new functionality, developers can very compactly describe irregularly shaped or translucent objects, like ferns or fences, and directly and more efficiently ray trace them with the Ada RT Core.
- Third, the new Ada RT Core supports 10x Faster BVH Build in 20X Less BVH Space when using its new Displaced Micro-Mesh Engine to generate micro-triangles from micro-meshes on-demand. The micro-mesh is a new primitive that represents a structured mesh of micro-triangles that the Ada RT Core processes natively, saving the storage and processing compared to what is normally required when describing complex geometries using only basic triangles.
Taken together, these three advances incorporated into the Ada RT Core enable order-of-magnitude increases in richness without commensurate increases in processing time or memory consumption.
2x Faster Ray-Triangle Intersection Testing
Ray-triangle intersection testing is a computationally expensive operation that is commonly performed when rendering a ray-traced scene. Recognizing the importance of this function, with each new RTX GPU NVIDIA engineers have strived to improve intersection testing performance and efficiency. The Third-Generation RT Core in the Ada architecture provides double the throughput for ray-triangle intersection testing over Ampere (and 4x faster than the first-generation RT Core used in Turing GPUs).
2x Faster Alpha Traversal Performance with Opacity Micromap Engine
Developers frequently use a texture’s alpha channel to economically cut out complex shapes or more generally to represent translucency. A leaf might be described using a couple of triangles, employing a texture’s alpha channel to economically capture the complex shape. A flame’s complex shape and translucency can also be approximated by alpha.
Prior to Ada’s RT Core, a developer could incorporate these kinds of content into a ray-traced scene by tagging them as not opaque. When a leaf is hit by a ray, a shader is invoked to determine how to treat the intersection, even if the ray is simply characterized as a hit or a miss. This incurs a noticeable cost. Specifically, when a warp of rays is cast towards non-opaque objects, individual ray queries may require multiple shader invocations to resolve, while other rays terminate immediately. The result is lingering live threads and commensurate inefficiency.
To efficiently handle these kinds of content, NVIDIA engineers have added an Opacity Micromap Engine to Ada’s RT Core. An opacity micromap is a virtual mesh of micro-triangles, each with an opacity state that the RT Core uses to directly resolve ray intersections with non-opaque triangles. Specifically, the barycentric coordinates of an intersection are used to address the corresponding micro-triangle’s opacity state. The opacity state may be opaque, transparent, or unknown. If opaque, then a hit is recorded and returned. If transparent, the intersection is ignored and the search for an intersection continues. If unknown, then the control is returned to the SM, invoking a shader (“anyhit”) to programmatically resolve the intersection.
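The three-way decision described above can be sketched in a few lines. This is only an illustration of the control flow, not how the hardware is implemented; the `Opacity` enum, `resolve_intersection`, and the any-hit callback are hypothetical names.

```python
from enum import Enum


class Opacity(Enum):
    OPAQUE = "opaque"
    TRANSPARENT = "transparent"
    UNKNOWN = "unknown"


def resolve_intersection(state, any_hit_shader):
    """Return True to record a hit, False to ignore it and keep searching.

    `any_hit_shader` stands in for the shader the SM would run; it is only
    called when the micro-triangle's opacity state is unknown.
    """
    if state is Opacity.OPAQUE:
        return True
    if state is Opacity.TRANSPARENT:
        return False
    return any_hit_shader()


if __name__ == "__main__":
    calls = []

    def shader():
        calls.append(1)
        return True

    for state in Opacity:
        print(state.value, resolve_intersection(state, shader))
    print("shader invocations:", len(calls))
```

The point of the design is visible in the last line: only the unknown case pays for a shader invocation.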
The new Opacity Micromap Engine evaluates the opacity mask, which is a regular triangular mesh defined using the barycentric coordinate system used for reporting ray/triangle intersections. These meshes may be sized from one to sixteen million micro-triangles, with one or two bits associated with each micro-triangle. As a simple illustrative example, consider a detailed maple leaf described using two triangles and an alpha texture.
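To get a feel for the sizes involved, here is a rough sketch of how many micro-triangles a mask holds and how much storage the opacity states need. It assumes each subdivision level splits every triangle into four, so that level 12 yields the roughly sixteen million micro-triangles quoted above. That subdivision scheme is our assumption, not something this article states.

```python
import math


def micro_triangle_count(subdivision_level):
    """Assumed scheme: every level splits each triangle into four."""
    return 4 ** subdivision_level


def opacity_map_bytes(subdivision_level, bits_per_state):
    """Storage for the opacity states alone, rounded up to whole bytes."""
    if bits_per_state not in (1, 2):
        raise ValueError("the text describes one or two bits per micro-triangle")
    total_bits = micro_triangle_count(subdivision_level) * bits_per_state
    return math.ceil(total_bits / 8)


if __name__ == "__main__":
    for level in (0, 6, 12):
        print(level, micro_triangle_count(level), opacity_map_bytes(level, 2))
```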
10x Faster BVH Build in 20X Less BVH Space with Ada’s Displaced Micro-Mesh Engine
Geometric complexity continues to rise with every new generation. Ray tracing performance scales attractively with increases in scene complexity. When we ray trace complex environments, tracing costs increase slowly; a one-hundred-fold increase in geometry might only double tracing time.
However, creating the data structure (BVH) that makes that small increase in time possible requires roughly linear time and memory; 100x more geometry could mean 100x more BVH build time and 100x more memory. Ada’s Third-Generation RT Core with Displaced Micro-Meshes (DMM) helps significantly with both of the challenges of high geometric complexity: BVH build performance and memory/storage footprint. Asset storage and transmission costs are reduced as well.
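The contrast between the two scaling behaviours can be illustrated with a toy cost model. We assume tracing cost grows with the logarithm of the triangle count (the depth of a balanced tree) and build cost grows linearly. Both are simplifications chosen for illustration, and the scene sizes are hypothetical.

```python
import math


def relative_trace_cost(triangles):
    """Toy model: traversal cost tracks the depth of a balanced BVH."""
    return math.log2(triangles)


def relative_build_cost(triangles):
    """Toy model: BVH build time and memory grow linearly with geometry."""
    return float(triangles)


BASE_TRIANGLES = 100          # hypothetical starting scene
SCALED_TRIANGLES = 10_000     # the same scene with 100x more geometry

trace_ratio = relative_trace_cost(SCALED_TRIANGLES) / relative_trace_cost(BASE_TRIANGLES)
build_ratio = relative_build_cost(SCALED_TRIANGLES) / relative_build_cost(BASE_TRIANGLES)

if __name__ == "__main__":
    print(f"tracing cost grows about {trace_ratio:.1f}x")
    print(f"BVH build cost grows about {build_ratio:.0f}x")
```

In this model the tracing ratio depends on the starting scene size, so the "only double" figure should be read as an order-of-magnitude statement rather than a fixed rule.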
Secondary rays are generated at each primary ray hit point in the middle scene. Starting at the primary hit surfaces they shoot off in different directions, hitting different objects. Secondary hit shading tends to be less ordered and less efficient when executing on the GPU, because different shader programs are running on different threads, and often must serialize execution. Examples of secondary rays that can benefit from SER include those used for path tracing, reflections, indirect lighting, and translucency effects.
Shader Execution Reordering adds a new stage in the ray tracing pipeline which reorders and groups the secondary hit shading to have better execution locality, thus much higher overall ray-traced shading efficiency. SER can often provide up to 2X performance improvement for RT shaders in cases with a high level of divergence (such as path tracing). In testing with Cyberpunk 2077 running in RT: Overdrive Mode, we’ve measured overall performance gains of up to 44% from SER.
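The benefit of grouping hits by shader can be shown with a simplified model. The sketch below counts how many distinct hit shaders each 32-thread warp would have to run one after another, before and after sorting the hits by shader. The sort stands in for the reordering stage, and the shader IDs and workload are hypothetical. It is not a description of how the hardware actually schedules work.

```python
WARP_SIZE = 32


def serialized_passes(shader_ids, warp_size=WARP_SIZE):
    """Total number of distinct shaders summed over all warps.

    Threads in a warp that need different shaders are assumed to run them
    one after another, so fewer distinct shaders per warp is better.
    """
    return sum(
        len(set(shader_ids[start:start + warp_size]))
        for start in range(0, len(shader_ids), warp_size)
    )


def reorder_by_shader(shader_ids):
    """Stand-in for the reordering stage: group hits that share a shader."""
    return sorted(shader_ids)


# 128 secondary hits cycling through 4 hypothetical hit shaders.
hits = [i % 4 for i in range(128)]
passes_before = serialized_passes(hits)
passes_after = serialized_passes(reorder_by_shader(hits))

if __name__ == "__main__":
    print("passes without reordering:", passes_before)
    print("passes with reordering:", passes_after)
```

This worst-case interleaving exaggerates the gain. Real workloads are less uniformly divergent, which is in line with the smaller overall gains quoted above.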
The Micron GDDR6X memory brings a lot of new stuff to the table. It is faster, doubles the I/O data rate, and is the first to implement PAM4 multi-level signaling in memory dies. With the GeForce RTX 3090 class products, Micron's GDDR6X memory achieves a bandwidth of up to 1 TB/s, which is used to power next-generation gaming experiences at high-fidelity resolutions such as 8K.
Micron GDDR6X graphics memory doubles input/output (I/O) performance while minimizing the cost of memory. Working with AI-innovation leader NVIDIA, Micron delivers higher bandwidth by enabling multi-level signaling in the form of four-level pulse amplitude modulation (PAM4) technology in this memory device via Micron
The new GDDR6X SGRAM:
- Doubles the data rate of SGRAM at a lower power per transaction while enabling the breaking of the 1 Terabyte per second (TB/s) system memory bandwidth boundary for graphics card applications;
- Is the first discrete graphics memory device that employs PAM4-encoded signaling between the processor and the DRAM, using four voltage levels to encode and transfer two bits of data per interface clock.
- Can be designed and operated stably at high speeds and built in mass production.
As mentioned, GDDR6X features the brand new PAM4 multilevel signaling technique, which helps transfer data much faster and doubles the I/O rate, pushing the capability of each memory die from 64 GB/s to 84 GB/s. The Micron GDDR6X memory dies are also the only graphics DRAM that can be mass-produced while featuring PAM4 signaling.
What is interesting is that Micron quotes that its GDDR6X memory can hit speeds of up to 22.4 Gbps, whereas we have only got to see 21 Gbps in action on the GeForce RTX 3090 Ti. It is likely that AIBs could utilize higher binned dies as they become available. Micron does have faster chips, but those aren't coming to NVIDIA's RTX 40 series graphics cards for now.
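The per-die and per-card numbers follow from interface width times per-pin data rate. The sketch below assumes a 32-bit interface per die (two 16-bit channels, as listed in the feature table in this article), a 16 Gbps GDDR6 part as the baseline, and a 384-bit card-level bus.

```python
def bandwidth_gb_per_s(width_bits, gbps_per_pin):
    """Peak bandwidth in GB/s for a given interface width and per-pin rate."""
    return width_bits * gbps_per_pin / 8


DIE_WIDTH_BITS = 32      # 2 channels x 16 bits per die
CARD_BUS_BITS = 384      # flagship-class memory bus

gddr6_die = bandwidth_gb_per_s(DIE_WIDTH_BITS, 16)    # GDDR6 at 16 Gbps
gddr6x_die = bandwidth_gb_per_s(DIE_WIDTH_BITS, 21)   # GDDR6X at 21 Gbps
gddr6x_card = bandwidth_gb_per_s(CARD_BUS_BITS, 21)

if __name__ == "__main__":
    print(f"GDDR6 die:   {gddr6_die:.0f} GB/s")
    print(f"GDDR6X die:  {gddr6x_die:.0f} GB/s")
    print(f"GDDR6X card: {gddr6x_card:.0f} GB/s")
```

At 21 Gbps on a 384-bit bus the total comes to 1008 GB/s, which is where the 1 TB/s figure comes from.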
It's not just about faster speeds: Micron's GDDR6X provides higher bandwidth while drawing 15% less power per transferred bit compared to the previous generation GDDR6 memory. PAM4 signaling is a big upgrade from the two-level NRZ signaling on the GDDR6 memory.
Instead of transmitting two binary bits of data each clock cycle (one bit on the rising edge and one bit on the falling edge of the clock), PAM4 sends two bits on each clock edge, encoded using four different voltage levels. The voltage levels are divided into 250 mV steps with each level representing two bits of data - 00, 01, 10, or 11 sent on each clock edge (still DDR technology).
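A small sketch makes the encoding concrete. It maps each pair of bits to one of four levels spaced 250 mV apart, expressed as an offset from the lowest level. The plain binary ordering of the levels is our assumption for illustration, and actual devices may order or code the levels differently.

```python
STEP_MV = 250

# Assumed mapping: plain binary order, offsets relative to the lowest level.
LEVELS = {"00": 0, "01": 1, "10": 2, "11": 3}


def pam4_encode(bits):
    """Turn a bit string into one voltage offset (mV) per clock edge."""
    if len(bits) % 2:
        raise ValueError("PAM4 carries two bits per symbol")
    return [LEVELS[bits[i:i + 2]] * STEP_MV for i in range(0, len(bits), 2)]


def nrz_symbol_count(bits):
    """NRZ carries one bit per clock edge, so one symbol per bit."""
    return len(bits)


if __name__ == "__main__":
    data = "00011011"
    print(pam4_encode(data))
    print("NRZ edges:", nrz_symbol_count(data), "PAM4 edges:", len(pam4_encode(data)))
```

For the same eight bits, PAM4 needs four clock edges where NRZ needs eight, which is the doubling described above.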
Micron GDDR6X Memory
| Feature | GDDR5 | GDDR5X | GDDR6 | GDDR6X |
|---|---|---|---|---|
| Density | From 512Mb to 8Gb | 8Gb | 8Gb, 16Gb | 8Gb, 16Gb |
| VDD and VDDQ | Either 1.5V or 1.35V | 1.35V | Either 1.35V or 1.25V | Either 1.35V or 1.25V |
| VPP | N/A | 1.8V | 1.8V | 1.8V |
| Data rates | Up to 8 Gb/s | Up to 12Gb/s | Up to 16 Gb/s | 19 Gb/s, 21 Gb/s, >21 Gb/s |
| Channel count | 1 | 1 | 2 | 2 |
| Access granularity | 32 bytes | 64 bytes 2x 32 bytes in pseudo 32B mode | 2 ch x 32 bytes | 2 ch x 32 bytes |
| Burst length | 8 | 16 / 8 | 16 | 8 in PAM4 mode 16 in RDQS mode |
| Signaling | POD15/POD135 | POD135 | POD135/POD125 | PAM4 POD135/POD125 |
| Package | BGA-170 14mm x 12mm 0.8mm ball pitch | BGA-190 14mm x 12mm 0.65mm ball pitch | BGA-180 14mm x 12mm 0.75mm ball pitch | BGA-180 14mm x 12mm 0.75mm ball pitch |
| I/O width | x32/x16 | x32/x16 | 2 ch x16/x8 | 2 ch x16/x8 |
| Signal count | 61 - 40 DQ, DBI, EDC - 15 CA - 6 CK, WCK | 61 - 40 DQ, DBI, EDC - 15 CA - 6 CK, WCK | 70 or 74 - 40 DQ, DBI, EDC - 24 CA - 6 or 10 CK, WCK | 70 or 74 - 40 DQ, DBI, EDC - 24 CA - 6 or 10 CK, WCK |
| PLL, DCC | PLL | PLL | PLL, DCC | DCC |
| CRC | CRC-8 | CRC-8 | 2x CRC-8 | 2x CRC-8 |
| VREFD | External or internal per 2 bytes | Internal per byte | Internal per pin | Internal per pin 3 sub-receivers per pin |
| Equalization | N/A | RX/TX | RX/TX | RX/TX |
| VREFC | External | External or Internal | External or Internal | External or Internal |
| Self refresh (SRF) | Yes Temp. Controlled SRF | Yes Temp. Controlled SRF Hibernate SRF | Yes Temp. Controlled SRF Hibernate SRF VDDQ-off | Yes Temp. Controlled SRF Hibernate SRF VDDQ-off |
| Scan | SEN | IEEE 1149.1 (JTAG) | IEEE 1149.1 (JTAG) | IEEE 1149.1 (JTAG) |
With each new generation of graphics cards, NVIDIA delivers a new range of display technologies. This generation is no different, and we see some significant updates to the display engine and the graphics interconnect. With the adoption of faster GDDR6X memory, which provides higher bandwidth, faster compression, and more cache, gaming applications can now run at higher resolutions, supporting more details on the display.
The Ada Display Engine supports two new display technologies, HDMI 2.1 and DisplayPort 1.4a with DSC 1.2a. HDMI 2.1 allows up to 48 Gbps of total bandwidth and up to 4K 240Hz HDR and 8K 60Hz HDR.
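As a back-of-the-envelope check, the raw pixel data rate for those two HDMI modes can be estimated as follows. The sketch assumes 10 bits per colour channel (30 bits per pixel) and ignores blanking intervals and link-encoding overhead, so real requirements are somewhat higher.

```python
HDMI_2_1_GBPS = 48


def raw_video_gbps(width, height, refresh_hz, bits_per_pixel=30):
    """Uncompressed pixel data rate in Gbps, ignoring blanking and overhead."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


uhd_4k_240 = raw_video_gbps(3840, 2160, 240)
uhd_8k_60 = raw_video_gbps(7680, 4320, 60)

if __name__ == "__main__":
    for label, rate in (("4K 240Hz", uhd_4k_240), ("8K 60Hz", uhd_8k_60)):
        fits = "fits" if rate <= HDMI_2_1_GBPS else "exceeds"
        print(f"{label}: {rate:.1f} Gbps raw, {fits} {HDMI_2_1_GBPS} Gbps")
```

Both modes have the same pixel rate and, under these assumptions, exceed the raw link bandwidth, which suggests they rely on compression such as DSC or on chroma subsampling.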
DisplayPort 1.4a allows for up to 8K resolutions with 60Hz refresh rates and includes VESA's Display Stream Compression 1.2a technology with visually lossless compression. You can run up to two 8K displays at 60 Hz using two cables, one for each display. In addition to that, Ada also supports HDR processing natively with tone mapping added to the HDR pipeline.
Ada GPUs take streaming and video content to the next level, incorporating support for AV1 video encoding in the Ada eighth-generation dedicated hardware encoder (known as NVENC). Prior generation Ampere GPUs supported AV1 decoding but not encoding. Ada’s AV1 encoder is 40% more efficient than the H.264 encoder used in GeForce RTX 30 Series GPUs. AV1 will enable users who are streaming at 1080p today to increase their stream resolution to 1440p while running at the same bitrate and quality, or for users with 1080p displays, streams will look similar to 1440p, providing better quality.
Ada GPUs are also equipped with dual NVENC encoders. This enables video encoding at 8K/60 for professional video editing or four 4K/60. (Game streaming services can also take advantage of this to enable more simultaneous sessions, for instance.) Blackmagic Design’s DaVinci Resolve, the popular Voukoder plugin for Adobe Premiere Pro, and Jianying — the top video editing app in China — are all enabling AV1 support, as well as a dual encoder through encode presets. Dual encoder and AV1 availability for these apps will be available in October. NVIDIA is also working with the popular video-effects app Notch to enable AV1, as well as Topaz to enable support for AV1 and the dual encoders.
In addition to NVENC, Ada GPUs also include the fifth-generation hardware decoder that was first launched with Ampere (known as NVDEC). NVDEC supports hardware-accelerated video decoding of MPEG-2, VC-1, H.264 (AVCHD), H.265 (HEVC), VP8, VP9, and the AV1 video formats. 8K/60 decoding is also fully supported. In the future, NVIDIA is also working to enable high-quality video production using AI.
NVIDIA RTX IO - Blazing Fast Read Speeds With GPU Utilization
As storage sizes have grown, so has storage performance. Gamers are increasingly turning to SSDs to reduce game load times: while hard drives are limited to 50-100 MB/sec throughput, the latest M.2 PCIe Gen4 SSDs deliver up to 7 GB/sec. With the traditional storage model, game data is read from the hard disk, then passed from the system memory and CPU before being passed to the GPU.
Historically games have read files from the hard disk, using the CPU to decompress the game image. Developers have used lossless compression to reduce install sizes and improve I/O performance. However, as storage performance has increased, traditional file systems and storage APIs have become a bottleneck. For example, decompressing game data from a 100 MB/sec hard drive takes only a few CPU cores, but decompressing data from a 7 GB/sec PCIe Gen4 SSD can consume more than twenty AMD Ryzen Threadripper 3960X CPU cores!
Using the traditional storage model, game decompression can consume all 24 cores on a Threadripper CPU. Modern game engines have exceeded the capability of traditional storage APIs. A new generation of I/O architecture is needed. Data transfer rates are the gray bars, CPU cores required are the black/blue blocks.
NVIDIA RTX IO is a suite of technologies that enable rapid GPU-based loading and decompression of game assets, accelerating I/O performance by up to 100x compared to hard drives and traditional storage APIs. When used with Microsoft’s new DirectStorage for Windows API, RTX IO offloads dozens of CPU cores’ worth of work to your RTX GPU, improving frame rates, enabling near-instantaneous game loading, and opening the door to a new era of large, incredibly detailed open-world games.
Object pop-in and stutter can be reduced, and high-quality textures can be streamed at incredible rates, so even if you’re speeding through a world, everything runs and looks great. In addition, with lossless compression, game download and install sizes can be reduced, allowing gamers to store more games on their SSD while also improving their performance.
NVIDIA RTX IO plugs into Microsoft’s upcoming DirectStorage API, which is a next-generation storage architecture designed specifically for state-of-the-art NVMe SSD-equipped gaming PCs and the complex workloads that modern games require. Together, streamlined and parallelized APIs specifically tailored for games allow dramatically reduced IO overhead and maximize performance/bandwidth from NVMe SSDs to your RTX IO-enabled GPU.
Specifically, NVIDIA RTX IO brings GPU-based lossless decompression, allowing reads through DirectStorage to remain compressed and delivered to the GPU for decompression. This removes the load from the CPU, moving the data from storage to the GPU in a more efficient, compressed form, and improving I/O performance by a factor of two.
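To put the storage figures above into perspective, here is a minimal sketch that estimates how long it takes to pull a set of game assets off a drive. The 100 MB/sec, 7 GB/sec, and factor-of-two figures come from the text above; the 50 GB asset size and the function name are hypothetical, and the model only covers the drive transfer itself, ignoring decompression time and API overhead.

```python
def load_time_seconds(asset_gb: float, drive_gb_per_s: float,
                      compression_ratio: float = 1.0) -> float:
    """Seconds needed to deliver `asset_gb` of uncompressed data.

    compression_ratio is uncompressed size / compressed size, so a value
    of 2.0 means only half as many bytes have to leave the drive.
    """
    if drive_gb_per_s <= 0 or compression_ratio <= 0:
        raise ValueError("throughput and compression ratio must be positive")
    effective_gb_per_s = drive_gb_per_s * compression_ratio
    return asset_gb / effective_gb_per_s


if __name__ == "__main__":
    asset_gb = 50.0  # hypothetical amount of game data, for illustration only
    print(f"HDD @ 100 MB/s:        {load_time_seconds(asset_gb, 0.1):.1f} s")
    print(f"Gen4 SSD @ 7 GB/s:     {load_time_seconds(asset_gb, 7.0):.1f} s")
    print(f"Gen4 SSD, 2x compress: {load_time_seconds(asset_gb, 7.0, 2.0):.1f} s")
```

The point of the sketch is that once the drive is this fast, the remaining cost is decompression, which is the part RTX IO moves from the CPU to the GPU.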
GeForce RTX GPUs will deliver decompression performance beyond the limits of even Gen4 SSDs, offloading potentially dozens of CPU cores’ worth of work to ensure maximum overall system performance for next-generation games. Lossless decompression is implemented with high-performance compute kernels that are scheduled asynchronously. This functionality leverages the DMA and copy engines of Turing and Ampere, as well as the advanced instruction set and architecture of these GPUs' SMs.
The advantage of this is that the enormous compute power of the GPU can be leveraged for burst or bulk loading (at level load, for example) when GPU resources can be leveraged as high-performance I/O processors, delivering decompression performance well beyond the limits of Gen4 NVMe. During streaming scenarios, bandwidths are a tiny fraction of the GPU capability, further leveraging the advanced asynchronous compute capabilities of Turing and Ampere. Microsoft is targeting a developer preview of DirectStorage for Windows for game developers next year, and NVIDIA Turing & Ampere gamers will be able to take advantage of RTX IO-enhanced games as soon as they become available.
The NVIDIA GeForce RTX 4090 uses 128 of the 144 SMs on the AD102 GPU for a total of 16,384 CUDA cores. The full GPU comes packed with 96 MB of L2 cache, but since the RTX 4090 is a cut-down design, it features lower counts of 72 MB of L2 cache and 176 ROPs, as listed in the spec table below. Built on the TSMC 4N process, the card is rated at a boost clock of 2.52 GHz, and NVIDIA is claiming speeds of over 3 GHz with overclocking.
As for memory specs, the GeForce RTX 4090 features 24 GB of GDDR6X memory clocked at 21 Gbps across a 384-bit bus interface. This provides up to 1 TB/s of bandwidth, the same as the existing RTX 3090 Ti graphics card. As far as power consumption is concerned, the TBP is rated at 450W. The card is powered by a single 16-pin connector which delivers up to 600W of power. Custom models will offer higher TBP targets.
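The bandwidth figure follows directly from the memory speed and the bus width. A minimal sketch of that arithmetic, using the 21 Gbps and 384-bit numbers quoted above (the function name is just for illustration):

```python
def memory_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s.

    Each pin moves `data_rate_gbps` gigabits per second, the bus has
    `bus_width_bits` pins, and dividing by 8 converts bits to bytes.
    """
    return data_rate_gbps * bus_width_bits / 8


if __name__ == "__main__":
    bandwidth = memory_bandwidth_gb_s(21.0, 384)
    print(f"RTX 4090: {bandwidth:.0f} GB/s")  # 1008 GB/s, i.e. roughly 1 TB/s
```

The same formula reproduces the other entries in the bandwidth row of the spec table, for example 17 Gbps on a 128-bit bus works out to 272 GB/s.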
NVIDIA GeForce RTX 4090 Graphics Cards Performance
As for the performance of these monster GPUs, NVIDIA shared the computational and gaming performance figures and it looks like the GeForce RTX 4090 will be the first gaming card to hit the 100 TFLOPs compute horsepower limit.
Just for comparison's sake:
- NVIDIA GeForce RTX 4090: 90 TFLOPs (FP32) (Assuming 2.8 GHz clock)
- NVIDIA GeForce RTX 3090 Ti: 40 TFLOPs (FP32) (1.86 GHz Boost clock)
- NVIDIA GeForce RTX 3090: 36 TFLOPs (FP32) (1.69 GHz Boost clock)
Based on a theoretical clock speed of 2.8 GHz, you get roughly 90 TFLOPs of compute performance, and the rumors are suggesting even higher boost clocks that would cross the 100 TFLOPs mark, which takes just over 3 GHz. Now, these definitely sound like peak clocks, similar to AMD's peak frequencies which are higher than the average 'Game' clock. A 100+ TFLOPs compute performance means more than double the horsepower versus the 3090 Ti flagship. One should remember that compute performance doesn't necessarily indicate the overall gaming performance. Even so, it will be a huge upgrade for gaming PCs, and at 100+ TFLOPs it would amount to roughly an 8.5x increase over the current fastest console, the Xbox Series X.
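For readers who want to check these compute numbers themselves, peak FP32 throughput is simply the CUDA core count times two operations per clock (one fused multiply-add) times the clock speed. The sketch below assumes that standard formula; the RTX 4090 core count and clocks are from this article, while the 10,752 core count for the RTX 3090 Ti is taken from NVIDIA's public specifications and does not appear above.

```python
def fp32_tflops(cuda_cores: int, clock_ghz: float) -> float:
    """Peak FP32 throughput in TFLOPs (2 FLOPs per core per clock)."""
    return cuda_cores * 2 * clock_ghz / 1000.0


def clock_for_tflops(target_tflops: float, cuda_cores: int) -> float:
    """Clock speed in GHz needed to reach a target peak FP32 throughput."""
    return target_tflops * 1000.0 / (cuda_cores * 2)


if __name__ == "__main__":
    print(f"RTX 4090 @ 2.52 GHz:    {fp32_tflops(16384, 2.52):.1f} TFLOPs")
    print(f"RTX 4090 @ 2.80 GHz:    {fp32_tflops(16384, 2.80):.1f} TFLOPs")
    print(f"RTX 3090 Ti @ 1.86 GHz: {fp32_tflops(10752, 1.86):.1f} TFLOPs")
    print(f"Clock for 100 TFLOPs:   {clock_for_tflops(100, 16384):.2f} GHz")
```

At the rated 2.52 GHz boost clock this lands on the 83 TFLOPs listed in the spec table, and the 100 TFLOPs mark needs a clock of a little over 3 GHz.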
FP32 Compute Horsepower Comparisons (Higher is Better)
NVIDIA has demonstrated roughly a 2x compute performance uplift and a 2x gain in gaming performance for each graphics card versus its predecessor, and this is without even factoring in the RT and Tensor core performance, which is expected to get major lifts too. A 2-4x gain over the RTX 3090 & RTX 3090 Ti would be very disruptive.
Gamers should expect 4K gaming to be buttery smooth on these graphics cards and with DLSS, we might even see playable 60 FPS at 8K resolution which is something that NVIDIA has been trying to achieve with its RTX 3090 series BFGPUs for a while now.
NVIDIA GeForce RTX 4090 Graphics Cards Price & Availability
Now coming to the prices, the NVIDIA GeForce RTX 3090 Ti & RTX 3090 graphics cards are undoubtedly the most expensive single-chip GPUs to date. The NVIDIA GeForce RTX 4090 comes at a price of $1599 US for the Founders Edition variant, with both it and the custom models available starting today.
NVIDIA GeForce RTX 40 Series Official Specs:
| Graphics Card Name | NVIDIA GeForce RTX 4090 | NVIDIA GeForce RTX 4090 D | NVIDIA GeForce RTX 4080 | NVIDIA GeForce RTX 4070 Ti | NVIDIA GeForce RTX 4070 | NVIDIA GeForce RTX 4060 Ti | NVIDIA GeForce RTX 4060 |
|---|---|---|---|---|---|---|---|
| GPU Name | Ada Lovelace AD102-300 | Ada Lovelace AD102-250 | Ada Lovelace AD103-300 | Ada Lovelace AD104-400 | Ada Lovelace AD104-250 | Ada Lovelace AD106-350 | Ada Lovelace AD107-400 |
| Process Node | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N |
| Die Size | 608mm2 | 608mm2 | 378.6mm2 | 294.5mm2 | 294.5mm2 | 190.0mm2 | 146.0mm2 |
| Transistors | 76 Billion | 76 Billion | 45.9 Billion | 35.8 Billion | 35.8 Billion | 22.9 Billion | TBD |
| CUDA Cores | 16384 | 14592 | 9728 | 7680 | 5888 | 4352 | 3072 |
| TMUs / ROPs | 512 / 176 | TBD | 320 / 112 | 240 / 80 | 184 / 64 | 136 / 48 | TBD |
| Tensor / RT Cores | 512 / 128 | 456 / 128 | 304 / 76 | 240 / 60 | 184 / 46 | 136 / 34 | TBD |
| L2 Cache | 72 MB | 72 MB | 64 MB | 48 MB | 36 MB | 32 MB | 24 MB |
| Base Clock | 2230 MHz | 2280 MHz | 2210 MHz | 2310 MHz | 1920 MHz | 2310 MHz | 1830 MHz |
| Boost Clock | 2520 MHz | 2520 MHz | 2510 MHz | 2610 MHz | 2475 MHz | 2535 MHz | 2460 MHz |
| FP32 Compute | 83 TFLOPs | TBD | 49 TFLOPs | 40 TFLOPs | 29 TFLOPs | 22 TFLOPs | 15 TFLOPs |
| RT TFLOPs | 191 TFLOPs | TBD | 113 TFLOPs | 82 TFLOPs | 67 TFLOPs | 51 TFLOPs | 35 TFLOPs |
| Tensor-TOPs | 1321 TOPs | TBD | 780 TOPs | 641 TOPs | 466 TOPs | 353 TOPs | 242 TOPs |
| Memory Capacity | 24 GB GDDR6X | 24 GB GDDR6X | 16 GB GDDR6X | 12 GB GDDR6X | 12 GB GDDR6X | 8-16 GB GDDR6 | 8 GB GDDR6 |
| Memory Bus | 384-bit | 384-bit | 256-bit | 192-bit | 192-bit | 128-bit | 128-bit |
| Memory Speed | 21.0 Gbps | 21.0 Gbps | 23.0 Gbps | 21.0 Gbps | 21.0 Gbps | 18.0 Gbps | 17.0 Gbps |
| Bandwidth | 1008 GB/s | 1008 GB/s | 736 GB/s | 504 GB/s | 504 GB/s | 288 GB/s (554 GB/s Effective) | 272 GB/s (453 GB/s Effective) |
| TBP | 450W | 425W | 320W | 285W | 200W | 160-165W | 115W |
| Price (MSRP / FE) | $1599 US / 1949 EU | 12,999 RMB (China-Only) | $1199 US / 1469 EU | $799 US | $599 US | $399-$499 US | $299 US |
| Price (Current) | $1599 US / 1859 EU | 12,999 RMB (China-Only) | $1199 US / 1399 EU | $799 US | $599 US | $399-$499 US | $299 US |
| Launch (Availability) | 12th October 2022 | 28th December 2023 | 16th November 2022 | 5th January 2023 | 13th April 2023 | 24th May / 18th July 2023 | 29th June 2023 |
The MSI GeForce RTX 4090 SUPRIM X graphics card comes inside a standard cardboard box. The front of the package has a large "GeForce RTX" brand logo along with the "MSI" logo in the top left corner and the "SUPRIM X" series branding in the lower-left corner. A large picture of the graphics card itself is depicted on the front which gives a nice preview of the SUPRIM X design.
The packaging puts a large emphasis on the RTX side of things, as the first features listed by AIBs are the NVIDIA Ada architecture, Ray Tracing & DLSS support. NVIDIA has bet the future of its gaming GPUs on Ray Tracing ever since the RTX 20 series first introduced hardware support for the feature.
The back of the box is very typical, highlighting the main features and specifications of the cards. The three key aspects of MSI's top-tier custom cards are its blazing performance which is achieved by fully custom design, the new Tri-Frozr 3S cooling system, and a new Torx Fan 5.0 fan and Vapor Chamber cooler which will offer better cooling performance.
There's also a focus towards GeForce.com on each AIB card through which users can download the latest drivers and GeForce Experience application which are a must for gamers to access all feature sets of the new cards.
The sides of the box once again greet us with the large GeForce RTX branding. There's also the mention of 24 GB GDDR6X (RTX 4090) memory available on the card. Opening the box, you are greeted with a nice SUPRIM logo.
Inside the box, the graphics card and the accessory package are held firmly by foam packaging. The graphics card comes with a few accessories and manuals which might not be of much use for hardcore enthusiasts but can be useful for the mainstream gaming audience. The only two useful accessories are the GPU mounting anti-sag bar and the 16-pin to 4x 8-pin power adapter. There's also a nice mousepad that MSI ships with its SUPRIM series lineup.
The card is nicely wrapped within an anti-static cover which is useful to prevent any unwanted static discharges on various surfaces that might harm the graphics card. The most interesting accessory that I found in the package was a graphics card support bracket. This bracket connects the graphics card to the casing, offering better durability and preventing any sort of bending that may occur due to the heavy weight of the SUPRIM X graphics card.
After the package is taken care of, I can finally start talking about the card itself. This thing is a beast and I can't wait to test it out to find what kind of performance improvement I get over current-gen cards.
MSI’s Tri Frozr heatsinks are some of the biggest cooling solutions that I have ever tested. I first tested the Gaming X Trio when MSI released the 1080 Ti variant back in 2017 and that was a very aggressive design in its own right. Since then, I have tested the RTX 2080 Ti, RTX 3090, and RTX 3090 Ti in their Tri-Frozr iterations. With the RTX 40 series cards, MSI has further refined the Tri Frozr design. The card measures 336 x 142 x 78 mm and weighs in at 2413 grams. It features a 3.5-slot height, which is expected of today's high-end cards.
You would have to keep in mind the height when going for a triple or quad-slot card solution as your case or motherboard PCIe slot combination may not allow such a setup. The cooling shroud extends all the way to the back of the PCB and it requires a casing with good interior space for proper installation.
The back of the card features a solid backplate that looks stunning. The backplate offers a lot more functionality than just looks which I will get back to in a bit.
In terms of design, we are looking at an updated version of the Tri Frozr heatsink known as Tri Frozr 3S which is now in its eighth variation while for the SUPRIM X series, this is the 2nd iteration. The first variation started off with the GTX 780 Ti Lightning, the second was the 980 Ti Lightning, then came the 1080 Ti Gaming X Trio, the 1080 Ti Lightning, then the RTX 20 & RTX SUPER Gaming X Trio graphics cards while the seventh generation was introduced on the RTX 30 series. Now we are in the eighth generation.
The new heatsink looks like a beefed-up version of the SUPRIM X heatsink that we saw on the 3090 Ti, with the main changes being a neater shroud design on the front that adopts a black and silver color palette while featuring RGB-emitting V-shaped acrylic cutouts. The sides also come with a large RGB accent bar which lights up when the card is powered on.
Coming to the fans, the card features the latest fan design based on the Torx 5.0 system. All three fans feature a ring-based design to allow for higher airflow to be channeled within the main heatsink. All fans deploy a double ball bearing design and can last a long time while operating silently. The blades are grouped in sets of three, and each fan has three such sets for a total of 9 fan blades. Each blade is tilted at a 22-degree angle to the main high-pressure airflow.
MSI also features its Zero Frozr technology on the Tri Frozr heatsink. This feature won’t spin the fans on the card unless the GPU reaches a certain temperature threshold, which in the case of the Tri Frozr heatsink is set to 60C. If the card is operating under 60C, the fans won’t spin, which means no extra noise is generated.
If you look closely, you can also see that the card features beveled edges that are polished several times with a diamond-tipped cutter to achieve a mirror finish, giving a slight gold effect which looks great.
Coming back to the full-coverage, full-metal backplate that the card uses, the whole plate is made of solid metal with rounded edges that add to the durability of this card. The brushed matte-black finish on the backplate gives a unique aesthetic. The graphics card also comes with a compact PCB design which means that the shroud, heatsink, and backplate all extend beyond the PCB. The third fan blows air through the heatsink and out from the cutouts that are situated at the very end of the backplate.
There are cutouts at the screw placements to easily reach the mounting points on the graphics card. We can also see the new SUPRIM logo which drops the Dragon design and goes for a Diamond shape on the back which looks stunning. MSI is also using heat pads beneath the backplate which offer more cooling to the electrical circuitry on the PCB. The most interesting thing to spot on the back aside from the backplate is the large metal retention bracket which adds more mounting pressure to effectively disperse heat from the GPU to the heatsink.
With the outside of the card done, I will now start taking a glance at what's beneath the hood of these monster graphics cards. The first thing to catch my eye is the humongous fin stack that's part of the beefy heatsink that the cards utilize.
The large fin stack runs all the way from the front and to the back of the PCB and is so thick that you can barely see through it. It also comes with the wave-curved 3.0 fin stack design which I want to shed some light on as it is a turn away from traditional fin design and one that actually offers better cooling on high-end graphics cards such as the RTX 3090 Ti. The card also uses antegrade fins on the back that direct and optimize air pass through on the back, allowing more warm air to pass out of the card like a nozzle.
The heatsink has been designed to be denser by using a wave-curved and filled-fin design. It allows more air to pass through the fins smoothly, without causing any turbulence that would result in unwanted noise. Airflow Control Technology guides the airflow directly onto the heat pipes, while simultaneously creating more surface area for the air to absorb more heat before leaving the heatsink. The heat pipes have also been arranged in a way that allows MSI to stack even more fin room.
Talking about the heatsink, the massive block is comprised of 11 square-shaped copper heat pipes with a more concentrated design to transfer heat from the copper base to the heatsink more effectively. The base itself is a solid nickel-plated base plate, transferring heat to the heat pipes in a very effective manner. To top it all off, MSI adds extra protection to its impressive PCB by including a rugged anti-bending plate. This also acts as a memory and MOSFET cooling plate while the PWM heatsink with micro fins keeps the VRM cool under stressful conditions.
I/O on the graphics card sticks with the reference scheme which includes three DisplayPort 1.4a ports & a single HDMI 2.1 port.
There's also a dual-BIOS switch on the card which comes pre-configured with Silent & Gaming modes. The BIOS doesn't affect the clock profiles but rather affects the maximum power limit, enabling higher fan speeds for better cooling and more stable clocks. The limits are 450W for the silent and 480W (550W power limit) for the gaming profile.
MSI GeForce RTX 4090 SUPRIM X Teardown:
MSI makes use of a 26+2 phase PWM design that is made up of high-quality components such as HCI (High-Efficiency Carbonyl Inductors), SPS (Smart Power Stages), and hardened defense fuses.
The card also uses the latest GDDR6X DRAM from Micron which operates at 21.0 Gbps alongside a 384-bit wide memory interface.
The MSI GeForce RTX 4090 SUPRIM X is a very power-hungry graphics card as showcased by its custom design. As such, the card utilizes a single 16-pin connector which can deliver up to 600 Watts of power to the graphics card. The card is rated at 480W but ends up around 550W with its full power limit.
MSI GeForce RTX 4090 SUPRIM X RGB Lighting Gallery:
MSI SUPRIM X series cards utilize their Mystic Light RGB technology to offer you a visually pleasing lighting experience on your graphics cards.
There are a total of 5 different RGB effects that you can choose from and the cards have 3 RGB accent points on the front, one on the back, and one lightbar surrounding the side of the card which looks really good. You can fully customize the RGB lights to your preference using the MSI Mystic Light application from MSI's web page.
Following is what the graphics card looks like when lit up.
The MSI GeForce RTX 4090 SUPRIM Liquid X graphics card comes inside a large cardboard box. The front of the package has a large "GeForce RTX" brand logo along with the "MSI" logo in the top left corner and the "SUPRIM Liquid X" series branding in the lower-left corner.
A large picture of the graphics card itself is depicted on the front which gives a nice preview of the SUPRIM Liquid X design. This is MSI's first AIO variant featured within the SUPRIM lineup. The last generation did have AIO cards but those were branded under the Sea Hawk lineup.
The packaging puts a large emphasis on the RTX side of things, as the first features listed by AIBs are the NVIDIA Ada architecture, Ray Tracing & DLSS support. NVIDIA has bet the future of its gaming GPUs on Ray Tracing ever since the RTX 20 series first introduced hardware support for the feature.
The back of the box is very typical, highlighting the main features and specifications of the cards. The three key aspects of MSI's top-tier custom cards are its blazing performance which is achieved by fully custom design, the new Micro-fin copper base, Torx Fan 5.0, & an aluminum heat radiator which will offer better cooling performance.
There's also a focus towards GeForce.com on each AIB card through which users can download the latest drivers and GeForce Experience application which are a must for gamers to access all feature sets of the new cards.
The sides of the box once again greet us with the large GeForce RTX branding. There's also the mention of 24 GB GDDR6X (RTX 4090) memory available on the card. Opening the box, you are greeted with a nice SUPRIM logo.
Inside the box, the graphics card and the accessory package are held firmly by foam packaging. The graphics card comes with a few accessories and manuals which might not be of much use for hardcore enthusiasts but can be useful for the mainstream gaming audience. The only accessory bundled with the graphics card is a single 16-pin to 4x 8-pin power adapter. There's also a nice mousepad that MSI ships with its SUPRIM series lineup.
The card is nicely wrapped within an anti-static cover which is useful to prevent any unwanted static discharges on various surfaces that might harm the graphics card. The most interesting accessory that I found in the package was a graphics card support bracket. This bracket connects the graphics card to the casing, offering better durability and preventing any sort of bending that may occur due to the heavy weight of the SUPRIM series graphics cards.
After the package is taken care of, I can finally start talking about the card itself. Compared to the SUPRIM X Air, the SUPRIM Liquid X has a more elegant design with a brushed aluminum shroud. Like the SUPRIM X, the SUPRIM Liquid X was also sent with custom nameplates designed for us which look absolutely great.
Unlike the massive air-cooled cards, the RTX 4090 SUPRIM Liquid X comes in a very attractive 2-slot design. The card is a semi-hybrid design with a pump that leads to a 240mm AIO radiator and a single Torx 5.0 fan with axial tech that further helps to dissipate heat. The RTX 4090 SUPRIM Liquid X measures 280 x 140 x 43 mm while the radiator (with fans installed) measures 274 x 121 x 55 mm. The whole thing weighs in at 2353g.
You would have to keep in mind the height when going for a dual-card solution as your case or motherboard PCIe slot combination may not allow such a setup. Since this is a liquid-cooled GPU, the card requires a decent amount of room within the chassis to fit the radiator. I used the Cooler Master C700M, which is a full tower casing, to check if the tubes have enough length to reach the top and front of the chassis. I can confidently say that you won't run into issues with the tube length. For reference, they measure 470mm.
The back of the card features a solid backplate that looks stunning. The backplate offers a lot more functionality than just looks which I will get back to in a bit.
The MSI SUPRIM Liquid X design takes the design philosophy of the SUPRIM lineup and further refines it into something elegant. Sure the SUPRIM X is a beast on its own and looks great but the Liquid X is more visually appealing with its more simplistic look.
MSI is using high-quality materials to design the shroud which makes use of an all-aluminum metal cover, angle-beveled edges with a light-gold color polish, and an octagon cut-out for the fan with an illuminated chevron on the back. It is honestly a well-made card.
Coming to the fans, the card features the latest fan design based on the Torx 5.0 system. The single fan features a ring-based design to allow for higher airflow to be channeled within the main heatsink. The fan deploys a double ball bearing design and can last a long time while operating silently. The blades are grouped in sets of three, and the fan has three such sets for a total of 9 fan blades. Each blade is tilted at a 22-degree angle to the main high-pressure airflow.
MSI also features its Zero Frozr technology on the card. This feature won’t spin the fan unless the GPU reaches a certain temperature threshold, which is set to 60C. If the card is operating under 60C, the fan won’t spin, which means no extra noise is generated.
If you look closely, you can also see that the card features beveled edges that are polished several times with a diamond-tipped cutter to achieve a mirror finish, giving a slight gold effect which looks great.
Coming back to the full-coverage, full-metal backplate that the card uses, the whole plate is made of solid metal with rounded edges that add to the durability of this card. The brushed matte-black finish on the backplate gives a unique aesthetic. The graphics card also comes with a compact PCB design which means that the shroud, heatsink, and backplate all extend beyond the PCB. The fan blows air through the heatsink and out from the cutouts that are situated at the very end of the backplate.
There are cutouts in screw placements to easily reach the points on the graphics card. We can also see the new SUPRIM logo which drops the Dragon design and goes for a Diamond shape on the back which looks stunning. MSI says that this is their new SUPRIM logo and is inspired by diamond crystals and geometry to represent the high-quality materials and construction of SUPRIM graphics cards.
MSI is also using heat pads beneath the backplate which offer more cooling to the electrical circuitry on the PCB. The most interesting thing to spot on the back aside from the backplate is the large retention metal bracket which adds more mounting pressure to effectively disperse heat from the GPU to the heatsink.
With the outside of the card done, I will now start taking a glance at what's beneath the hood of the SUPRIM Liquid X. Unlike the SUPRIM X, the Liquid X is a dual-slot design and as such, it doesn't use a huge aluminum fin block. It is also a liquid-cooled variant and as such, it has a fully enclosed body with just two tubes leading into the central chamber.
Underneath the shroud is the latest pump that MSI will be using with their SUPRIM Liquid X coolers.
The pump makes use of a micro-fin copper base which has a large central GPU contact surface and four exterior arms with a thicker surface to make direct contact with the 12 memory modules that are featured on the RTX 4090. Each arm has thermal pads featured on it to make sure that the heat is dissipated effectively.
The whole build quality of the copper block looks great and should be enough to handle the 530W power limit that the card reaches at its maximum.
MSI adds extra protection to its impressive PCB by including a rugged anti-bending plate. This also acts as an additional memory and MOSFET cooling plate while the PWM heatsink with micro fins keeps the VRM cool under stressful conditions. The 240mm radiator comes with MSI's latest MEG Silent GALE P12 fans which I have already tested with MSI's MEG S360 Liquid cooler and they are some of the quietest fans in the market right now.
I/O on the graphics card sticks with the reference scheme which includes three DisplayPort 1.4a ports & a single HDMI 2.1 port.
There's also a dual-BIOS switch on the card which comes pre-configured with Silent & Gaming modes. The BIOS doesn't affect the clock profiles but rather affects the maximum power limit, enabling higher fan speeds for better cooling and more stable clocks. The limits are 450W for the silent and 480W (with 530W Power Limit) for the gaming profile.
MSI GeForce RTX 4090 SUPRIM Liquid X Teardown:
MSI makes use of a 26+2 phase PWM design that is made up of high-quality components such as HCI (High-Efficiency Carbonyl Inductors), SPS (Smart Power Stages), and hardened defense fuses.
The card also uses the latest GDDR6X DRAM from Micron which operates at 21.0 Gbps alongside a 384-bit wide memory interface.
The MSI GeForce RTX 4090 SUPRIM Liquid X is a very power-hungry graphics card as showcased by its custom design. As such, the card utilizes a single 16-pin connector which can deliver up to 600 Watts of power to the graphics card. The card is rated at 480W but ends up around 530W with its full power limit.
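To compare the power headroom of the two cards, the following sketch works out how much extra power each one can draw over its rated figure and what that corresponds to as a percentage. The 480W rating, the 550W (SUPRIM X) and 530W (SUPRIM Liquid X) limits, and the 600W connector figure are from this review; the function names are made up for illustration, and the connector check is deliberately conservative since it ignores any power drawn through the PCIe slot.

```python
from typing import Tuple

CONNECTOR_LIMIT_W = 600.0  # 16-pin connector rating quoted in this review


def power_headroom(rated_w: float, max_limit_w: float) -> Tuple[float, float]:
    """Return (extra watts over rated, max limit as a percentage of rated)."""
    extra_w = max_limit_w - rated_w
    limit_pct = max_limit_w / rated_w * 100.0
    return extra_w, limit_pct


def within_connector_rating(max_limit_w: float,
                            connector_w: float = CONNECTOR_LIMIT_W) -> bool:
    """True if the card's maximum limit fits inside the connector rating alone."""
    return max_limit_w <= connector_w


if __name__ == "__main__":
    for name, limit in (("SUPRIM X", 550.0), ("SUPRIM Liquid X", 530.0)):
        extra, pct = power_headroom(480.0, limit)
        ok = within_connector_rating(limit)
        print(f"{name}: +{extra:.0f} W ({pct:.1f}% of rated), within 600 W: {ok}")
```

By this measure the SUPRIM X has about 15% of headroom over its rated power and the SUPRIM Liquid X about 10%, with both staying under what the single 16-pin connector is rated for.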
MSI GeForce RTX 4090 SUPRIM Liquid X Mystic RGB Lighting Gallery:
MSI SUPRIM X series cards utilize their Mystic Light RGB technology to offer you a visually pleasing lighting experience on your graphics cards.
There are a total of 5 different RGB effects that you can choose from and the cards have 3 RGB accent points on the front, one on the back, and one lightbar surrounding the side of the card which looks really good. You can fully customize the RGB lights to your preference using the MSI Mystic Light application from MSI's web page. Following is what the graphics card looks like when lit up.
We used the following test system for comparison between the different graphics cards. The latest drivers from AMD and NVIDIA that were available at the time of testing were used on an updated version of Windows 11. All tested games were patched to the latest version for better performance optimization for NVIDIA and AMD GPUs.
NVIDIA GeForce RTX 4090 Test Setup
| CPU | Intel Core i9-12900K @ 5.0 GHz |
|---|---|
| Motherboard | AORUS Z690 Master (DDR5) |
| Video Cards | Colorful GeForce RTX 4090 Vulcan OC-V, MSI GeForce RTX 4090 SUPRIM Liquid X, MSI GeForce RTX 4090 SUPRIM X, NVIDIA GeForce RTX 4090 FE, NVIDIA GeForce RTX 3090 FE, NVIDIA GeForce RTX 3080 Ti FE, NVIDIA GeForce RTX 3080 FE, MSI Radeon RX 6950 XT Gaming X Trio, MSI GeForce RTX 3090 Ti SUPRIM X, MSI GeForce RTX 3090 SUPRIM X, MSI Radeon RX 6900 XT Gaming Z Trio, MSI GeForce RTX 3080 Ti SUPRIM X, MSI Radeon RX 6800 XT Gaming X Trio, MSI GeForce RTX 3080 SUPRIM X, MSI GeForce RTX 3070 Ti SUPRIM X, MSI GeForce RTX 2080 Ti Lightning, MSI GeForce RTX 3070 Gaming X Trio |
| Memory | G.SKILL Trident Z5 RGB Series 32GB (2 X 16GB) CL36 6000 MHz |
| Storage | Teamgroup T-Force A440 Pro 2 TB Gen 4 |
| Power Supply | ASUS ROG THOR 1200W PSU |
| OS | Windows 11 64-bit |
| Drivers | AMD Radeon Adrenalin Edition 22.9.2, NVIDIA GeForce 521.90 WHQL |
- All games were tested at 3840x2160 (4K) resolution.
- Image Quality and graphics configurations are provided with each game description.
- The "reference" cards are the stock configs except where mentioned otherwise.
The MSI GeForce RTX 4090 SUPRIM X has a maximum power limit of 550W whereas the SUPRIM Liquid X has a power limit of 530W:
For overclocking, we cranked up everything! Even the fan speed was set to 100%, as shown in the MSI Afterburner picture below:
Firestrike
Firestrike runs on the DX11 API and is still a good measure of GPU scaling performance. In this test, we ran the Extreme and Ultra versions of Firestrike, which run at 1440p and 4K respectively, and we recorded the Graphics Score only since the Physics and Combined scores are not pertinent to this review.
3DMark Firestrike Extreme Graphics
3DMark Firestrike Ultra Graphics
Time Spy
Time Spy is running the DX12 API and we used it in the same manner as Firestrike Extreme where we only recorded the Graphics Score as the Physics score is recording the CPU performance and isn't important to the testing we are doing here.
3DMark Time Spy Graphics
3DMark Time Spy Extreme Graphics
Port Royal
Port Royal is another great tool in the 3DMark suite, but this one is 100% targeting Ray Tracing performance. It loads up ray-traced shadows, reflections, and global illumination to really tax the performance of graphics cards that have either hardware-based or software-based ray-tracing support.
3DMark Port Royal Score
3DMark Pure Ray Tracing Feature Test
Crysis Remastered (DXVK RT)
Crysis is back with a vengeance to reclaim its title of the graphics crown. The remastered version of the game uses DX11 API but has Vulkan extensions on top which enable Vulkan Ray tracing. That's also something that the original game didn't offer. DXVK, along with improved textures and visual effects, leads to higher performance demand making us question once again "Can It Run Crysis?"
Crysis Remastered (4K Native RT SMAA2TX)
Doom Eternal
DOOM Eternal brings hell to earth with the Vulkan-powered idTech 7. We test this game using the Ultra Nightmare Preset and follow our in-game benchmarking to stay as consistent as possible.
DOOM Eternal
Red Dead Redemption 2
Developed by Rockstar Games, Red Dead Redemption 2 is one of the most visually stunning open-world games I've played to date, backed up by a rich story set around the protagonist, Arthur Morgan. The game is based on the RAGE engine, which features an insane amount of graphics fidelity but also requires a lot of power to run maxed out. For the purpose of this test, we set the graphics settings to Ultra with AA disabled.
Red Dead Redemption 2
Wolfenstein: Youngblood
Wolfenstein is back in The New Colossus and features the most fast-paced, gory, and brutal FPS action ever! The game once again puts us back in the Nazi-controlled world as BJ Blazkowicz. Set during an alternate future where Nazis won the World War, the game shows that it can be fun and can be brutal to the player and to the enemy too. Powering the new title is, once again, id Tech 6, which is much acclaimed after the success that DOOM has become. In a way, id Software has regained its glorious FPS roots and is slaying with every new title.
Wolfenstein
Battlefield V
Battlefield V brings back the action of the World War 2 shooter genre. Using the latest Frostbite tech, the game does a good job of looking gorgeous in all ways possible. From the open-world environments to the intense and gun-blazing action, this multiplayer and single-player FPS title is one of the best-looking Battlefields to date.
Battlefield V
Battlefield V Raytracing DLSS (Quality)
Cyberpunk 2077
Cyberpunk 2077 is an action role-playing video game developed by CD Projekt Red and published by CD Projekt. The story takes place in Night City, an open world set in the Cyberpunk universe. Players assume the first-person perspective of a customizable mercenary known as V, who can acquire skills in hacking and machinery with options for melee and ranged combat. The game uses CD Projekt Red's in-house Red Engine which is one of the most visually breathtaking and also one of the most graphics-intensive engines designed to date.
Cyberpunk 2077 (4K Native RT)
Death Stranding
Sam Porter Bridges has delivered one of PS4's most anticipated games to the PC community and opened a whole new world of possibilities. This was the first game to feature the Decima Engine on PC and unarguably did it the best. Death Stranding may not feature ray tracing effects, but it does showcase that DLSS can be used effectively even when RT isn't around. We tested this one just like we did in our launch coverage with DLSS enabled.
Death Stranding DLSS/FSR (Quality)
Forza Horizon 5
Forza Horizon 5 carries on the open-world racing tradition of the Horizon series. The latest DX12-powered entry is beautifully crafted, amazingly well executed, and a great showcase of DX12 games. We use the benchmark run while having all of the settings set to non-dynamic with an uncapped framerate to gather these results.
Forza Horizon 5
Halo Infinite (DX12 Highest)
Next up, we have the latest entry in the Halo franchise, Halo: Infinite, which uses the brand new Slipspace engine (although there are rumors it will be ditched in the future for Unreal Engine) based on the DX12 API. The game rocks some incredible environments for Master Chief to visit on the Halo ring.
Halo Infinite
Hitman III (DX12 Highest Settings)
Hitman III is the highly acclaimed sequel to the 2016 Hitman & 2018 Hitman II, which was a redesign and reimagining of the game from the ground up. With a focus on stealth gameplay through various missions, the game once again lets you play as Agent 47. The game runs on the IO Interactive Glacier 2 engine, which has been updated to deliver amazing visuals and environments on each level while making use of the DirectX 12 API.
Hitman III
Shadow of The Tomb Raider
The sequel to Rise of the Tomb Raider, Shadow of The Tomb Raider is visually enhanced with an updated Foundation Engine that delivers realistic facial animations and the most gorgeous environments ever seen in a Tomb Raider Game. The game is a technical marvel and really shows the power of its graphics engine in the latest title.
Shadow of The Tomb Raider
Shadow of The Tomb Raider Raytracing DLSS/FSR (Quality)
Metro Exodus
Metro Exodus continues Artyom's journey through Russia's nuclear wasteland and its surroundings. This time, you are set over the Metro, going through various regions and different environments. The game is one of the premier titles to feature NVIDIA’s RTX technology and does well in showcasing the ray-tracing effects in all corners.
Metro Exodus Extreme Preset
Metro Exodus Raytracing DLSS (Quality)
Resident Evil Village
Resident Evil Village is the latest in the horror franchise that was wonderfully rekindled with RE7 and onto the RE2 Remake. But now the RE Engine is back and better than ever with Ray Traced Reflections and Lighting that makes the world just come to life, unironically. The game was tested in the center of the village itself with all graphical settings maxed out and with raytracing enabled.
Resident Evil Village (Maxed)
Resident Evil Village Raytracing FSR (Quality)
Stray (That Cat Game)
Stray is a 2022 adventure game developed by BlueTwelve Studio and published by Annapurna Interactive. The story follows a stray cat who falls into a walled city populated by robots, machines, and mutant bacteria, and sets out to return to the surface with the help of a drone companion, B-12. The game uses Unreal Engine 4, but DX12 ray tracing can be enabled by adding the "-dx12" launch argument to the game.
Stray (Maxed With DXR)
No graphics card review is complete without evaluating its temperatures and thermal load. NVIDIA uses an updated vapor chamber and fan design on the brand-new Founders Edition variant that offers a 10% larger fan and fin volume while offering up to 15% higher airflow.
Temperatures
I compiled the power consumption results by testing each card at idle and under full stress while running games. Each graphics card manufacturer sets a default TDP for the card, which can vary from vendor to vendor depending on the extra clocks or board features they build into their custom cards. Default TDP for the GeForce RTX 4090 Founders Edition is rated at 450W and the peak power limit is rated at 600W.
Power Consumption
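To put the power limits quoted in this review into perspective, here is a minimal sketch that expresses each limit as headroom over the 450W default TDP of the Founders Edition. Using the FE's 450W as the common baseline for the MSI cards is an assumption made for illustration, and the function name is hypothetical.

```python
# Baseline: default TDP of the RTX 4090 Founders Edition, as quoted above.
FE_DEFAULT_TDP_W = 450

# Maximum power limits quoted in this review.
power_limits_w = {
    "RTX 4090 FE (peak)": 600,
    "SUPRIM X": 550,
    "SUPRIM Liquid X": 530,
}


def headroom_percent(limit_w: float, baseline_w: float = FE_DEFAULT_TDP_W) -> float:
    """Return how far a power limit sits above the baseline, in percent."""
    if baseline_w <= 0:
        raise ValueError("baseline_w must be positive")
    return (limit_w / baseline_w - 1.0) * 100.0


for card, limit in power_limits_w.items():
    print(f"{card}: {limit}W, {headroom_percent(limit):.1f}% over {FE_DEFAULT_TDP_W}W")
```

By this measure the SUPRIM X limit sits roughly 22% above the 450W baseline and the SUPRIM Liquid X limit roughly 18% above it.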
The road to Ada was sure an exciting one. We got to see various rumors, leaks, and speculation & now we finally have the final product in our hands. There was sure a lot of hype surrounding the RTX 40 series cards and we will see whether the flagship card lives up to the expectations or not.
How much more performance will the custom models get me?
One of the main questions that one would ask is about the net performance upgrade that a custom model will offer over the Founders Edition. Considering that the SUPRIM X and SUPRIM Liquid X are coming in at a premium, one should expect better performance. With a slight increase in clocks and lots of engineering put into the coolers, you can see anywhere from 5-8% higher performance with a custom model, but performance is just one part of the equation. There's also the matter of thermals, the power numbers, and the overall overclocking uplift that these cards can deliver & also sustain.
Both cards are able to hit over 3 GHz with ease. The SUPRIM Liquid X can hit that number more often due to its lower temps opening up more headroom, but in the end, with its fully unlocked power limit, the 4090 SUPRIM X can hit up to 3150 MHz, which is a superb clock speed and something that last-gen cards could only achieve on LN2 cooling.
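For readers who want to check the uplift on their own results, a percentage figure like the 5-8% above is simply the ratio of two average frame rates. The sketch below shows the arithmetic; the frame rates in it are hypothetical placeholders and not measurements from this review.

```python
def percent_uplift(custom_fps: float, reference_fps: float) -> float:
    """Return how much faster the custom card is, as a percentage of the reference."""
    if reference_fps <= 0:
        raise ValueError("reference_fps must be positive")
    return (custom_fps / reference_fps - 1.0) * 100.0


# Hypothetical average frame rates, for illustration only.
reference_fps = 100.0  # e.g. a Founders Edition result
custom_fps = 106.5     # e.g. a custom model result

print(f"Custom model is {percent_uplift(custom_fps, reference_fps):.1f}% faster")
```

The same function works for benchmark scores such as 3DMark Graphics Scores, since only the ratio matters.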
Air-Cooler or Liquid Cooler, Which to Get?
Both cards have their market and while I am personally an air-cooler guy, I would say MSI has built an absolute stunner of a card in the form of their MSI SUPRIM Liquid X. The SUPRIM X averaged in at around 58-60C in gaming while the SUPRIM Liquid X averaged at around 48-50C. The Liquid X is coming at a lower power limit of 530W which should take away some of its overclocking potential versus the SUPRIM X. The SUPRIM X takes things up a notch with its 550W power limit & really gives the 4090 FE a punch but also runs at a much warmer thermal load of around 70C.
But I would say that the SUPRIM X is a much bulkier card that weighs over 2 kg and also comes in at a 3.8-slot design. The SUPRIM Liquid X is more manageable for something like a mid-tower or even a Mini-ITX setup if you have enough space for an extra 240mm radiator or use an air cooler for the CPU. Going the 240mm radiator route makes compatibility easier versus a 360mm rad, which may not fit in most Mini-ITX chassis. Furthermore, having your GPU liquid-cooled will mean that you won't have to worry about airflow much in a compact PC. And those MSI GALE P12 fans really work wonders, delivering some of the quietest operation even at 100% load.
Worth Paying An Extra $100-$150 Over The GeForce RTX 4090 FE?
The MSI GeForce RTX 4090 SUPRIM Liquid X is currently listed for $1749.99 US while the MSI GeForce RTX 4090 SUPRIM X is listed for $1699.99 US. That's $150 and $100 US over the RTX 4090 Founders Edition, respectively. Whether this is worth it depends on your use case. As I have stated, there are only three or four liquid-cooled variants of the 4090 out there and the SUPRIM Liquid X is the only one with a 240mm radiator. If you are limited in terms of size within your PC and only have room for a 240mm rad, then this is the only card that you can get. The other 360mm variants do come in at a higher premium of $1899-$1999 US.
The MSI SUPRIM X on the other hand is a juggernaut with a massive cooler and lots of overclocking potential. It delivers amazing thermal performance and the acoustics are also better than the RTX 4090 FE, which is a major plus. There's no evident coil whine on either card and they seem to clock pretty much the same, except the SUPRIM X ends up being victorious in all cases by a few percentage points. If it's all about aesthetics then the 4090 Founders Edition is a looker, but if you want to go the meaner route, then the 4090 SUPRIM X is well worth the price. For small form factor use cases, the 4090 SUPRIM Liquid X is a fantastic option.
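The premiums quoted above can also be expressed as a percentage of the Founders Edition price. The short sketch below does that arithmetic; the $1599.99 FE price is not stated directly in this section and is an assumption derived from the listed prices minus the quoted $100 and $150 premiums.

```python
# Assumed FE price: implied by $1699.99 being $100 over the Founders Edition.
FE_PRICE_USD = 1599.99

# Listed prices quoted in this review.
listed_prices_usd = {
    "SUPRIM X": 1699.99,
    "SUPRIM Liquid X": 1749.99,
}


def premium(price_usd: float, base_usd: float = FE_PRICE_USD) -> tuple[float, float]:
    """Return the premium over the base price as (dollars, percent of base)."""
    delta = price_usd - base_usd
    return delta, delta / base_usd * 100.0


for card, price in listed_prices_usd.items():
    dollars, percent = premium(price)
    print(f"{card}: ${dollars:.2f} over the FE ({percent:.1f}% premium)")
```

Seen this way, the premium is a little over 6% for the SUPRIM X and a little over 9% for the SUPRIM Liquid X, which is in the same range as the 5-8% performance uplift discussed earlier.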
Conclusion - Custom GeForce RTX 4090 Models Are Well Worth The Price
MSI has updated its SUPRIM lineup with killer new products to mark the launch of NVIDIA's RTX 40 series family. The SUPRIM Liquid X is without a doubt one of the best-looking custom designs, adding liquid cooling to the mix and delivering under 50C temps while gaming in a compact design. The SUPRIM X, meanwhile, breaks ground with a humongous heatsink and a powerful PCB that is meant to rip apart any benchmark that you throw at it.
