NVIDIA GeForce RTX 40 Series - GeForce RTX 4080
Keeping alive its tradition of launching a new graphics architecture every two years, NVIDIA this year introduces its Ada Lovelace GPU. The Ada GPU is built upon the foundation set by Turing. NVIDIA describes the Ada Lovelace GPUs as a quantum leap over Ampere, and the GeForce RTX 4090 Founders Edition, based on the NVIDIA Ada GPU, excels at everything versus the previous generation.
The Ada GPU architecture has a lot to be talked about in this review, but so does the new RTX lineup. The Ada lineup offers faster shader performance, faster ray tracing performance, and faster AI performance. Built on a brand new process node and featuring an architecture designed from the ground up, Ada is a killer product with lots of numbers to talk about.
The fundamental idea behind Ada was to take everything NVIDIA learned with its Turing & Ampere architectures and not only refine it but use its DNA to form a product in a completely new performance category. NVIDIA made tall claims when it introduced the Ada lineup last month, including gains of up to 4x, & we will be finding out whether NVIDIA ticked all the boxes with its Ada architecture. This review will be your guide to what Ada is made of and how it performs against its predecessors.
Today, we will be taking a look at the MSI GeForce RTX 4080 SUPRIM X & RTX 4080 Gaming X Trio. These cards were provided by MSI for the sole purpose of this review & we will be taking a look at their technology, design, and performance metrics in detail.
NVIDIA GeForce RTX 40 Series Gaming Graphics Cards - The Biggest GPU Performance Leap in Recent History
Turing wasn't just any graphics core; it was the graphics core that was to become the foundation of future GPUs. That future is realized now, with next-generation consoles putting heavy emphasis on ray tracing and AI-assisted super-sampling techniques. NVIDIA had a head start with Turing & Ampere, and its Ada generation will only do these things far better.
The Ada GPU does many traditional things which we would expect from a GPU, but at the same time, also breaks the barrier when it comes to untraditional GPU operations. Just to sum up some features:
- New Streaming Multiprocessor (SM)
- New 4th Gen Tensor Cores
- New Real-Time Ray Tracing Acceleration
- New Shading Enhancements
- New Deep Learning Features For Graphics & Inference
- New GDDR6X High-Performance Memory Subsystem
- New HDMI 2.1 Display Engine & Next-Gen NVENC/NVDEC
The technologies mentioned above are some of the main building blocks of the Ada GPU, but there's more within the graphics core itself which we will talk about in detail so let's get started.
Let's take a trip down the journey to Ada. In 2016, NVIDIA announced their Pascal GPUs which would soon be featured in their top to bottom GeForce 10 series lineup. After the launch of Maxwell, NVIDIA gained a lot of experience in the efficiency department which they put a focus on since their Kepler GPUs.
Four years ago, rather than offering another standard leap in the rasterization performance of its GPUs, NVIDIA took a different approach & introduced two key technologies in its Turing line of consumer GPUs: AI-assisted acceleration with the Tensor Cores and hardware-level acceleration for ray tracing with its brand new RT cores.
Then came Ampere. With its brand new Samsung 8nm fabrication process, NVIDIA added even more to its gaming graphics lineup. In the Ampere GPU architecture, NVIDIA provided its latest Ampere SM along with next-gen FP32, INT32, Tensor Cores, and RT cores. The focus was to boost both rasterization and ray tracing capabilities to new heights.
Now enter Ada, a brand new architecture that aims to take everything from the first two RTX GPUs and perfect it. The graphics architecture is designed for speed and that it excels at. So let's see the architecture in detail. Following are the few main highlights of the Ada Lovelace GPU architecture:
- Revolutionary New Architecture: NVIDIA Ada architecture GPUs deliver outstanding performance for graphics, AI, and compute workloads with exceptional architectural and power efficiency. After the baseline design for the Ada SM was established, the chip was scaled up to shatter records. Manufacturing innovations and materials research enabled NVIDIA engineers to craft a GPU with 76.3 billion transistors and 18,432 CUDA Cores capable of running at clocks over 2.5 GHz while maintaining the same 450W TGP as the prior generation flagship GeForce RTX 3090 Ti GPU. The result is the world’s fastest GPU with the power, acoustics, and temperature characteristics expected of a high-end graphics card.
- New Ada RT Core for Faster Ray Tracing: For decades, rendering ray-traced scenes with physically correct lighting in real-time has been considered the holy grail of graphics. At the same time, the geometric complexity of environments and objects continues to increase as 3D games and graphics continually strive to provide the most accurate representations of the real world. The Ada RT Core has been enhanced to deliver 2x faster ray-triangle intersection testing and includes two important new hardware units. An Opacity Micromap Engine speeds up ray tracing of alpha-tested geometry by a factor of 2x, and a Displaced Micro-Mesh Engine generates Displaced Micro-Triangles on-the-fly to create additional geometry. The Micro-Mesh Engine provides the benefit of increased geometric complexity without the traditional performance and storage costs of complex geometries.
- Shader Execution Reordering: NVIDIA Ada GPUs support Shader Execution Reordering which dynamically organizes & reorders shading workloads to improve RT shading efficiency. This improves performance by up to 44% in Cyberpunk 2077 with Ray Tracing Overdrive Mode.
- NVIDIA DLSS 3: The Ada architecture features an all-new Optical Flow Accelerator and AI frame generation that boosts DLSS 3’s frame rates up to 2x over the previous DLSS 2.0 while maintaining or exceeding native image quality. Compared to traditional brute-force graphics rendering, DLSS 3 is ultimately up to 4x faster while providing low system latency.
The NVIDIA Ada Lovelace AD103 GPU features up to 7 GPCs (Graphics Processing Clusters). This is one more GPC compared to the Ampere GA103 GPUs. Each GPC will consist of 6 TPCs with 2 SMs each, which is the same configuration as the existing chip. Each SM (Streaming Multiprocessor) will house four sub-cores, which is also the same as the GA102 GPU. What's changed is the FP32 & the INT32 core configuration. Each SM will include 64 FP32 units, but combined FP32+INT32 units will go up to 128. This is because half of the FP32 units don't share the same sub-core as the INT32 units. The 64 FP32 cores are separate from the 64 INT32 cores.
So in total, each sub-core will consist of 16 FP32 plus 16 INT32 units for a total of 32 units. Each SM will have a total of 64 FP32 units plus 64 INT32 units for a total of 128 units. And since there are a total of 84 SM units (12 per GPC), we are looking at a total of 10,752 cores. Each SM will also include two Warp Schedulers (32 threads/clk) for 64 warps per SM & their own L0 i-cache. This is a 33% increase in warps/threads vs the GA102 GPU. The register file size is 16,384 across a 32-bit lane. Each SM also carries its own 128 KB of L1 data cache and shared memory, so that's 10.5 MB of L1 cache across the 84 SMs.
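The unit counts quoted above multiply out as shown in this minimal sketch. It only uses the figures given in this article for the full die; the constant and function names are made up for illustration.

```python
# Unit counts as quoted in this article for the full AD103 die.
GPCS = 7                  # graphics processing clusters
SMS_PER_GPC = 12          # 6 TPCs x 2 SMs per GPC
SUB_CORES_PER_SM = 4
FP32_PER_SUB_CORE = 16
INT32_PER_SUB_CORE = 16


def units_per_sm() -> int:
    """FP32 + INT32 units in one SM."""
    return SUB_CORES_PER_SM * (FP32_PER_SUB_CORE + INT32_PER_SUB_CORE)


def total_sms() -> int:
    """SM count for the full die."""
    return GPCS * SMS_PER_GPC


def total_cores() -> int:
    """Core count for the full die."""
    return total_sms() * units_per_sm()


if __name__ == "__main__":
    print(units_per_sm())  # 128
    print(total_sms())     # 84
    print(total_cores())   # 10752
```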
Moving over to the cache, this is another segment where NVIDIA has given a big boost over the existing Ampere GPUs. The L2 cache will be increased to 64 MB as mentioned in the leaks. This is a 16x increase over the Ampere GPU that hosts just 4 MB of L2 cache. The cache will be shared across the GPU. The GPU will also feature up to 112 ROPs for the full-die.
There are also going to be the latest 4th Generation Tensor and 3rd Generation RT (Raytracing) cores infused on the Ada Lovelace GPUs which will help boost DLSS & Raytracing performance to the next level. Overall, the Ada Lovelace AD103 GPU will offer (versus the GA103 GPU):
- 16.6% More GPCs (Versus Ampere)
- 40% More Cores (Versus Ampere)
- 50% More L1 Cache (Versus Ampere)
- 16x More L2 Cache (Versus Ampere)
- 16.6% More ROPs (Versus Ampere)
- 4th Gen Tensor & 3rd Gen RT Cores
The full die has not been featured on any GPU so far since the RTX 4080 features a cut-down layout and it is likely that as yields progress, we will eventually see a gaming and workstation product using the full-fat AD103. Till then, the RTX 4080 is the top gaming graphics card based on this GPU.
NVIDIA AD103 'Ada Lovelace' Gaming GPU Block Diagram:
NVIDIA AD103 'Ada Lovelace' Gaming GPU 'SM' Block Diagram:
NVIDIA GeForce RTX 4080
- 49 TFLOPS of peak single-precision (FP32) performance
- 98 TFLOPS of peak half-precision (FP16) performance
- 390 Tensor TFLOPS
- 780 Tensor TFLOPs with sparsity
- 113 RT-TFLOPs
At the heart of the NVIDIA GeForce RTX 4080 graphics card lies the Ada Lovelace AD103 GPU. The GPU measures 378.6 mm² and will utilize the TSMC 4N process node, which is an optimized version of TSMC's 5nm (N5) node designed for the green team. The GPU features an insane 45.9 billion transistors.
NVIDIA Ada GPUs - AD102, AD103, AD104 For The First Wave of Gaming Cards
NVIDIA is first introducing three brand new Ada GPUs which include the AD102, AD103 & AD104. The AD102 GPU is going to be featured on the GeForce RTX 4090, the AD103 is going to be used by the GeForce RTX 4080 16 GB graphics cards and the AD104 GPU is going to be featured on the GeForce RTX 4080 12 GB graphics cards.
The Ada GPUs are based on the TSMC 4N process node which is a custom process designed exclusively for NVIDIA. It is essentially an optimized version of the N5 (5nm) process, offering drastic increases in transistors, cores, and frequency. The top AD103 GPU packs 16% more cores and also offers 45.9 Billion transistors while offering over 2x the performance per watt.
NVIDIA Ada AD103 GPU
The full AD103 GPU is made up of 7 graphics processing clusters with 12 SM units on each cluster. That makes up 84 SM units for a total of 10,752 cores, 76 RT cores, 304 Tensor Cores, 320 Texture Units, and a 256-bit bus interface in a 45.9 billion transistor package measuring 378.6 mm².
NVIDIA has also introduced its 4th Generation Tensor core architecture and 3rd Generation RT cores on Ada GPUs. Now Tensor cores have been available since Volta and consumers got a taste of it with the Turing & Ampere GPUs. One of the key areas where Tensor Cores are put to use for AAA games is DLSS. There's a whole software stack that leverages Tensor cores and that is known as the NVIDIA NGX. These software-based technologies will help enhance graphics fidelity with features such as Deep Learning Super Sampling (DLSS), AI InPainting, AI Super Rez, RTX Voice, and AI Slow-Mo.
While its initial debut was a bit flawed, DLSS in its 2nd iteration (DLSS 2.x) has done wonders to not only improve gaming performance but also image quality.
Let's dive into the technological advancements that allow these incredible achievements. To begin with, NVIDIA engineers started with DLSS Super Resolution and added something called Optical Multi Frame Generation based on Ada's Optical Flow Accelerator.
This accelerator analyzes two sequential frames from a particular game, capturing pixel details such as particles, reflections, lighting, and shadows.
On top of that, NVIDIA DLSS 3 also takes into account conventional game engine information such as motion vectors. The DLSS Frame Generation AI convolutional autoencoder network will then decide how to use each of the four inputs (current and prior frames, optical flow field, and motion vectors) to recreate intermediate frames in the best possible way.
NVIDIA DLSS 3 is said to reconstruct 3/4 of the first frame with DLSS Super Resolution and the full second frame with the help of the aforementioned DLSS Frame Generation. Overall, NVIDIA DLSS 3 reconstructs 7/8 of the two total frames displayed, which explains the massive performance uplift.
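The 7/8 figure follows directly from averaging the two per-frame shares quoted above. A small sketch of that arithmetic, with illustrative names:

```python
from fractions import Fraction


def reconstructed_share(per_frame_shares):
    """Average share of reconstructed pixels over a sequence of frames."""
    return sum(per_frame_shares, Fraction(0)) / len(per_frame_shares)


# Frame 1: DLSS Super Resolution reconstructs 3/4 of the pixels.
# Frame 2: DLSS Frame Generation produces the whole frame.
share = reconstructed_share([Fraction(3, 4), Fraction(1)])
print(share)  # 7/8
```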
Additionally, the new version of the Deep Learning Super Sampling image reconstruction technique also includes the latency-lowering NVIDIA Reflex technology.
Cyberpunk 2077 has been shown running NVIDIA DLSS 3, the brand new Ray Tracing Overdrive, and NVIDIA Reflex with up to 4x improved performance and up to 2x reduced latency. That's not all, as NVIDIA is even promising benefits for CPU-bound games, which generally didn't run much faster with DLSS 2.0. For example, the notoriously CPU-heavy Microsoft Flight Simulator gets up to 2x improved performance with the new DLSS.
Overall, NVIDIA said that over 35 games and apps have already pledged support for NVIDIA DLSS 3.
The green company also released a performance chart on some of those games running on NVIDIA DLSS 3; check it out below.
3rd Gen RT Cores, RTX, and Real-Time Ray Tracing Dissected
Next up, we have the RT Cores, which are what will power Real-Time Raytracing. NVIDIA isn't going to distance itself from traditional rasterization-based rendering but will instead follow a hybrid rendering model. The new 3rd Generation RT cores offer increased performance, with double the ray/triangle intersection testing rate of the Ampere RT cores.
The Third-Generation RT Core found in Ada GPUs includes dedicated units known as the Opacity Micromap Engine and the Displaced Micro-Mesh Engine. The Opacity Micromap Engine evaluates Opacity Micromaps (represented by the triangle with foliage on the bottom left), which are used to accelerate alpha traversal. The Displaced Micro-Mesh Engine generates meshes of micro-triangles that are known as Displaced Micro-Meshes (represented by the triangle on the bottom right in the diagram below). Displaced Micro-Meshes allow the Ada RT Core to ray trace geometrically complex objects and environments with significantly less BVH build time and storage costs. Finally, ray-triangle intersection testing is 2x faster in Ada’s Third-Generation RT Core compared to the Ampere GPU generation.
NVIDIA engineers have developed three new features in the Ada RT Core to enable high-performance ray tracing of highly complex geometry:
- First, Ada’s Third-Generation RT Core features 2x Faster Ray-Triangle Intersection Throughput relative to Ampere; this enables developers to add more detail to their virtual worlds.
- Second, Ada’s RT Core has 2x Faster Alpha Traversal; the RT Core features a new Opacity Micromap Engine to directly alpha-test geometry and significantly reduce shader-based alpha computations. With this new functionality, developers can very compactly describe irregularly shaped or translucent objects, like ferns or fences, and directly and more efficiently ray trace them with the Ada RT Core.
- Third, the new Ada RT Core supports 10x Faster BVH Build in 20X Less BVH Space when using its new Displaced Micro-Mesh Engine to generate micro-triangles from micro-meshes on-demand. The micro-mesh is a new primitive that represents a structured mesh of micro-triangles that the Ada RT Core processes natively, saving the storage and processing compared to what is normally required when describing complex geometries using only basic triangles.
Taken together, these three advances incorporated into the Ada RT Core enable order-of-magnitude increases in richness without commensurate increases in processing time or memory consumption.
2x Faster Ray-Triangle Intersection Testing
Ray-triangle intersection testing is a computationally expensive operation that is commonly performed when rendering a ray-traced scene. Recognizing the importance of this function, with each new RTX GPU NVIDIA engineers have strived to improve intersection testing performance and efficiency. The Third-Generation RT Core in the Ada architecture provides double the throughput for ray-triangle intersection testing over Ampere (and 4x faster than the first-generation RT Core used in Turing GPUs).
2x Faster Alpha Traversal Performance with Opacity Micromap Engine
Developers frequently use a texture’s alpha channel to economically cut out complex shapes or more generally to represent translucency. A leaf might be described using a couple of triangles, employing a texture’s alpha channel to economically capture the complex shape. A flame’s complex shape and translucency can also be approximated by alpha.
Prior to Ada’s RT Core, a developer could incorporate these kinds of content into a ray-traced scene by tagging them as not opaque. When a leaf is hit by a ray, a shader is invoked to determine how to treat the intersection, even if the ray is simply characterized as a hit or a miss. This incurs a noticeable cost. Specifically, when a warp of rays is cast towards non-opaque objects, individual ray queries may require multiple shader invocations to resolve, while other rays terminate immediately. The result is lingering live threads and commensurate inefficiency.
To efficiently handle these kinds of content, NVIDIA engineers have added an Opacity Micromap Engine to Ada’s RT Core. An opacity micromap is a virtual mesh of micro-triangles, each with an opacity state that the RT Core uses to directly resolve ray intersections with non-opaque triangles. Specifically, the barycentric coordinates of an intersection are used to address the corresponding micro-triangle’s opacity state. The opacity state may be opaque, transparent, or unknown. If opaque, then a hit is recorded and returned. If transparent, the intersection is ignored and the search for an intersection continues. If unknown, then the control is returned to the SM, invoking a shader (“anyhit”) to programmatically resolve the intersection.
The new Opacity Micromap Engine evaluates the opacity mask, which is a regular triangular mesh defined using the barycentric coordinate system used for reporting ray/triangle intersections. These meshes may be sized from one to sixteen million micro-triangles, with one or two bits associated with each micro-triangle. As a simple illustrative example, consider a detailed maple leaf described using two triangles and an alpha texture.
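The three-state resolution described above can be sketched in a few lines. This is a toy software model, not NVIDIA's hardware logic: the barycentric lookup is assumed to have already happened, so each candidate intersection arrives with its opacity state, and `resolve` and `any_hit` are hypothetical names.

```python
from enum import Enum


class Opacity(Enum):
    OPAQUE = 0
    TRANSPARENT = 1
    UNKNOWN = 2


def resolve(states, any_hit):
    """Walk candidate intersections along a ray in order.

    Returns (index of the recorded hit or None, number of shader calls).
    Only UNKNOWN states fall back to the any_hit callback.
    """
    shader_calls = 0
    for index, state in enumerate(states):
        if state is Opacity.OPAQUE:
            return index, shader_calls      # hit recorded directly
        if state is Opacity.TRANSPARENT:
            continue                        # ignored, keep searching
        shader_calls += 1                   # UNKNOWN: ask the shader
        if any_hit(index):
            return index, shader_calls
    return None, shader_calls


candidates = [Opacity.TRANSPARENT, Opacity.UNKNOWN,
              Opacity.TRANSPARENT, Opacity.OPAQUE]
hit, calls = resolve(candidates, any_hit=lambda i: False)
print(hit, calls)  # 3 1
```

In this example only one of the four candidates needs a shader invocation. Without the opacity states, every non-opaque candidate would have gone through the shader.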
10x Faster BVH Build in 20X Less BVH Space with Ada’s Displaced Micro-Mesh Engine
Geometric complexity continues to rise with every new generation. Ray tracing performance scales attractively with increases in scene complexity. When we ray trace complex environments, tracing costs increase slowly: a one-hundred-fold increase in geometry might only double tracing time.
However, creating the data structure (BVH) that makes that small increase in time possible requires roughly linear time and memory; 100x more geometry could mean 100x more BVH build time and 100x more memory. Ada’s Third-Generation RT Core with Displaced Micro-Meshes (DMM) helps significantly with both of the challenges of high geometric complexity: BVH build performance and memory/storage footprint. Asset storage and transmission costs are reduced as well.
Secondary rays are generated at each primary ray hit point in the middle scene. Starting at the primary hit surfaces they shoot off in different directions, hitting different objects. Secondary hit shading tends to be less ordered and less efficient when executing on the GPU, because different shader programs are running on different threads, and often must serialize execution. Examples of secondary rays that can benefit from SER include those used for path tracing, reflections, indirect lighting, and translucency effects.
Shader Execution Reordering adds a new stage in the ray tracing pipeline which reorders and groups the secondary hit shading to have better execution locality, thus much higher overall ray-traced shading efficiency. SER can often provide up to 2X performance improvement for RT shaders in cases with a high level of divergence (such as path tracing). In testing with Cyberpunk 2077 running in RT: Overdrive Mode, we’ve measured overall performance gains of up to 44% from SER.
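To see why grouping hits helps, here is a toy model of divergence. It assumes a warp must run one serialized pass per distinct shader among its threads, and it stands in for the reordering stage with a plain sort by shader ID; the article does not describe the actual hardware scheme, so both are assumptions.

```python
WARP_SIZE = 32


def serialized_passes(shader_ids, warp_size=WARP_SIZE):
    """Total passes across all warps: one per distinct shader in each warp."""
    total = 0
    for start in range(0, len(shader_ids), warp_size):
        total += len(set(shader_ids[start:start + warp_size]))
    return total


def reorder(shader_ids):
    """Stand-in for the reordering stage: group hits that share a shader."""
    return sorted(shader_ids)


# 128 secondary hits spread over 4 different shaders, fully interleaved.
hits = [i % 4 for i in range(128)]
before = serialized_passes(hits)
after = serialized_passes(reorder(hits))
print(before, after)  # 16 4
```

This worst-case input shows a 4x reduction in passes. Real workloads are less extreme, which fits the "up to 2X" figure quoted for highly divergent shaders.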
The Micron GDDR6X memory brings a lot of new stuff to the table. It is faster, doubles the I/O data rate, and is the first to implement PAM4 multi-level signaling in memory dies. With the GeForce RTX 3090 class products, Micron's GDDR6X memory achieves a bandwidth of up to 1 TB/s which is used to power next-generation gaming experiences at high-fidelity resolutions such as 8K.
Micron GDDR6X graphics memory doubles input/output (I/O) performance while minimizing the cost of memory. Working with AI-innovation leader NVIDIA, Micron delivers higher bandwidth by enabling multi-level signaling in the form of four-level pulse amplitude modulation (PAM4) technology in this memory device via Micron
The new GDDR6X SGRAM:
- Doubles the data rate of SGRAM at a lower power per transaction while enabling the breaking of the 1 Terabyte per second (TB/s) system memory bandwidth boundary for graphics card applications;
- Is the first discrete graphics memory device that employs PAM4-encoded signaling between the processor and the DRAM, using four voltage levels to encode and transfer two bits of data per interface clock.
- Can be designed and operated stably at high speeds and built in mass production.
As mentioned, GDDR6X features the brand new PAM4 multilevel signaling technique, which helps transfer data much faster and doubles the I/O rate, pushing the capability of each memory die from 64 GB/s to 84 GB/s. The Micron GDDR6X memory dies are also the only graphics DRAM that can be mass-produced while featuring PAM4 signaling.
What is interesting is that Micron quotes that its GDDR6X memory can hit speeds of up to 22.4 Gbps, whereas we have only seen 21 Gbps in action on the GeForce RTX 3090 Ti. It is likely that AIBs could utilize higher binned dies as they become available. Micron does have faster chips, but those aren't coming to NVIDIA 40 series graphics cards for now.
It's not just faster speeds but Micron's GDDR6X provides higher bandwidth while sipping in 15% lower power per transferred bit compared to the previous generation GDDR6 memory. PAM4 signaling is a big upgrade from the two-level NRZ signaling on the GDDR6 memory.
Instead of transmitting two binary bits of data each clock cycle (one bit on the rising edge and one bit on the falling edge of the clock), PAM4 sends two bits on each clock edge, encoded using four different voltage levels. The voltage levels are divided into 250 mV steps with each level representing two bits of data - 00, 01, 10, or 11 sent on each clock edge (still DDR technology).
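The packing of two bits into one four-level symbol can be sketched as below. The 250 mV step comes from the text; the plain-binary mapping of bit pairs to levels and the use of offsets from the lowest level are assumptions made for illustration, since the article does not give the exact mapping.

```python
LEVEL_STEP_MV = 250  # spacing between adjacent voltage levels, from the text


def pam4_encode(bits):
    """Pack a bit sequence into PAM4 symbols, two bits per symbol.

    Returns voltage offsets in mV relative to the lowest level.
    """
    if len(bits) % 2 != 0:
        raise ValueError("PAM4 needs an even number of bits")
    levels = []
    for i in range(0, len(bits), 2):
        symbol = bits[i] * 2 + bits[i + 1]  # 00->0, 01->1, 10->2, 11->3
        levels.append(symbol * LEVEL_STEP_MV)
    return levels


data = [0, 0, 0, 1, 1, 0, 1, 1]
symbols = pam4_encode(data)
print(symbols)                  # [0, 250, 500, 750]
print(len(data), len(symbols))  # 8 4
```

Two-level NRZ signaling would need eight clock edges for these eight bits, while PAM4 needs four. That halving is where the doubled I/O rate comes from.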
Micron GDDR6X Memory
| Feature | GDDR5 | GDDR5X | GDDR6 | GDDR6X |
|---|---|---|---|---|
| Density | From 512Mb to 8Gb | 8Gb | 8Gb, 16Gb | 8Gb, 16Gb |
| VDD and VDDQ | Either 1.5V or 1.35V | 1.35V | Either 1.35V or 1.25V | Either 1.35V or 1.25V |
| VPP | N/A | 1.8V | 1.8V | 1.8V |
| Data rates | Up to 8 Gb/s | Up to 12Gb/s | Up to 16 Gb/s | 19 Gb/s, 21 Gb/s, >21 Gb/s |
| Channel count | 1 | 1 | 2 | 2 |
| Access granularity | 32 bytes | 64 bytes 2x 32 bytes in pseudo 32B mode | 2 ch x 32 bytes | 2 ch x 32 bytes |
| Burst length | 8 | 16 / 8 | 16 | 8 in PAM4 mode 16 in RDQS mode |
| Signaling | POD15/POD135 | POD135 | POD135/POD125 | PAM4 POD135/POD125 |
| Package | BGA-170 14mm x 12mm 0.8mm ball pitch | BGA-190 14mm x 12mm 0.65mm ball pitch | BGA-180 14mm x 12mm 0.75mm ball pitch | BGA-180 14mm x 12mm 0.75mm ball pitch |
| I/O width | x32/x16 | x32/x16 | 2 ch x16/x8 | 2 ch x16/x8 |
| Signal count | 61 - 40 DQ, DBI, EDC - 15 CA - 6 CK, WCK | 61 - 40 DQ, DBI, EDC - 15 CA - 6 CK, WCK | 70 or 74 - 40 DQ, DBI, EDC - 24 CA - 6 or 10 CK, WCK | 70 or 74 - 40 DQ, DBI, EDC - 24 CA - 6 or 10 CK, WCK |
| PLL, DCC | PLL | PLL | PLL, DCC | DCC |
| CRC | CRC-8 | CRC-8 | 2x CRC-8 | 2x CRC-8 |
| VREFD | External or internal per 2 bytes | Internal per byte | Internal per pin | Internal per pin 3 sub-receivers per pin |
| Equalization | N/A | RX/TX | RX/TX | RX/TX |
| VREFC | External | External or Internal | External or Internal | External or Internal |
| Self refresh (SRF) | Yes Temp. Controlled SRF | Yes Temp. Controlled SRF Hibernate SRF | Yes Temp. Controlled SRF Hibernate SRF VDDQ-off | Yes Temp. Controlled SRF Hibernate SRF VDDQ-off |
| Scan | SEN | IEEE 1149.1 (JTAG) | IEEE 1149.1 (JTAG) | IEEE 1149.1 (JTAG) |
With each new generation of graphics cards, NVIDIA delivers a new range of display technologies. This generation is no different, and we see some significant updates to the display engine and the graphics interconnect. With the adoption of faster GDDR6X memory, which provides higher bandwidth, faster compression, and more cache, gaming applications can now run at higher resolutions, supporting more details on the display.
The Ada Display Engine supports two new display technologies, HDMI 2.1 and DisplayPort 1.4a with DSC 1.2a. HDMI 2.1 allows up to 48 Gbps of total bandwidth and up to 4K 240Hz HDR and 8K 60Hz HDR.
DisplayPort 1.4a allows for up to 8K resolutions with 60Hz refresh rates and includes VESA's display stream compression 1.2 technology with visually lossless compression. You can run up to two 8K displays at 60 Hz using two cables, one for each display. In addition to that, Ada also supports HDR processing natively with tone mapping added to the HDR pipeline.
Ada GPUs take streaming and video content to the next level, incorporating support for AV1 video encoding in the Ada eighth-generation dedicated hardware encoder (known as NVENC). Prior generation Ampere GPUs supported AV1 decoding but not encoding. Ada’s AV1 encoder is 40% more efficient than the H.264 encoder used in GeForce RTX 30 Series GPUs. AV1 will enable users who are streaming at 1080p today to increase their stream resolution to 1440p while running at the same bitrate and quality, or for users with 1080p displays, streams will look similar to 1440p, providing better quality.
Ada GPUs are also equipped with dual NVENC encoders. This enables video encoding at 8K/60 for professional video editing or four 4K/60. (Game streaming services can also take advantage of this to enable more simultaneous sessions, for instance.) Blackmagic Design’s DaVinci Resolve, the popular Voukoder plugin for Adobe Premiere Pro, and Jianying — the top video editing app in China — are all enabling AV1 support, as well as a dual encoder through encode presets. Dual encoder and AV1 availability for these apps will be available in October. NVIDIA is also working with the popular video-effects app Notch to enable AV1, as well as Topaz to enable support for AV1 and the dual encoders.
In addition to NVENC, Ada GPUs also include the fifth-generation hardware decoder that was first launched with Ampere (known as NVDEC). NVDEC supports hardware-accelerated video decoding of MPEG-2, VC-1, H.264 (AVCHD), H.265 (HEVC), VP8, VP9, and the AV1 video formats. 8K/60 decoding is also fully supported. In the future, NVIDIA is also working to enable high-quality video production using AI.
NVIDIA RTX IO - Blazing Fast Read Speeds With GPU Utilization
As storage sizes have grown, so has storage performance. Gamers are increasingly turning to SSDs to reduce game load times: while hard drives are limited to 50-100 MB/sec throughput, the latest M.2 PCIe Gen4 SSDs deliver up to 7 GB/sec. With the traditional storage model, game data is read from the hard disk, then passed from the system memory and CPU before being passed to the GPU.
Historically games have read files from the hard disk, using the CPU to decompress the game image. Developers have used lossless compression to reduce install sizes and improve I/O performance. However, as storage performance has increased, traditional file systems and storage APIs have become a bottleneck. For example, decompressing game data from a 100 MB/sec hard drive takes only a few CPU cores, but decompressing data from a 7 GB/sec PCIe Gen4 SSD can consume more than twenty AMD Ryzen Threadripper 3960X CPU cores!
Using the traditional storage model, game decompression can consume all 24 cores on a Threadripper CPU. Modern game engines have exceeded the capability of traditional storage APIs. A new generation of I/O architecture is needed. Data transfer rates are the gray bars, and CPU cores required are the black/blue blocks.
NVIDIA RTX IO is a suite of technologies that enable rapid GPU-based loading and decompression of game assets, accelerating I/O performance by up to 100x compared to hard drives and traditional storage APIs. When used with Microsoft’s new DirectStorage for Windows API, RTX IO offloads dozens of CPU cores’ worth of work to your RTX GPU, improving frame rates, enabling near-instantaneous game loading, and opening the door to a new era of large, incredibly detailed open-world games.
Object pop-in and stutter can be reduced, and high-quality textures can be streamed at incredible rates, so even if you’re speeding through a world, everything runs and looks great. In addition, with lossless compression, game download and install sizes can be reduced, allowing gamers to store more games on their SSD while also improving their performance.
NVIDIA RTX IO plugs into Microsoft’s upcoming DirectStorage API, which is a next-generation storage architecture designed specifically for state-of-the-art NVMe SSD-equipped gaming PCs and the complex workloads that modern games require. Together, streamlined and parallelized APIs specifically tailored for games allow dramatically reduced IO overhead and maximize performance/bandwidth from NVMe SSDs to your RTX IO-enabled GPU.
Specifically, NVIDIA RTX IO brings GPU-based lossless decompression, allowing reads through DirectStorage to remain compressed and delivered to the GPU for decompression. This removes the load from the CPU, moving the data from storage to the GPU in a more efficient, compressed form, and improving I/O performance by a factor of two.
GeForce RTX GPUs will deliver decompression performance beyond the limits of even Gen4 SSDs, offloading potentially dozens of CPU cores’ worth of work to ensure maximum overall system performance for next-generation games. Lossless decompression is implemented with high-performance compute kernels, asynchronously scheduled. This functionality leverages the DMA and copy engines of Turing and Ampere, as well as the advanced instruction set and architecture of these GPUs' SMs.
The advantage of this is that the enormous compute power of the GPU can be leveraged for burst or bulk loading (at level load, for example) when GPU resources can be leveraged as high-performance I/O processors, delivering decompression performance well beyond the limits of Gen4 NVMe. During streaming scenarios, bandwidths are a tiny fraction of the GPU capability, further leveraging the advanced asynchronous compute capabilities of Turing and Ampere. Microsoft is targeting a developer preview of DirectStorage for Windows for game developers next year, and NVIDIA Turing & Ampere gamers will be able to take advantage of RTX IO-enhanced games as soon as they become available.
The NVIDIA GeForce RTX 4080 will use 76 SMs of the 84 SMs for a total of 9728 CUDA cores. The GPU will come packed with 64 MB of L2 cache and a total of 112 ROPs which is simply insane. The clock speeds for the graphics card are rated at 2210 MHz base and 2510 MHz boost clocks and we have already seen over 3 GHz speeds with overclocking which you can read more about here.
As for memory specs, the GeForce RTX 4080 features 16 GB of GDDR6X memory running at 22.5 Gbps across a 256-bit bus interface. This will provide up to 720 GB/s of bandwidth. This is still a tad slower than the 760 GB/s offered by the RTX 3080, which comes with a wider 320-bit interface but a lowly 10 GB capacity. To compensate for the lower bandwidth, NVIDIA could be integrating a next-gen memory compression suite to make up for the 256-bit interface.
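Both bandwidth figures come from the same formula: per-pin data rate times bus width, divided by eight to convert bits to bytes. In this short sketch the 19 Gbps rate for the RTX 3080 is an assumption that matches the 760 GB/s quoted above, and the function name is illustrative.

```python
def memory_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s from per-pin rate and bus width."""
    return data_rate_gbps * bus_width_bits / 8


rtx_4080 = memory_bandwidth_gb_s(22.5, 256)
rtx_3080 = memory_bandwidth_gb_s(19.0, 320)  # assumed 19 Gbps per pin
print(rtx_4080)  # 720.0
print(rtx_3080)  # 760.0
```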
- NVIDIA GeForce RTX 4080 16 GB "Official" TBP - 320W
- NVIDIA GeForce RTX 3080 12 GB "Official" TBP - 350W
As far as the power consumption is concerned, the TBP is rated at 320W. The card will be powered by a single 16-pin connector which delivers up to 600W of power. Custom models will be offering higher TBP targets.
NVIDIA GeForce RTX 4080 Graphics Cards Performance
As for the performance of these monster GPUs, NVIDIA shared the computational and gaming performance figures and it looks like the GeForce RTX 4080 will be sitting slightly ahead of the GeForce RTX 3090 Ti with around 50 TFLOPs of Compute power.
Just for comparison's sake:
- NVIDIA GeForce RTX 4090: 83 TFLOPs (FP32) (2.5 GHz Boost clock)
- NVIDIA GeForce RTX 4080: 49 TFLOPs (FP32) (2.5 GHz Boost clock)
- NVIDIA GeForce RTX 3090 Ti: 40 TFLOPs (FP32) (1.86 GHz Boost clock)
- NVIDIA GeForce RTX 3090: 36 TFLOPs (FP32) (1.69 GHz Boost clock)
Based on a boost clock speed of 2.5 GHz, you get up to 49 TFLOPs of compute performance, and you can definitely squeeze out a lot more with an overclock, as we demonstrated with the RTX 4090. One should remember that compute performance doesn't necessarily indicate overall gaming performance. Even so, it will be a huge upgrade for gaming PCs and a roughly 4x increase over the current fastest console, the Xbox Series X (around 12 TFLOPs).
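The FP32 figures can be reproduced from the core count and boost clock, since each CUDA core can retire one fused multiply-add (two floating-point operations) per clock. The sketch below uses the core counts and boost clocks from the official spec table for the two Ada cards; the function name is illustrative.

```python
def fp32_tflops(cuda_cores: int, boost_clock_mhz: float) -> float:
    """Peak FP32 throughput: cores x 2 FLOPs per clock (one FMA) x clock rate."""
    return cuda_cores * 2 * boost_clock_mhz * 1e6 / 1e12


# (CUDA cores, boost clock in MHz) taken from the official spec table.
cards = {
    "RTX 4090": (16384, 2520),
    "RTX 4080": (9728, 2510),
}

for name, (cores, clock_mhz) in cards.items():
    print(f"{name}: {fp32_tflops(cores, clock_mhz):.1f} TFLOPs")
# RTX 4090: 82.6 TFLOPs
# RTX 4080: 48.8 TFLOPs
```

These are theoretical peaks at the rated boost clock, so real cards that boost higher will exceed them.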
FP32 Compute Horsepower Comparisons (Higher is Better)
NVIDIA has demonstrated up to a 2x compute performance uplift and up to a 2x gain in gaming performance for each graphics card versus its predecessor, and this is without even factoring in the RT and Tensor core performance, which are expected to get major lifts too. A 2-4x gain over the RTX 3090 & RTX 3090 Ti would be very disruptive.
Gamers should expect 4K gaming to be buttery smooth on these graphics cards and with DLSS, we might even see playable 60 FPS at 8K resolution which is something that NVIDIA has been trying to achieve with its RTX 3090 series BFGPUs for a while now.
NVIDIA GeForce RTX 4080 Graphics Cards Price & Availability
The NVIDIA GeForce RTX 4080 16 GB graphics card will be available starting tomorrow for a price of $1199 US. The card will be available in both Founders Edition and custom graphics card flavors at launch.
NVIDIA GeForce RTX 40 Series Official Specs:
| Graphics Card Name | NVIDIA GeForce RTX 4090 | NVIDIA GeForce RTX 4090 D | NVIDIA GeForce RTX 4080 | NVIDIA GeForce RTX 4070 Ti | NVIDIA GeForce RTX 4070 | NVIDIA GeForce RTX 4060 Ti | NVIDIA GeForce RTX 4060 |
|---|---|---|---|---|---|---|---|
| GPU Name | Ada Lovelace AD102-300 | Ada Lovelace AD102-250 | Ada Lovelace AD103-300 | Ada Lovelace AD104-400 | Ada Lovelace AD104-250 | Ada Lovelace AD106-350 | Ada Lovelace AD107-400 |
| Process Node | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N |
| Die Size | 608mm2 | 608mm2 | 378.6mm2 | 294.5mm2 | 294.5mm2 | 190.0mm2 | 146.0mm2 |
| Transistors | 76 Billion | 76 Billion | 45.9 Billion | 35.8 Billion | 35.8 Billion | 22.9 Billion | TBD |
| CUDA Cores | 16384 | 14592 | 9728 | 7680 | 5888 | 4352 | 3072 |
| TMUs / ROPs | 512 / 176 | TBD | 304 / 112 | 240 / 80 | 184 / 64 | 136 / 48 | TBD |
| Tensor / RT Cores | 512 / 128 | 456 / 128 | 304 / 76 | 240 / 60 | 184 / 46 | 136 / 34 | TBD |
| L2 Cache | 72 MB | 72 MB | 64 MB | 48 MB | 36 MB | 32 MB | 24 MB |
| Base Clock | 2230 MHz | 2280 MHz | 2210 MHz | 2310 MHz | 1920 MHz | 2310 MHz | 1830 MHz |
| Boost Clock | 2520 MHz | 2520 MHz | 2510 MHz | 2610 MHz | 2475 MHz | 2535 MHz | 2460 MHz |
| FP32 Compute | 83 TFLOPs | TBD | 49 TFLOPs | 40 TFLOPs | 29 TFLOPs | 22 TFLOPs | 15 TFLOPs |
| RT TFLOPs | 191 TFLOPs | TBD | 113 TFLOPs | 82 TFLOPs | 67 TFLOPs | 51 TFLOPs | 35 TFLOPs |
| Tensor-TOPs | 1321 TOPs | TBD | 780 TOPs | 641 TOPs | 466 TOPs | 353 TOPs | 242 TOPs |
| Memory Capacity | 24 GB GDDR6X | 24 GB GDDR6X | 16 GB GDDR6X | 12 GB GDDR6X | 12 GB GDDR6X | 8-16 GB GDDR6 | 8 GB GDDR6 |
| Memory Bus | 384-bit | 384-bit | 256-bit | 192-bit | 192-bit | 128-bit | 128-bit |
| Memory Speed | 21.0 Gbps | 21.0 Gbps | 22.4 Gbps | 21.0 Gbps | 21.0 Gbps | 18.0 Gbps | 17.0 Gbps |
| Bandwidth | 1008 GB/s | 1008 GB/s | 716.8 GB/s | 504 GB/s | 504 GB/s | 288 GB/s (554 GB/s Effective) | 272 GB/s (453 GB/s Effective) |
| TBP | 450W | 425W | 320W | 285W | 200W | 160-165W | 115W |
| Price (MSRP / FE) | $1599 US / 1949 EU | 12,999 RMB (China-Only) | $1199 US / 1469 EU | $799 US | $599 US | $399-$499 US | $299 US |
| Price (Current) | $1599 US / 1859 EU | 12,999 RMB (China-Only) | $1199 US / 1399 EU | $799 US | $599 US | $399-$499 US | $299 US |
| Launch (Availability) | 12th October 2022 | 28th December 2023 | 16th November 2022 | 5th January 2023 | 13th April 2023 | 24th May / 18th July 2023 | 29th June 2023 |
The MSI GeForce RTX 4080 SUPRIM X graphics card comes inside a standard cardboard box. The front of the package has a large "GeForce RTX" brand logo along with the "MSI" logo in the top left corner and the "SUPRIM X" series branding in the lower-left corner. A large picture of the graphics card itself is depicted on the front which gives a nice preview of the SUPRIM X design.
The packaging puts a large emphasis on the RTX side of things, as the first features listed by AIBs are the NVIDIA Ada architecture, Ray Tracing & DLSS support. NVIDIA has bet the future of its gaming GPUs on Ray Tracing, and Ada is the third generation of RTX cards to support the feature.
The back of the box is very typical, highlighting the main features and specifications of the cards. The three key aspects of MSI's top-tier custom cards are its blazing performance which is achieved by fully custom design, the new Tri-Frozr 3S cooling system, and a new Torx Fan 5.0 fan and Vapor Chamber cooler which will offer better cooling performance.
There's also a focus towards GeForce.com on each AIB card through which users can download the latest drivers and GeForce Experience application which are a must for gamers to access all feature sets of the new cards.
The sides of the box once again greet us with the large GeForce RTX branding. There's also the mention of 16 GB GDDR6X (RTX 4080) memory available on the card. Opening the box, you are greeted with a nice SUPRIM logo.
Outside of the box, the graphics card and the accessory package are held firmly by foam packaging. The graphics card comes with a few accessories and manuals which might not be of much use for hardcore enthusiasts but can be useful for the mainstream gaming audience. The only two useful accessories are the GPU mounting anti-sag bar and the 16-pin to 4x 8-pin power adapter. There's also a nice mousepad that MSI ships with its SUPRIM series lineup.
The card is nicely wrapped within an anti-static cover which is useful to prevent any unwanted static discharges on various surfaces that might harm the graphics card. The most interesting accessory that I found in the package was a graphics card support bracket. This bracket connects the graphics card to the casing, offering better durability and preventing any sort of bending that may occur due to the heavy weight of the Gaming X Trio & SUPRIM X series graphics cards.
After the package is taken care of, I can finally start talking about the card itself. This thing is a beast and I can't wait to test it out to find what kind of performance improvement I get over current-gen cards.
MSI’s Tri Frozr heatsinks are some of the biggest cooling solutions that I have ever tested. I first tested the Gaming X Trio when MSI released the 1080 Ti variant back in 2017 and that was a very aggressive design in its own right. Since then, I have tested the RTX 2080 Ti, RTX 3090, and RTX 3090 Ti in their Tri-Frozr iterations. With the RTX 40 series cards, MSI has further refined the Tri Frozr design. The card measures 336 x 142 x 78 mm and weighs in at 2364 grams. It features a 3.5-slot height which is expected of today's high-end cards.
You would have to keep in mind the height when going for a triple or quad-slot card solution as your case or motherboard PCIe slot combination may not allow such a setup. The cooling shroud extends all the way to the back of the PCB and it requires a casing with good interior space for proper installation.
The back of the card features a solid backplate that looks stunning. The backplate offers a lot more functionality than just looks which I will get back to in a bit.
In terms of design, we are looking at an updated version of the Tri Frozr heatsink known as Tri Frozr 3S which is now in its eighth variation while for the SUPRIM X series, this is the 2nd iteration. The first variation started off with the GTX 780 Ti Lightning, the second was the 980 Ti Lightning, then came the 1080 Ti Gaming X Trio, the 1080 Ti Lightning, then the RTX 20 & RTX SUPER Gaming X Trio graphics cards while the seventh generation was introduced on the RTX 30 series. Now we are in the eighth generation.
The new heatsink looks like a beefed-up version of the SUPRIM X heatsink that we saw on the 3090 Ti with the main changes being the shroud and heatsink design that feature a neater shroud design on the front, absorbing the black and silver color platelets while featuring the RGB emitting V-shaped acrylic cutouts at the front. The sides also come with a large RGB accent bar which lights up when the card is powered on.
Coming to the fans, the card features the latest fan design based on the Torx 5.0 system. All three fans feature a ring-based design to allow more airflow to be channeled into the main heatsink. All fans use double ball bearings and can last a long time while operating silently. The blades on each fan are linked in groups of three, and each fan has three such groups for a total of 9 blades. Each blade is tilted at a 22-degree angle to the main high-pressure airflow.
If you look closely, you can see that the card features beveled edges that are polished several times with a diamond-tipped cutter to achieve a mirror finish, which can give a slight gold effect that looks great.
MSI also features its Zero Frozr technology on the Tri Frozr heatsink. This feature won’t spin the fans on the card unless the GPU temperature reaches a certain threshold. In the case of the Tri Frozr heatsink, that limit is set to 60C. If the card is operating under 60C, the fans won’t spin, which means no extra noise is generated.
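The fan-stop behavior boils down to a simple temperature check. The sketch below only models the 60C threshold described above. It is not MSI's firmware logic, which likely adds hysteresis and a fan curve, and the function and constant names are hypothetical.

```python
ZERO_FROZR_THRESHOLD_C = 60  # fan-stop limit quoted in the review


def fans_should_spin(gpu_temp_c: float,
                     threshold_c: float = ZERO_FROZR_THRESHOLD_C) -> bool:
    """Fans stay off below the threshold and spin once it is reached."""
    return gpu_temp_c >= threshold_c


for temp in (35, 59, 60, 72):
    state = "spinning" if fans_should_spin(temp) else "stopped"
    print(f"{temp}C -> fans {state}")
# 35C -> fans stopped
# 59C -> fans stopped
# 60C -> fans spinning
# 72C -> fans spinning
```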
I am back at talking about the full-coverage, full metal-based backplate that the card uses. The whole plate is made of solid metal with rounded edges that add to the durability of this card. The brushed matte-black finish on the backplate gives a unique aesthetic. The graphics card also comes with a compact PCB design which means that the shroud, heatsink, and backplate are all extended beyond the PCB. The third fan blows air through the heatsink and blows it out from the cutouts that are situated at the very end of the backplate.
There are cutouts in screw placements to easily reach the points on the graphics card. We can also see the new SUPRIM logo which drops the Dragon design and goes for a Diamond shape on the back which looks stunning. MSI is also using heat pads beneath the backplate which offer more cooling to the electrical circuitry on the PCB. The most interesting thing to spot on the back aside from the backplate is the large retention metal bracket which adds more mounting pressure to effectively disperse heat from the GPU to the heatsink.
With the outside of the card done, I will now start taking a glance at what's beneath the hood of these monster graphics cards. The first thing to catch my eye is the humungous fin stack that's part of the beefy heatsink that the cards utilize.
The large fin stack runs all the way from the front and to the back of the PCB and is so thick that you can barely see through it. It also comes with the wave-curved 3.0 fin stack design which I want to shed some light on as it is a turn away from traditional fin design and one that actually offers better cooling on high-end graphics cards such as the RTX 3090 Ti. The card also uses antegrade fins on the back that direct and optimize air pass-through on the back, allowing more warm air to pass out of the card like a nozzle.
The heatsink has been designed to be denser by using a wave-curved and filled-fin design. It allows more air to pass through the fins smoothly, without causing any turbulence that would result in unwanted noise. Airflow Control Technology guides the airflow directly onto the heat pipes, while simultaneously creating more surface area for the air to absorb more heat before leaving the heatsink. The heat pipes have also been arranged in a way that allows MSI to stack even more fin room.
Talking about the heatsink, the massive block is comprised of 11 square-shaped copper heat pipes with a more concentrated design to transfer heat from the copper base to the heatsink more effectively. The base itself is a solid nickel-plated base plate, transferring heat to the heat pipes in a very effective manner. To top it all off, MSI adds extra protection to its impressive PCB by including a rugged anti-bending plate. This also acts as a memory and MOSFET cooling plate while the PWM heatsink with micro fins keeps the VRM cool under stressful conditions.
I/O on the graphics card sticks with the reference scheme which includes three DisplayPort 1.4a ports & a single HDMI 2.1 port.
There's also a dual-BIOS switch on the card which comes pre-configured with Silent & Gaming modes. The BIOS doesn't affect the clock profiles but rather affects the maximum power limit, enabling higher fan speeds for better cooling and more stable clocks. The limits are 320W for the silent and 400W for the gaming profile.
MSI GeForce RTX 4080 SUPRIM X Teardown:
MSI makes use of a 22+2 phase PWM design that is made up of high-quality components such as HCI (High-Efficiency Carbonyl Inductors), SPS (Smart Power Stages), and hardened defense fuses. The card also uses the latest GDDR6X DRAM from Micron which operates at 22.4 Gbps alongside a 256-bit wide memory interface.
The MSI GeForce RTX 4080 SUPRIM X is a very power-hungry graphics card as showcased by its custom design. As such, the card utilizes a single 16-pin connector which can deliver up to 600 Watts of power to the graphics card. The card is rated at 320W but ends up around 400W with its full power limit.
MSI GeForce RTX 4080 SUPRIM X RGB Lighting Gallery:
MSI SUPRIM X series cards utilize their Mystic Light RGB technology to offer you a visually pleasing lighting experience on your graphics cards.
There are a total of 5 different RGB effects that you can choose from and the cards have 3 RGB accent points on the front, one on the back, and one lightbar surrounding the side of the card which looks really good. You can fully customize the RGB lights to your preference using the MSI Mystic Light application from MSI's web page.
Following is what the graphics card looks like when lit up.

The MSI GeForce RTX 4080 Gaming X Trio graphics card comes inside a standard cardboard box. The front of the package has a large "GeForce RTX" brand logo along with the "MSI" logo in the top left corner and the "Gaming X Trio" series branding in the lower-left corner. A large picture of the graphics card itself is depicted on the front which gives a nice preview of the Gaming X Trio design.
The packaging puts a large emphasis on the RTX side of things, as the first features listed by AIBs are the NVIDIA Ada architecture, Ray Tracing & DLSS support. NVIDIA has bet the future of its gaming GPUs on Ray Tracing, and Ada is the third generation of RTX cards to support the feature.
The back of the box is very typical, highlighting the main features and specifications of the cards. The three key aspects of MSI's top-tier custom cards are its blazing performance which is achieved by fully custom design, the new Tri-Frozr 3 cooling system, and a new Torx Fan 5.0 fan and Core Pipe design which will offer better cooling performance.
There's also a focus towards GeForce.com on each AIB card through which users can download the latest drivers and GeForce Experience application which are a must for gamers to access all feature sets of the new cards.
The sides of the box once again greet us with the large GeForce RTX branding. There's also the mention of 16 GB GDDR6X (RTX 4080) memory available on the card. Opening the box, you are greeted with a nice SUPRIM logo.
Outside of the box, the graphics card and the accessory package are held firmly by foam packaging. The graphics card comes with a few accessories and manuals which might not be of much use for hardcore enthusiasts but can be useful for the mainstream gaming audience. The only two useful accessories are the GPU mounting anti-sag bar and the 16-pin to 3x 8-pin power adapter. There's also a nice mousepad that MSI ships with its SUPRIM series lineup.
The card is nicely wrapped within an anti-static cover which is useful to prevent any unwanted static discharges on various surfaces that might harm the graphics card. The most interesting accessory that I found in the package was a graphics card support bracket. This bracket connects the graphics card to the casing, offering better durability and preventing any sort of bending that may occur due to the heavy weight of the Gaming X Trio & SUPRIM X series graphics cards.
After the package is taken care of, I can finally start talking about the card itself. This thing is a beast and I can't wait to test it out to find what kind of performance improvement I get over current-gen cards.
MSI’s Tri Frozr heatsinks are some of the biggest cooling solutions that I have ever tested. I first tested the Gaming X Trio when MSI released the 1080 Ti variant back in 2017 and that was a very aggressive design in its own right. Since then, I have tested the RTX 2080 Ti, RTX 3090, and RTX 3090 Ti in their Tri-Frozr iterations. With the RTX 40 series cards, MSI has further refined the Tri Frozr design. The card measures 337 x 140 x 67 mm and weighs in at 1876 grams, making it around 500g lighter than the SUPRIM X. It features a 3-slot height which is expected of today's high-end cards.
You would have to keep in mind the height when going for a triple or quad-slot card solution as your case or motherboard PCIe slot combination may not allow such a setup. The cooling shroud extends all the way to the back of the PCB and it requires a casing with good interior space for proper installation.
The back of the card features a solid backplate that looks stunning. The backplate offers a lot more functionality than just looks which I will get back to in a bit.
In terms of design, we are looking at an updated version of the Tri Frozr heatsink known as Tri Frozr 3 which is now in its eighth variation. The first variation started off with the GTX 780 Ti Lightning, the second was the 980 Ti Lightning, then came the 1080 Ti Gaming X Trio, the 1080 Ti Lightning, then the RTX 20 & RTX SUPER Gaming X Trio graphics cards while the seventh generation was introduced on the RTX 30 series. Now we are in the eighth generation.
The new heatsink looks like a beefed-up version of the Gaming X Trio heatsink that we saw on the 3090. The main changes are a neater shroud design with a claw-shaped RGB pattern on the front and a carbon-fiber touch across the sides of the card.
Coming to the fans, the card features the latest fan design based on the Torx 5.0 system. All three fans feature a ring-based design to allow more airflow to be channeled into the main heatsink. All fans use double ball bearings and can last a long time while operating silently. The blades on each fan are linked in groups of three, and each fan has three such groups for a total of 9 blades. Each blade is tilted at a 22-degree angle to the main high-pressure airflow.
If you look closely, you can see that the card features beveled edges that are polished several times with a diamond-tipped cutter to achieve a mirror finish, which can give a slight gold effect that looks great.
MSI also features its Zero Frozr technology on the Tri Frozr heatsink. This feature won’t spin the fans on the card unless the GPU temperature reaches a certain threshold. In the case of the Tri Frozr heatsink, that limit is set to 60C. If the card is operating under 60C, the fans won’t spin, which means no extra noise is generated.
I am back at talking about the full-coverage, full metal-based backplate that the card uses. The whole plate is made of solid metal with rounded edges that add to the durability of this card. The matte-black finish on the backplate gives a unique aesthetic. The graphics card also comes with a compact PCB design which means that the shroud, heatsink, and backplate are all extended beyond the PCB. The third fan blows air through the heatsink and blows it out from the cutouts that are situated at the very end of the backplate.
There are cutouts in screw placements to easily reach the points on the graphics card. We can also see the iconic MSI Dragon logo. MSI is also using heat pads beneath the backplate which offer more cooling to the electrical circuitry on the PCB. The most interesting thing to spot on the back aside from the backplate is the large retention metal bracket which adds more mounting pressure to effectively disperse heat from the GPU to the heatsink.
With the outside of the card done, I will now start taking a glance at what's beneath the hood of these monster graphics cards. The first thing to catch my eye is the humungous fin stack that's part of the beefy heatsink that the cards utilize.
The large fin stack runs all the way from the front and to the back of the PCB and is so thick that you can barely see through it. It also comes with the wave-curved 2.0 fin stack design which I want to shed some light on as it is a turn away from traditional fin design and one that actually offers better cooling on high-end graphics cards such as the RTX 3090 Ti. The card also uses antegrade fins on the back that direct and optimize air pass-through on the back, allowing more warm air to pass out of the card like a nozzle.
The heatsink has been designed to be denser by using a wave-curved and filled-fin design. It allows more air to pass through the fins smoothly, without causing any turbulence that would result in unwanted noise. Airflow Control Technology guides the airflow directly onto the heat pipes, while simultaneously creating more surface area for the air to absorb more heat before leaving the heatsink. The heat pipes have also been arranged in a way that allows MSI to stack even more fin room.
Talking about the heatsink, the massive block is comprised of 7 square-shaped copper heat pipes with a more concentrated design to transfer heat from the copper base to the heatsink more effectively. The base itself is a solid nickel-plated base plate, transferring heat to the heat pipes in a very effective manner. To top it all off, MSI adds extra protection to its impressive PCB by including a rugged anti-bending plate. This also acts as a memory and MOSFET cooling plate while the PWM heatsink with micro fins keeps the VRM cool under stressful conditions.
I/O on the graphics card sticks with the reference scheme which includes three DisplayPort 1.4a ports & a single HDMI 2.1 port.
There's also a dual-BIOS switch on the card which comes pre-configured with Silent & Gaming modes. The BIOS doesn't affect the clock profiles but rather affects the maximum power limit, enabling higher fan speeds for better cooling and more stable clocks. The limits are 320W for the silent and 380W for the gaming profile.
MSI GeForce RTX 4080 Gaming X Trio Teardown:
The MSI GeForce RTX 4080 Gaming X Trio is a very power-hungry graphics card as showcased by its custom design. Being so, the card utilizes a single 16-pin connector which can deliver up to 450 Watts of power to the graphics card. The card is rated at 320W but ends up around 380W with its full power limit.
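The connector figure lines up with the bundled 16-pin to 3x 8-pin adapter. As a rough sketch, assuming each PCIe 8-pin input is rated for 150W (a figure from the PCIe specification rather than from this review), the adapter limit and the headroom over the 380W Gaming BIOS power limit work out as follows. The function names are illustrative.

```python
PCIE_8PIN_WATTS = 150  # rated power per PCIe 8-pin input (assumption, not from the review)


def adapter_limit_watts(eight_pin_inputs: int) -> int:
    """Maximum power a 16-pin adapter can draw from its 8-pin inputs."""
    return eight_pin_inputs * PCIE_8PIN_WATTS


def headroom_watts(eight_pin_inputs: int, board_power_limit_w: int) -> int:
    """Connector power left over once the board hits its power limit."""
    return adapter_limit_watts(eight_pin_inputs) - board_power_limit_w


print(adapter_limit_watts(3))  # 450
print(headroom_watts(3, 380))  # 70
```

A four-input adapter under the same assumption reaches 600W, the maximum quoted earlier for the 16-pin connector.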
MSI GeForce RTX 4080 Gaming X Trio RGB Lighting Gallery:
MSI Gaming X Trio series cards utilize their Mystic Light RGB technology to offer you a visually pleasing lighting experience on your graphics cards. Following is what the graphics card looks like when lit up.
There are a total of 6 different RGB effects that you can choose from and the cards have 5 RGB accent points on the front, and one on the side. You can fully customize the RGB lights to your preference using the MSI Mystic Light application from MSI's web page.
We used the following test system for comparison between the different graphics cards. The latest drivers from AMD and NVIDIA that were available at the time of testing were used on an updated version of Windows 11. All tested games were patched to the latest version for better performance optimization for NVIDIA and AMD GPUs.
NVIDIA GeForce RTX 4090 Test Setup
| CPU | Intel Core i9-12900K @ 5.0 GHz |
|---|---|
| Motherboard | AORUS Z690 Master (DDR5) |
| Video Cards | Colorful GeForce RTX 4090 Vulcan OC-V, MSI GeForce RTX 4090 SUPRIM Liquid X, MSI GeForce RTX 4090 SUPRIM X, NVIDIA GeForce RTX 4090 FE, NVIDIA GeForce RTX 3090 FE, NVIDIA GeForce RTX 3080 Ti FE, NVIDIA GeForce RTX 3080 FE, MSI Radeon RX 6950 XT Gaming X Trio, MSI GeForce RTX 3090 Ti SUPRIM X, MSI GeForce RTX 3090 SUPRIM X, MSI Radeon RX 6900 XT Gaming Z Trio, MSI GeForce RTX 3080 Ti SUPRIM X, MSI Radeon RX 6800 XT Gaming X Trio, MSI GeForce RTX 3080 SUPRIM X, MSI GeForce RTX 3070 Ti SUPRIM X, MSI GeForce RTX 2080 Ti Lightning, MSI GeForce RTX 3070 Gaming X Trio |
| Memory | G.SKILL Trident Z5 RGB Series 32GB (2 X 16GB) CL36 6000 MHz |
| Storage | Teamgroup T-Force A440 Pro 2 TB Gen 4 |
| Power Supply | ASUS ROG THOR 1200W PSU |
| OS | Windows 11 64-bit |
| Drivers | AMD Radeon Adrenalin Edition 22.9.2, NVIDIA GeForce 521.90 WHQL |
- All games were tested at 3840x2160 (4K) resolution.
- Image Quality and graphics configurations are provided with each game description.
- The "reference" cards are the stock configs except where mentioned otherwise.
Firestrike
Firestrike runs on the DX11 API and is still a good measure of GPU scaling performance. In this test, we ran the Extreme and Ultra versions of Firestrike, which run at 1440p and 4K respectively, and we recorded the Graphics Score only since the Physics and Combined scores are not pertinent to this review.
3DMark Firestrike Extreme Graphics
3DMark Firestrike Ultra Graphics
Time Spy
Time Spy is running the DX12 API and we used it in the same manner as Firestrike Extreme where we only recorded the Graphics Score as the Physics score is recording the CPU performance and isn't important to the testing we are doing here.
3DMark Time Spy Graphics
3DMark Time Spy Extreme Graphics
Port Royal
Port Royal is another great tool in the 3DMark suite, but this one is 100% targeting Ray Tracing performance. It loads up ray-traced shadows, reflections, and global illumination to really tax the performance of graphics cards that have either hardware-based or software-based ray-tracing support.
3DMark Port Royal Score
3DMark Pure Ray Tracing Feature Test
Crysis Remastered (DXVK RT)
Crysis is back with a vengeance to reclaim its title of the graphics crown. The remastered version of the game uses DX11 API but has Vulkan extensions on top which enable Vulkan Ray tracing. That's also something that the original game didn't offer. DXVK, along with improved textures and visual effects, leads to higher performance demand making us question once again "Can It Run Crysis?"
Crysis Remastered (4K Native RT SMAA2TX)
Doom Eternal
DOOM Eternal brings hell to earth with the Vulkan-powered idTech 7. We test this game using the Ultra Nightmare Preset and follow our in-game benchmarking to stay as consistent as possible.
DOOM Eternal
Red Dead Redemption 2
Developed by Rockstar San Diego, Red Dead Redemption 2 is one of the most visually stunning open-world games I've played to date and is backed up by a rich story set around the protagonist, Arthur Morgan. The game is based on the RAGE engine which features an insane amount of graphics fidelity but also requires a lot of power to run maxed out. For the purpose of this test, we set the graphics settings to Ultra with AA disabled.
Red Dead Redemption 2
Wolfenstein: Youngblood
Wolfenstein is back in The New Colossus and features the most fast-paced, gory, and brutal FPS action ever! The game once again puts us back in the Nazi-controlled world as BJ Blazkowicz. Set during an alternate future where Nazis won the World War, the game shows that it can be fun and can be brutal to the player and to the enemy too. Powering the new title is, once again, id Tech 6 which is much acclaimed after the success that DOOM has become. In a way, ID has regained its glorious FPS roots and is slaying with every new title.
Wolfenstein
Battlefield V
Battlefield V brings back the action of the World War 2 shooter genre. Using the latest Frostbite tech, the game does a good job of looking gorgeous in all ways possible. From the open-world environments to the intense and gun-blazing action, this multiplayer and single-player FPS title is one of the best-looking Battlefields to date.
Battlefield V
Battlefield V Raytracing DLSS (Quality)
Cyberpunk 2077
Cyberpunk 2077 is an action role-playing video game developed by CD Projekt Red and published by CD Projekt. The story takes place in Night City, an open world set in the Cyberpunk universe. Players assume the first-person perspective of a customizable mercenary known as V, who can acquire skills in hacking and machinery with options for melee and ranged combat. The game uses CD Projekt Red's in-house Red Engine which is one of the most visually breathtaking and also one of the most graphics-intensive engines designed to date.
Cyberpunk 2077 (4K Native RT)
Death Stranding
Sam Porter Bridges has delivered one of PS4's most anticipated games to the PC community and opened a whole new world of possibilities. This was the first game to feature the Decima Engine on PC and unarguably did it the best. Death Stranding may not feature ray tracing effects, but it does showcase that DLSS can be used effectively even when RT isn't around. We tested this one just like we did in our launch coverage with DLSS enabled.
Death Stranding DLSS/FSR (Quality)
Forza Horizon 5
Forza Horizon 5 carries on the open-world racing tradition of the Horizon series. The latest DX12-powered entry is beautifully crafted, amazingly well executed, and a great showcase of DX12 games. We use the benchmark run while having all of the settings set to non-dynamic with an uncapped framerate to gather these results.
Forza Horizon 5
Halo Infinite (DX12 Highest)
Next up, we have the latest entry in the Halo franchise, Halo: Infinite, which uses the brand new Slipspace engine (although there are rumors it will be ditched in the future for Unreal Engine) based on the DX12 API. The game rocks some incredible environments for Master Chief to visit on the Halo ring.
Halo Infinite
Hitman III (DX12 Highest Settings)
Hitman III is the highly acclaimed sequel to the 2016 Hitman & 2018 Hitman II, which was a redesign and reimagining of the game from the ground up. With a focus on stealth gameplay through various missions, the game once again lets you play as Agent 47. The game runs on the IO Interactive Glacier 2 engine which has been updated to deliver amazing visuals and environments on each level while making use of the DirectX 12 API.
Hitman III
Shadow of The Tomb Raider
The sequel to Rise of the Tomb Raider, Shadow of The Tomb Raider is visually enhanced with an updated Foundation Engine that delivers realistic facial animations and the most gorgeous environments ever seen in a Tomb Raider Game. The game is a technical marvel and really shows the power of its graphics engine in the latest title.
Shadow of The Tomb Raider
Shadow of The Tomb Raider Raytracing DLSS/FSR (Quality)
Metro Exodus
Metro Exodus continues Artyom's journey through Russia's nuclear wasteland and its surroundings. This time, you are set over the Metro, going through various regions and different environments. The game is one of the premier titles to feature NVIDIA’s RTX technology and does well in showcasing the ray-tracing effects in all corners.
Metro Exodus Extreme Preset
Metro Exodus Raytracing DLSS (Quality)
Resident Evil Village
Resident Evil Village is the latest in the horror franchise that was wonderfully rekindled with RE7 and onto the RE2 Remake. But now the RE Engine is back and better than ever with Ray Traced Reflections and Lighting that makes the world just come to life, unironically. The game was tested in the center of the village itself with all graphical settings maxed out and with raytracing enabled.
Resident Evil Village (Maxed)
Resident Evil Village Raytracing FSR (Quality)
Stray (That Cat Game)
Stray is a 2022 adventure game developed by BlueTwelve Studio and published by Annapurna Interactive. The story follows a stray cat who falls into a walled city populated by robots, machines, and mutant bacteria, and sets out to return to the surface with the help of a drone companion, B-12. The game uses Unreal Engine 4, but DX12 ray tracing can be enabled by adding the "-dx12" launch argument to the game.
Stray (Maxed With DXR)
No graphics card review is complete without evaluating its temperatures and thermal load. NVIDIA uses an updated vapor chamber and fan design on the brand-new Founders Edition variant that offers a 10% larger fan and fin volume while offering up to 15% higher airflow.
Temperatures
I compiled the power consumption results by testing each card at idle and under full load while running games. Each graphics card manufacturer sets a default TDP for the card, which can vary from vendor to vendor depending on the extra clocks or board features they add to their custom cards. The default TDP for the GeForce RTX 4080 Founders Edition is rated at 320W and the peak power limit is rated at 400W.
Power Consumption
The NVIDIA GeForce RTX 4080 was unveiled with two products, a 16 GB model and a 12 GB model. However, due to severe backlash and criticism from consumers, only one variant survived, & that is the one we are testing today, the RTX 4080 (16 GB). The NVIDIA GeForce RTX 4080 graphics card is a definite upgrade over the RTX 3080 and RTX 3080 Ti in terms of raw performance and ray tracing performance, and it packs more features in the form of DLSS 3, AV1, superb compute performance, and much more. So let's dive in and see how well the MSI custom lineup performed.
An RTX 3080 Ti Successor With A Missing 'Ti'
The NVIDIA GeForce RTX 4080 undoubtedly delivers better performance than an RTX 3090 Ti across all games we tested. The graphics card breaks no sweat at 4K gaming and is impressive in ray tracing and DLSS titles. The performance gain can range anywhere between 20-40% & ray tracing will push the bar up further. Add DLSS on top of that and you are getting a performance jump of a lifetime, but there's a catch.
You see, if the RTX 3090 Ti and RTX 3090 were still at their MSRP, the RTX 4080 would've made a lot more sense at its $1199 US MSRP. Even as of right now, the card is just $100-$200 US more expensive than a new RTX 3090 Ti or RTX 3090, which is decent value, but the problem is how the graphics card is named. In its official charts, NVIDIA compares the RTX 4080 with the RTX 3080 Ti, and in terms of its pricing, the card matches the previous-gen RTX 3080 Ti. But the problem is that it isn't named so & despite the missing 'Ti' label, some users won't be able to swallow the $500 US price bump versus what they paid for their RTX 3080 (10 GB) graphics card. Well, hardly anyone paid the MSRP for the 3080s considering they were never available at that price due to the crypto boom. But this is a different market now, one that is flooded with used cards and new ones that are available at dirt-cheap rates.
Furthermore, if you look at the RTX 4090 at $1599 US, the flagship suddenly starts looking like a much better value: it costs around 33% more but delivers up to 40-50% higher performance, increased memory, and even more compute power to boot. Overall, the RTX 4080 performed great and is the second-best gaming GPU that one can buy right now. The RTX 3080 and 3080 Ti users will benefit from increased performance and frame rates, provided they are willing to drop over $1K US in cash.
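The value argument above can be checked with a few lines of arithmetic. The sketch below uses only the two MSRPs and the "up to 40-50%" uplift quoted in this review; the function names are made up for illustration, and the uplift is a best-case range rather than a measured average, so treat the result as a rough guide.

```python
# US MSRPs as quoted in this review.
RTX_4080_MSRP = 1199
RTX_4090_MSRP = 1599


def price_premium(base_price, other_price):
    """Fractional extra cost of `other_price` relative to `base_price`."""
    return (other_price - base_price) / base_price


def relative_value(perf_uplift, premium):
    """Performance per dollar of the pricier card relative to the cheaper one.

    A result above 1.0 means the pricier card gives more performance per dollar.
    """
    return (1 + perf_uplift) / (1 + premium)


premium = price_premium(RTX_4080_MSRP, RTX_4090_MSRP)
print(f"RTX 4090 price premium over RTX 4080: {premium:.1%}")

# Best-case uplift range from the review ("up to 40-50%"), an assumption here.
for uplift in (0.40, 0.50):
    ratio = relative_value(uplift, premium)
    print(f"uplift {uplift:.0%}: performance per dollar x{ratio:.2f}")
```

With these inputs the price gap comes out to roughly a third, and the RTX 4090 only pulls ahead on performance per dollar when the uplift is larger than that gap.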
That Power Efficiency Tho!
The thing that surprised me the most about the NVIDIA GeForce RTX 4080 graphics card is its power efficiency. During gaming, the graphics card was running at an average power draw of 250-300W. We saw several cases where the power dropped below 250W with DLSS applied and that's impressive because, at the same time, we were also getting up to 35% better performance than an RTX 3090 Ti which consumed 480W of power. Even under Cyberpunk 2077 with Psycho settings, the card never jumped above 300W. That's a 37.5% lower power draw than an RTX 3090 Ti while offering 40% higher performance.
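The efficiency claim above follows directly from the quoted figures. Here is a minimal sketch of that calculation, assuming the 300W and 480W power draws and the 40% performance lead stated in this section; the helper names are hypothetical.

```python
def power_reduction(new_watts, old_watts):
    """Fractional drop in power draw going from the old card to the new one."""
    return (old_watts - new_watts) / old_watts


def perf_per_watt_gain(perf_ratio, new_watts, old_watts):
    """How many times more performance per watt the new card delivers."""
    return perf_ratio * old_watts / new_watts


# Figures quoted in this review: RTX 4080 peaking at 300W, RTX 3090 Ti at 480W,
# with the RTX 4080 offering 40% higher performance.
reduction = power_reduction(300, 480)
gain = perf_per_watt_gain(1.40, 300, 480)

print(f"Power draw reduction: {reduction:.1%}")
print(f"Performance per watt: x{gain:.2f}")
```

Taken at face value, these numbers put the RTX 4080 at a little over twice the performance per watt of the RTX 3090 Ti in that scenario.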
Besides that, the MSI GeForce RTX 4080 SUPRIM X and Gaming X Trio performed splendidly, with temperatures that never peaked above 65C at full load, whether at stock or overclocked. This should be enough to reassure people who believe that Ada is a hot and power-hungry architecture. Its power efficiency was already demonstrated in our 4090 FE review and we get to see it perform even better with the AD103 GPU core.
How much more performance will the custom models get me?
One of the main questions that one would ask is about the net performance upgrade that a custom model will offer over the Founders Edition. Considering that the SUPRIM X and Gaming X Trio are coming in at a premium, one should expect better performance. With a slight increase in clocks and lots of engineering put into the coolers, you can see anywhere from 5-8% higher performance with a custom model, but performance is just one part of the equation. There's also the matter of thermals, the power numbers, and the overall overclocking uplift that these cards can deliver & also sustain.
Both cards are able to hit over 3 GHz with ease. The SUPRIM X can hit that number more often because its lower temps open up more headroom, and with its fully unlocked power limit, the 4080 SUPRIM X can hit over 3.1 GHz, which is a superb clock speed and something that last-gen cards could only achieve on LN2 cooling. If you are not looking to pay the hefty premium that ranges from $100-$200 US over the $1199 US MSRP, you can always find the Gaming X (Non-Trio) which retails at MSRP.
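To put the custom-model premium in perspective, the snippet below compares the $100-$200 US premium against the 5-8% performance gain quoted above. It is only a sketch built on those review figures; the helper name is hypothetical and actual retail prices will vary.

```python
MSRP = 1199  # RTX 4080 US MSRP quoted in this review


def premium_percent(extra_dollars, base_price=MSRP):
    """Price premium as a percentage of the base price."""
    return 100 * extra_dollars / base_price


low_premium = premium_percent(100)
high_premium = premium_percent(200)

# Performance gain range for custom models, as quoted in the review.
perf_gain_low, perf_gain_high = 5, 8

print(f"Price premium: {low_premium:.1f}% to {high_premium:.1f}%")
print(f"Performance gain: {perf_gain_low}% to {perf_gain_high}%")
```

On these figures the extra cost is proportionally larger than the extra frame rate, so the premium is mostly buying thermals, acoustics, and overclocking headroom.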
It's Better To Wait Than To Get An RTX 4080 Now
The NVIDIA GeForce RTX 4080 is a good graphics card in terms of performance but its value is questionable at best. While MSI is offering a great package in the form of its SUPRIM X and Gaming X Trio lineup, with superior cooling and factory-overclocked specifications, there's a good reason to hold off on your purchase for a few weeks. The reason is simply that AMD has its counterattack coming in the form of the Radeon RX 7900 series, and users should definitely consider waiting to see how those cards perform against the RTX 4080. As of right now, I'd ask users to wait just two weeks if they can, as they will then have a better idea of where their hard-earned cash should go.