The Wccftech Test Bench
Keeping alive its tradition of launching a new graphics architecture every two years, NVIDIA introduces its Ada Lovelace GPU this year. The Ada GPU builds upon the foundation set by Turing. NVIDIA describes the Ada Lovelace GPUs as a quantum leap over Ampere, and the GeForce RTX 4080 Founders Edition, based on the NVIDIA Ada Lovelace GPU, excels at everything versus the previous gen.
The Ada GPU architecture has a lot to talk about in this review, and so does the new RTX lineup. The Ada lineup offers faster shader performance, faster ray tracing performance, and faster AI performance. Built on a brand new process node and featuring an architecture designed from the ground up, Ada is a killer product with lots of numbers to talk about.
The fundamental idea behind Ada was to take everything NVIDIA learned with its Turing & Ampere architectures and not only refine it but also use that DNA to form a product in a completely new performance category. NVIDIA made tall claims, including up to 4x performance, when it introduced its Ada lineup, and we will be finding out whether NVIDIA ticked all the boxes with its Ada architecture. This review will be your guide to what makes Ada and how it performs against its predecessors.
Today, we will be taking a look at the NVIDIA GeForce RTX 4080 Founders Edition graphics card. The card was provided by NVIDIA for the sole purpose of this review & we will be taking a look at their technology, design, and performance metrics in detail.
NVIDIA GeForce RTX 40 Series Gaming Graphics Cards - The Biggest GPU Performance Leap in Recent History
Turing wasn't just any graphics core; it was the graphics core that was to become the foundation of future GPUs. That future is being realized now, with next-generation consoles heavily touting ray tracing and AI-assisted super-sampling techniques. NVIDIA had a head start with Turing & Ampere, and its Ada generation only builds on it.
The Ada GPU does many traditional things which we would expect from a GPU, but at the same time, it also breaks barriers when it comes to non-traditional GPU operations. To sum up some of its features:
- New Streaming Multiprocessor (SM)
- New 4th Generation Tensor Cores
- New Real-Time Ray Tracing Acceleration
- New Shading Enhancements
- New Deep Learning Features For Graphics & Inference
- New GDDR6X High-Performance Memory Subsystem
- New HDMI 2.1 Display Engine & Next-Gen NVENC/NVDEC
The technologies mentioned above are some of the main building blocks of the Ada GPU, but there's more within the graphics core itself, which we will talk about in detail, so let's get started.
Let's take a trip down the road to Ada. In 2016, NVIDIA announced its Pascal GPUs, which would soon be featured in its top-to-bottom GeForce 10 series lineup. With Maxwell, NVIDIA had already gained a lot of experience in the efficiency department, an area it had focused on since its Kepler GPUs.
Four years ago, rather than offering another standard leap in the rasterization performance of its GPUs, NVIDIA took a different approach and introduced two key technologies in its Turing line of consumer GPUs: AI-assisted acceleration with the Tensor Cores and hardware-level acceleration for ray tracing with its brand new RT cores.
Then came Ampere, built on a brand new Samsung 8nm fabrication process, with which NVIDIA added even more to its gaming graphics lineup. In the Ampere GPU architecture, NVIDIA provided its latest Ampere SM along with next-gen FP32, INT32, Tensor Cores, and RT cores. The focus was to boost both rasterization and ray tracing capabilities to new heights.
Now enter Ada, a brand new architecture that aims to take everything from the first two RTX GPU generations and perfect it. The graphics architecture is designed for speed, and that is what it excels at. So let's look at the architecture in detail. Following are a few of the main highlights of the Ada Lovelace GPU architecture:
- Revolutionary New Architecture: NVIDIA Ada architecture GPUs deliver outstanding performance for graphics, AI, and compute workloads with exceptional architectural and power efficiency. After the baseline design for the Ada SM was established, the chip was scaled up to shatter records. Manufacturing innovations and materials research enabled NVIDIA engineers to craft the flagship AD102 GPU with 76.3 billion transistors and 18,432 CUDA Cores capable of running at clocks over 2.5 GHz while maintaining the same 450W TGP as the prior-generation flagship GeForce RTX 3090 Ti GPU. The result is the world’s fastest GPU with the power, acoustics, and temperature characteristics expected of a high-end graphics card.
- New Ada RT Core for Faster Ray Tracing: For decades, rendering ray-traced scenes with physically correct lighting in real-time has been considered the holy grail of graphics. At the same time, the geometric complexity of environments and objects continues to increase as 3D games and graphics continually strive to provide the most accurate representations of the real world. The Ada RT Core has been enhanced to deliver 2x faster ray-triangle intersection testing and includes two important new hardware units. An Opacity Micromap Engine speeds up ray tracing of alpha-tested geometry by a factor of 2x, and a Displaced Micro-Mesh Engine generates Displaced Micro-Triangles on-the-fly to create additional geometry. The Micro-Mesh Engine provides the benefit of increased geometric complexity without the traditional performance and storage costs of complex geometries.
- Shader Execution Reordering: NVIDIA Ada GPUs support Shader Execution Reordering, which dynamically organizes & reorders shading workloads to improve RT shading efficiency. This improves performance by up to 44% in Cyberpunk 2077 with Ray Tracing Overdrive Mode.
- NVIDIA DLSS 3: The Ada architecture features an all-new Optical Flow Accelerator and AI frame generation that boosts DLSS 3’s frame rates up to 2x over the previous DLSS 2.0 while maintaining or exceeding native image quality. Compared to traditional brute-force graphics rendering, DLSS 3 is ultimately up to 4x faster while providing low system latency.
The NVIDIA Ada Lovelace AD102 GPU features up to 12 GPCs (Graphics Processing Clusters), 5 more GPCs than the Ampere GA102 GPU. Each GPC consists of 6 TPCs with 2 SMs each, the same configuration as the GA102 chip. Each SM (Streaming Multiprocessor) houses four sub-cores, which is also the same as on the GA102 GPU. The FP32 & INT32 core configuration carries over from Ampere as well: each sub-core includes 16 dedicated FP32 units plus 16 units that can execute either FP32 or INT32 operations.
So in total, each sub-core consists of 32 units, and each SM has 64 dedicated FP32 units plus 64 FP32/INT32 units, for a total of 128 CUDA cores. And since there are a total of 144 SMs (12 per GPC), we are looking at a total of 18,432 cores. Each sub-core also includes its own warp scheduler and dispatch unit (32 threads/clock each) and its own L0 instruction cache, and each SM supports up to 48 resident warps (1,536 threads), the same as GA102. The register file holds 16,384 32-bit registers per sub-core, or 256 KB per SM. Each SM also carries its own 128 KB of L1 data cache and shared memory, so that's 18 MB of L1 cache across the full chip.
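The AD102 totals come from multiplying down the GPC, TPC, SM hierarchy. The sketch below reproduces them from the per-unit figures quoted in this section, assuming the 16,384-entry register figure applies to each of the four sub-cores; `ad102_totals` is a hypothetical helper, not an NVIDIA tool.

```python
def ad102_totals(gpcs=12, tpcs_per_gpc=6, sms_per_tpc=2,
                 cores_per_sm=128, l1_kb_per_sm=128,
                 subcores_per_sm=4, regs_per_subcore=16_384):
    """Multiply per-unit figures up the GPC -> TPC -> SM hierarchy."""
    sms = gpcs * tpcs_per_gpc * sms_per_tpc
    cuda_cores = sms * cores_per_sm
    l1_total_mb = sms * l1_kb_per_sm / 1024
    # Each register is 32 bits (4 bytes) wide.
    rf_kb_per_sm = subcores_per_sm * regs_per_subcore * 4 / 1024
    return sms, cuda_cores, l1_total_mb, rf_kb_per_sm

sms, cores, l1_mb, rf_kb = ad102_totals()
print(f"{sms} SMs, {cores} CUDA cores, {l1_mb:.0f} MB of L1, {rf_kb:.0f} KB register file per SM")
```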
Moving over to the cache, this is another segment where NVIDIA has given a big boost over the existing Ampere GPUs. The L2 cache on AD103 is increased to 64 MB, a 16x increase over the Ampere GA103 GPU, which hosts just 4 MB of L2 cache. The cache is shared across the GPU. The full AD103 die also features up to 112 ROPs.
Ada Lovelace GPUs also feature the latest 4th Generation Tensor and 3rd Generation RT (ray tracing) cores, which will help boost DLSS & ray tracing performance to the next level. Overall, the Ada Lovelace AD103 GPU offers the following versus the GA103 GPU:
- 16.6% More GPCs (Versus Ampere)
- 40% More Cores (Versus Ampere)
- 40% More L1 Cache (Versus Ampere)
- 16x More L2 Cache (Versus Ampere)
- 16.6% More ROPs (Versus Ampere)
- 4th Gen Tensor & 3rd Gen RT Cores
The full die has not been featured on any GPU so far, since the RTX 4080 uses a cut-down configuration. As yields improve, we will likely see a gaming or workstation product using the full-fat AD103 eventually. Until then, the RTX 4080 is the top gaming graphics card based on this GPU.
NVIDIA AD103 'Ada Lovelace' Gaming GPU Block Diagram:
NVIDIA AD103 'Ada Lovelace' Gaming GPU 'SM' Block Diagram:
NVIDIA GeForce RTX 4080
- 49 TFLOPS of peak single-precision (FP32) performance
- 98 TFLOPS of peak half-precision (FP16) performance
- 390 Tensor TFLOPS
- 780 Tensor TFLOPs with sparsity
- 113 RT-TFLOPs
At the heart of the NVIDIA GeForce RTX 4080 graphics card lies the Ada Lovelace AD103 GPU. The GPU measures 378.6mm2 and utilizes the TSMC 4N process node, an optimized version of TSMC's 5nm (N5) node designed for the green team. The GPU features an insane 45.9 billion transistors.
NVIDIA Ada GPUs - AD102, AD103, AD104 For The First Wave of Gaming Cards
NVIDIA is first introducing three brand new Ada GPUs which include the AD102, AD103 & AD104. The AD102 GPU is going to be featured on the GeForce RTX 4090, the AD103 is going to be used by the GeForce RTX 4080 16 GB graphics cards and the AD104 GPU is going to be featured on the GeForce RTX 4080 12 GB graphics cards.
The Ada GPUs are based on the TSMC 4N process node, which is a custom process designed exclusively for NVIDIA. It is essentially an optimized version of the N5 (5nm) process, offering drastic increases in transistors, cores, and frequency. The AD103 GPU packs 16% more cores and 45.9 billion transistors while offering over 2x the performance per watt.
NVIDIA Ada AD103 GPU
The full AD103 GPU is made up of 7 graphics processing clusters with 12 SM units on each cluster. That makes up 84 SM units for a total of 10,752 cores, 84 RT cores, 336 Tensor Cores, 336 Texture Units, and a 256-bit bus interface in a 45.9 billion transistor package measuring 378.6mm2.
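Because every Ada SM carries the same complement of units, chip-level counts scale linearly with the number of enabled SMs. This sketch sanity-checks the full AD103 die and the RTX 4080's 76-SM configuration, assuming the per-SM figures from NVIDIA's Ada SM diagram (128 CUDA cores, 1 RT core, 4 Tensor Cores, 4 texture units); `units_for` is a hypothetical helper.

```python
# Per-SM unit counts for an Ada SM (assumed from NVIDIA's SM block diagram).
PER_SM = {"cuda_cores": 128, "rt_cores": 1, "tensor_cores": 4, "texture_units": 4}

def units_for(sm_count):
    """Scale the per-SM unit counts to a given number of enabled SMs."""
    return {name: count * sm_count for name, count in PER_SM.items()}

full_ad103 = units_for(7 * 12)  # 7 GPCs x 12 SMs per GPC = 84 SMs
rtx_4080 = units_for(76)        # the RTX 4080 enables 76 of the 84 SMs
print("Full AD103:", full_ad103)
print("RTX 4080:  ", rtx_4080)
```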
NVIDIA has also introduced its 4th Generation Tensor core architecture and 3rd Generation RT cores on Ada GPUs. Tensor cores have been available since Volta, and consumers got a taste of them with the Turing & Ampere GPUs. One of the key areas where Tensor Cores are put to use in AAA games is DLSS. There's a whole software stack that leverages Tensor cores, known as NVIDIA NGX. These software-based technologies help enhance graphics fidelity with features such as Deep Learning Super Sampling (DLSS), AI InPainting, AI Super Rez, RTX Voice, and AI Slow-Mo.
While its initial debut was a bit flawed, DLSS in its 2nd iteration (DLSS 2.x) has done wonders to not only improve gaming performance but also image quality.
Let's dive into the technological advancements that allow these incredible achievements. To begin with, NVIDIA engineers started with DLSS Super Resolution and added something called Optical Multi Frame Generation based on Ada's Optical Flow Accelerator.
This accelerator analyzes two sequential frames from a particular game, capturing pixel details such as particles, reflections, lighting, and shadows.
On top of that, NVIDIA DLSS 3 also takes into account conventional game engine information such as motion vectors. The DLSS Frame Generation AI convolutional autoencoder network will then decide how to use each of the four inputs (current and prior frames, optical flow field, and motion vectors) to recreate intermediate frames in the best possible way.
NVIDIA DLSS 3 is said to reconstruct 3/4 of the first frame with DLSS Super Resolution and the full second frame with the help of the aforementioned DLSS Frame Generation. Overall, NVIDIA DLSS 3 reconstructs 7/8 of the two total frames displayed, which explains the massive performance uplift.
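The 7/8 figure is simple bookkeeping over a pair of displayed frames. A short sketch, assuming DLSS Super Resolution in Performance mode, where 1/4 of the output pixels are rendered (for example, 1080p internally for a 4K output):

```python
from fractions import Fraction

# Frame 1: 1/4 of the output pixels are rendered; Super Resolution reconstructs the rest.
sr_frame_reconstructed = 1 - Fraction(1, 4)
# Frame 2: produced entirely by DLSS Frame Generation.
fg_frame_reconstructed = Fraction(1)

# Average over the two displayed frames.
reconstructed_share = (sr_frame_reconstructed + fg_frame_reconstructed) / 2
print(reconstructed_share)  # 7/8
```

In other words, only 1/8 of the displayed pixels in each two-frame pair are traditionally rendered.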
Additionally, the new version of the Deep Learning Super Sampling image reconstruction technique also includes the latency-lowering NVIDIA Reflex technology.
Cyberpunk 2077 has been shown running NVIDIA DLSS 3, the brand new Ray Tracing Overdrive, and NVIDIA Reflex with up to 4x improved performance and up to 2x reduced latency. That's not all, as NVIDIA is even promising benefits for CPU-bound games, which generally didn't run much faster with DLSS 2.0. For example, the notoriously CPU-heavy Microsoft Flight Simulator gets up to 2x improved performance with the new DLSS.
Overall, NVIDIA said that over 35 games and apps have already pledged support for NVIDIA DLSS 3.
The green company also released a performance chart on some of those games running on NVIDIA DLSS 3; check it out below.
3rd Gen RT Cores, RTX, and Real-Time Ray Tracing Dissected
Next up, we have the RT Cores, which power real-time ray tracing. NVIDIA isn't distancing itself from traditional rasterization-based rendering but instead follows a hybrid rendering model. The new 3rd Generation RT cores offer increased performance and double the ray/triangle intersection testing rate of Ampere RT cores.
The Third-Generation RT Core found in Ada GPUs includes dedicated units known as the Opacity Micromap Engine and the Displaced Micro-Mesh Engine. The Opacity Micromap Engine evaluates Opacity Micromaps (represented by the triangle with foliage on the bottom left), which are used to accelerate alpha traversal. The Displaced Micro-Mesh Engine generates meshes of micro-triangles that are known as Displaced Micro-Meshes (represented by the triangle on the bottom right in the diagram below). Displaced Micro-Meshes allow the Ada RT Core to ray trace geometrically complex objects and environments with significantly less BVH build time and storage costs. Finally, ray-triangle intersection testing is 2x faster in Ada’s Third-Generation RT Core compared to the Ampere GPU generation.
NVIDIA engineers have developed three new features in the Ada RT Core to enable high-performance ray tracing of highly complex geometry:
- First, Ada’s Third-Generation RT Core features 2x Faster Ray-Triangle Intersection Throughput relative to Ampere; this enables developers to add more detail to their virtual worlds.
- Second, Ada’s RT Core has 2x Faster Alpha Traversal; the RT Core features a new Opacity Micromap Engine to directly alpha-test geometry and significantly reduce shader-based alpha computations. With this new functionality, developers can very compactly describe irregularly shaped or translucent objects, like ferns or fences, and directly and more efficiently ray trace them with the Ada RT Core.
- Third, the new Ada RT Core supports 10x Faster BVH Build in 20X Less BVH Space when using its new Displaced Micro-Mesh Engine to generate micro-triangles from micro-meshes on-demand. The micro-mesh is a new primitive that represents a structured mesh of micro-triangles that the Ada RT Core processes natively, saving the storage and processing compared to what is normally required when describing complex geometries using only basic triangles.
Taken together, these three advances incorporated into the Ada RT Core enable order-of-magnitude increases in richness without commensurate increases in processing time or memory consumption.
2x Faster Ray-Triangle Intersection Testing
Ray-triangle intersection testing is a computationally expensive operation that is commonly performed when rendering a ray-traced scene. Recognizing the importance of this function, with each new RTX GPU NVIDIA engineers have strived to improve intersection testing performance and efficiency. The Third-Generation RT Core in the Ada architecture provides double the throughput for ray-triangle intersection testing over Ampere (and 4x faster than the first-generation RT Core used in Turing GPUs).
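To give a sense of the per-ray arithmetic that the RT Core's intersection hardware accelerates, here is the classic Möller-Trumbore ray-triangle test in plain Python. It is a textbook software algorithm shown for illustration only; NVIDIA has not said that the RT Core implements this exact method. The returned (u, v) barycentric coordinates are the same kind of value the Opacity Micromap Engine uses to look up micro-triangle opacity.

```python
def ray_triangle_intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Möller-Trumbore test. Returns (t, u, v) on a hit, otherwise None.

    t is the distance along the ray; (u, v) are the barycentric weights of
    v1 and v2 at the hit point (v0's weight is 1 - u - v).
    """
    def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
    def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    edge1, edge2 = sub(v1, v0), sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:                  # ray is parallel to the triangle's plane
        return None
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det
    if u < 0.0 or u > 1.0:              # outside the triangle along edge1
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det
    if v < 0.0 or u + v > 1.0:          # outside along edge2 or past the hypotenuse
        return None
    t = dot(edge2, qvec) * inv_det
    return (t, u, v) if t > eps else None  # ignore hits behind the ray origin

tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_triangle_intersect((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *tri)
miss = ray_triangle_intersect((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *tri)
print(hit, miss)
```

A handful of cross and dot products per ray-triangle pair is cheap on its own, but a ray-traced frame performs it an enormous number of times, which is why doubling its hardware throughput matters.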
2x Faster Alpha Traversal Performance with Opacity Micromap Engine
Developers frequently use a texture’s alpha channel to economically cut out complex shapes or more generally to represent translucency. A leaf might be described using a couple of triangles, employing a texture’s alpha channel to economically capture the complex shape. A flame’s complex shape and translucency can also be approximated by alpha.
Prior to Ada’s RT Core, a developer could incorporate these kinds of content into a ray-traced scene by tagging them as not opaque. When a leaf is hit by a ray, a shader is invoked to determine how to treat the intersection, even if the ray is simply characterized as a hit or a miss. This incurs a noticeable cost. Specifically, when a warp of rays is cast towards non-opaque objects, individual ray queries may require multiple shader invocations to resolve, while other rays terminate immediately. The result is lingering live threads and commensurate inefficiency.
To efficiently handle these kinds of content, NVIDIA engineers have added an Opacity Micromap Engine to Ada’s RT Core. An opacity micromap is a virtual mesh of micro-triangles, each with an opacity state that the RT Core uses to directly resolve ray intersections with non-opaque triangles. Specifically, the barycentric coordinates of an intersection are used to address the corresponding micro-triangle’s opacity state. The opacity state may be opaque, transparent, or unknown. If opaque, then a hit is recorded and returned. If transparent, the intersection is ignored and the search for an intersection continues. If unknown, then the control is returned to the SM, invoking a shader (“anyhit”) to programmatically resolve the intersection.
The new Opacity Micromap Engine evaluates the opacity mask, which is a regular triangular mesh defined using the barycentric coordinate system used for reporting ray/triangle intersections. These meshes may be sized from one to sixteen million micro-triangles, with one or two bits associated with each micro-triangle. As a simple illustrative example, consider a detailed maple leaf described using two triangles and an alpha texture.
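The opaque/transparent/unknown decision described above can be sketched in a few lines. This is a simplified model, not NVIDIA's hardware logic: the `Opacity` enum, the helper names, and especially the row-major micro-triangle indexing over a uniform subdivision are hypothetical choices of ours; real ray tracing APIs define their own micro-triangle ordering. Only the UNKNOWN state falls back to an any-hit shader on the SM.

```python
from enum import Enum

class Opacity(Enum):
    TRANSPARENT = 0
    OPAQUE = 1
    UNKNOWN = 2   # three states fit in the two bits per micro-triangle

def micro_triangle_index(u, v, level):
    """Index of the micro-triangle containing barycentric point (u, v).

    Hypothetical row-major layout over a uniform subdivision with
    n = 2**level segments per edge (n * n micro-triangles in total).
    """
    n = 1 << level
    fu, fv = u * n, v * n
    j = min(int(fv), n - 1)              # row along v
    i = min(int(fu), n - 1 - j)          # column along u within that row
    upright = (fu - i) + (fv - j) < 1.0 or i + j == n - 1
    row_start = j * (2 * n - j)          # row k holds 2 * (n - k) - 1 micro-triangles
    return row_start + 2 * i + (0 if upright else 1)

def resolve_intersection(micromap, level, u, v, anyhit_shader):
    """Return True to record a hit, False to ignore it and keep traversing."""
    state = micromap[micro_triangle_index(u, v, level)]
    if state is Opacity.OPAQUE:
        return True                      # resolved without a shader: hit
    if state is Opacity.TRANSPARENT:
        return False                     # resolved without a shader: continue search
    return anyhit_shader(u, v)           # UNKNOWN: hand control back to the SM

shader_calls = []
def anyhit(u, v):
    """Stand-in for an alpha-testing any-hit shader."""
    shader_calls.append((u, v))
    return u > 0.5

# Level 1 -> 4 micro-triangles.
micromap = [Opacity.OPAQUE, Opacity.TRANSPARENT, Opacity.UNKNOWN, Opacity.OPAQUE]
points = [(0.1, 0.1), (0.4, 0.4), (0.6, 0.1), (0.1, 0.6)]
results = [resolve_intersection(micromap, 1, u, v, anyhit) for u, v in points]
print(results, len(shader_calls))
```

Of the four hits, only the one landing in the UNKNOWN micro-triangle invokes the shader, which is the saving the Opacity Micromap Engine is designed to deliver.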
10x Faster BVH Build in 20X Less BVH Space with Ada’s Displaced Micro-Mesh Engine
Geometric complexity continues to rise with every new generation, but ray tracing performance scales attractively with increases in scene complexity. When we ray trace complex environments, tracing costs increase slowly: a one-hundred-fold increase in geometry might only double tracing time.
However, creating the data structure (BVH) that makes that small increase in time possible requires roughly linear time and memory; 100x more geometry could mean 100x more BVH build time and 100x more memory. Ada’s Third-Generation RT Core with Displaced Micro-Meshes (DMM) helps significantly with both of the challenges of high geometric complexity - BVH build performance and memory/storage footprint. Asset storage and transmission costs are reduced as well.
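The contrast between slow-growing tracing cost and linear BVH cost can be illustrated with an idealized model, assuming traversal depth grows with log2 of the triangle count while BVH build time and memory grow linearly. The model ignores memory-hierarchy and cache effects, so real-world tracing costs, like the "might only double" estimate above, can come out higher.

```python
import math

def scaling_ratios(base_triangles, growth):
    """Idealized model: BVH traversal ~ log2(N), BVH build time/memory ~ N."""
    trace_ratio = math.log2(base_triangles * growth) / math.log2(base_triangles)
    build_ratio = growth  # linear in the number of primitives
    return trace_ratio, build_ratio

trace, build = scaling_ratios(1_000_000, 100)
print(f"100x geometry: tracing ~{trace:.2f}x, BVH build/memory ~{build}x")
```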
Shader Execution Reordering (SER)
Secondary rays are generated at each primary ray hit point, as shown in the middle scene of NVIDIA's diagram. Starting at the primary hit surfaces, they shoot off in different directions, hitting different objects. Secondary hit shading tends to be less ordered and less efficient when executing on the GPU because different shader programs are running on different threads and often must serialize execution. Examples of secondary rays that can benefit from SER include those used for path tracing, reflections, indirect lighting, and translucency effects.
Shader Execution Reordering adds a new stage in the ray tracing pipeline which reorders and groups the secondary hit shading to have better execution locality, thus much higher overall ray-traced shading efficiency. SER can often provide up to 2X performance improvement for RT shaders in cases with a high level of divergence (such as path tracing). In testing with Cyberpunk 2077 running in RT: Overdrive Mode, we’ve measured overall performance gains of up to 44% from SER.
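Why grouping helps can be shown with a deliberately simplified divergence model, assuming a 32-thread warp must run one serialized pass per distinct hit shader among its threads. The random hit data and the helper names are hypothetical, and sorting by shader ID is only a stand-in for SER's reordering, not NVIDIA's actual mechanism.

```python
import random

WARP_SIZE = 32

def serialized_passes(shader_ids, warp_size=WARP_SIZE):
    """Simplified model: each warp runs one pass per distinct shader it contains."""
    return sum(len(set(shader_ids[i:i + warp_size]))
               for i in range(0, len(shader_ids), warp_size))

def reorder_by_shader(hits):
    """Group hits that will run the same hit shader (a stand-in for SER)."""
    return sorted(hits, key=lambda hit: hit[0])

random.seed(0)
# Hypothetical secondary-ray hits: (hit shader ID, ray ID) across 8 materials.
hits = [(random.randrange(8), ray_id) for ray_id in range(1024)]
before = serialized_passes([shader for shader, _ in hits])
after = serialized_passes([shader for shader, _ in reorder_by_shader(hits)])
print(f"shader passes before reordering: {before}, after: {after}")
```

After sorting, most warps execute a single shader, and at most one extra pass appears per boundary between shader groups; the reordering step itself has a cost, which is why gains depend on how divergent the workload is.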
The Micron GDDR6X memory brings a lot of new stuff to the table. It is faster, doubles the I/O data rate, and is the first to implement PAM4 multi-level signaling in memory dies. With GeForce RTX 3090 class products, Micron's GDDR6X memory achieves a bandwidth of up to 1 TB/s, which is used to power next-generation gaming experiences at high-fidelity resolutions such as 8K.
Micron GDDR6X graphics memory doubles input/output (I/O) performance while minimizing the cost of memory. Working with AI-innovation leader NVIDIA, Micron delivers higher bandwidth by enabling multi-level signaling in the form of four-level pulse amplitude modulation (PAM4) technology in this memory device via Micron
The new GDDR6X SGRAM:
- Doubles the data rate of SGRAM at a lower power per transaction while enabling the breaking of the 1 Terabyte per second (TB/s) system memory bandwidth boundary for graphics card applications;
- Is the first discrete graphics memory device that employs PAM4-encoded signaling between the processor and the DRAM, using four voltage levels to encode and transfer two bits of data per interface clock.
- Can be designed and operated stably at high speeds and built in mass production.
As mentioned, GDDR6X features the brand new PAM4 multi-level signaling technique, which helps transfer data much faster, doubling the I/O rate and pushing the bandwidth of each memory die from 64 GB/s to 84 GB/s. The Micron GDDR6X memory dies are also the only graphics DRAM that can be mass-produced while featuring PAM4 signaling.
What is interesting is that Micron quotes speeds of up to 22.4 Gbps for its GDDR6X memory, whereas the GeForce RTX 3090 Ti topped out at 21 Gbps. The GeForce RTX 4080 runs its GDDR6X at 22.4 Gbps, and AIBs could utilize higher-binned dies as they become available. Micron does have faster chips, but those aren't coming to NVIDIA's RTX 40 series graphics cards for now.
It's not just about faster speeds: Micron's GDDR6X provides higher bandwidth while consuming 15% less power per transferred bit compared to the previous-generation GDDR6 memory. PAM4 signaling is a big upgrade from the two-level NRZ signaling used by GDDR6 memory.
Instead of transmitting two binary bits of data each clock cycle (one bit on the rising edge and one bit on the falling edge of the clock), PAM4 sends two bits on each clock edge, encoded using four different voltage levels. The voltage levels are divided into 250 mV steps with each level representing two bits of data - 00, 01, 10, or 11 sent on each clock edge (still DDR technology).
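The difference from NRZ is easy to see in a toy encoder: with two levels, a byte needs eight symbols; with four levels, it needs four. The sketch assumes a plain binary symbol mapping and expresses levels as offsets above the lowest level in 250 mV steps; Micron's actual symbol coding and absolute voltages are not described here, so `LEVEL_FOR_SYMBOL` and the helper names are hypothetical.

```python
STEP_MV = 250  # spacing between adjacent PAM4 levels

# Hypothetical mapping: 2-bit symbol -> level offset (mV) above the lowest level.
LEVEL_FOR_SYMBOL = {0b00: 0, 0b01: 250, 0b10: 500, 0b11: 750}
SYMBOL_FOR_LEVEL = {mv: sym for sym, mv in LEVEL_FOR_SYMBOL.items()}

def pam4_encode(data):
    """Split each byte into four 2-bit symbols (MSB first), one per clock edge."""
    levels = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            levels.append(LEVEL_FOR_SYMBOL[(byte >> shift) & 0b11])
    return levels

def pam4_decode(levels_mv):
    """Snap each sampled level to the nearest 250 mV step and rebuild the bytes."""
    out = bytearray()
    for k in range(0, len(levels_mv), 4):
        byte = 0
        for mv in levels_mv[k:k + 4]:
            step = min(3, max(0, round(mv / STEP_MV)))
            byte = (byte << 2) | SYMBOL_FOR_LEVEL[step * STEP_MV]
        out.append(byte)
    return bytes(out)

symbols = pam4_encode(b"\xa5")            # 0xA5 = 0b10_10_01_01
noisy = [mv + d for mv, d in zip(symbols, (10, -10, -10, 10))]  # small sampling noise
print(symbols, pam4_decode(noisy))
```

The trade-off shows up in the decoder: with four levels, each decision window is a third of the NRZ swing, so the receiver tolerates less noise per symbol.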
Micron GDDR6X Memory
| Feature | GDDR5 | GDDR5X | GDDR6 | GDDR6X |
|---|---|---|---|---|
| Density | From 512Mb to 8Gb | 8Gb | 8Gb, 16Gb | 8Gb, 16Gb |
| VDD and VDDQ | Either 1.5V or 1.35V | 1.35V | Either 1.35V or 1.25V | Either 1.35V or 1.25V |
| VPP | N/A | 1.8V | 1.8V | 1.8V |
| Data rates | Up to 8 Gb/s | Up to 12 Gb/s | Up to 16 Gb/s | 19 Gb/s, 21 Gb/s, >21 Gb/s |
| Channel count | 1 | 1 | 2 | 2 |
| Access granularity | 32 bytes | 64 bytes; 2x 32 bytes in pseudo-32B mode | 2 ch x 32 bytes | 2 ch x 32 bytes |
| Burst length | 8 | 16 / 8 | 16 | 8 (PAM4 mode), 16 (RDQS mode) |
| Signaling | POD15/POD135 | POD135 | POD135/POD125 | PAM4; POD135/POD125 |
| Package | BGA-170, 14mm x 12mm, 0.8mm ball pitch | BGA-190, 14mm x 12mm, 0.65mm ball pitch | BGA-180, 14mm x 12mm, 0.75mm ball pitch | BGA-180, 14mm x 12mm, 0.75mm ball pitch |
| I/O width | x32/x16 | x32/x16 | 2 ch x16/x8 | 2 ch x16/x8 |
| Signal count | 61 (40 DQ, DBI, EDC; 15 CA; 6 CK, WCK) | 61 (40 DQ, DBI, EDC; 15 CA; 6 CK, WCK) | 70 or 74 (40 DQ, DBI, EDC; 24 CA; 6 or 10 CK, WCK) | 70 or 74 (40 DQ, DBI, EDC; 24 CA; 6 or 10 CK, WCK) |
| PLL, DCC | PLL | PLL | PLL, DCC | DCC |
| CRC | CRC-8 | CRC-8 | 2x CRC-8 | 2x CRC-8 |
| VREFD | External or internal per 2 bytes | Internal per byte | Internal per pin | Internal per pin; 3 sub-receivers per pin |
| Equalization | N/A | RX/TX | RX/TX | RX/TX |
| VREFC | External | External or Internal | External or Internal | External or Internal |
| Self refresh (SRF) | Yes: temp.-controlled SRF | Yes: temp.-controlled SRF, hibernate SRF | Yes: temp.-controlled SRF, hibernate SRF, VDDQ-off | Yes: temp.-controlled SRF, hibernate SRF, VDDQ-off |
| Scan | SEN | IEEE 1149.1 (JTAG) | IEEE 1149.1 (JTAG) | IEEE 1149.1 (JTAG) |
With each new generation of graphics cards, NVIDIA delivers a new range of display technologies. This generation is no different, and we see some significant updates to the display engine and the graphics interconnect. With faster GDDR6X memory providing higher bandwidth, alongside improved compression and more cache, gaming applications can now run at higher resolutions, supporting more detail on the display.
The Ada Display Engine supports HDMI 2.1 and DisplayPort 1.4a with DSC 1.2a. HDMI 2.1 allows up to 48 Gbps of total bandwidth and up to 4K 240Hz HDR and 8K 60Hz HDR.
DisplayPort 1.4a allows for up to 8K resolution at 60Hz and includes VESA's Display Stream Compression (DSC) 1.2a technology with visually lossless compression. You can run up to two 8K displays at 60 Hz using two cables, one for each display. In addition, Ada also supports HDR processing natively, with tone mapping added to the HDR pipeline.
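A quick calculation shows why the highest modes lean on DSC. The sketch compares the raw active-pixel data rate against each link's usable payload; the payload figures (HDMI 2.1 FRL at 48 Gbps with 16b/18b coding, DisplayPort 1.4a HBR3 at 4 x 8.1 Gbps with 8b/10b coding) are our assumptions from the respective link specifications, and blanking intervals are ignored, so real requirements are somewhat higher.

```python
# Approximate usable payload after line coding (assumed from the link specs).
LINK_PAYLOAD_GBPS = {
    "HDMI 2.1": 48 * 16 / 18,     # FRL, 16b/18b coding: ~42.7 Gbps
    "DP 1.4a": 4 * 8.1 * 8 / 10,  # HBR3, 8b/10b coding: ~25.9 Gbps
}

def active_pixel_gbps(width, height, refresh_hz, bits_per_pixel):
    """Active-pixel data rate only; blanking is ignored, so real needs are higher."""
    return width * height * refresh_hz * bits_per_pixel / 1e9

def needs_dsc(link, width, height, refresh_hz, bits_per_pixel):
    """True if the uncompressed stream cannot fit in the link's payload rate."""
    return active_pixel_gbps(width, height, refresh_hz, bits_per_pixel) > LINK_PAYLOAD_GBPS[link]

# 30 bits per pixel = 10-bit-per-channel RGB (HDR); 24 = 8-bit SDR.
modes = [
    ("HDMI 2.1", 3840, 2160, 240, 30),
    ("HDMI 2.1", 7680, 4320, 60, 30),
    ("DP 1.4a", 7680, 4320, 60, 30),
    ("DP 1.4a", 3840, 2160, 60, 24),
]
for link, *mode in modes:
    rate = active_pixel_gbps(*mode)
    verdict = "needs DSC" if needs_dsc(link, *mode) else "fits uncompressed"
    print(f"{link} {mode}: {rate:.1f} Gbps, {verdict}")
```

Both 4K 240Hz HDR and 8K 60Hz HDR work out to roughly 60 Gbps of pixel data, above either link's payload, which is where DSC's visually lossless compression comes in.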
Ada GPUs take streaming and video content to the next level, incorporating support for AV1 video encoding in the Ada eighth-generation dedicated hardware encoder (known as NVENC). Prior generation Ampere GPUs supported AV1 decoding but not encoding. Ada’s AV1 encoder is 40% more efficient than the H.264 encoder used in GeForce RTX 30 Series GPUs. AV1 will enable users who are streaming at 1080p today to increase their stream resolution to 1440p while running at the same bitrate and quality, or for users with 1080p displays, streams will look similar to 1440p, providing better quality.
Ada GPUs are also equipped with dual NVENC encoders. This enables video encoding at 8K/60 for professional video editing, or four 4K/60 streams. (Game streaming services can also take advantage of this to enable more simultaneous sessions, for instance.) Blackmagic Design’s DaVinci Resolve, the popular Voukoder plugin for Adobe Premiere Pro, and Jianying, the top video editing app in China, are all enabling AV1 support, as well as dual-encoder support through encode presets. Dual-encoder and AV1 support for these apps was scheduled to arrive in October. NVIDIA is also working with the popular video-effects app Notch to enable AV1, as well as Topaz to enable support for AV1 and the dual encoders.
In addition to NVENC, Ada GPUs also include the fifth-generation hardware decoder that was first launched with Ampere (known as NVDEC). NVDEC supports hardware-accelerated video decoding of MPEG-2, VC-1, H.264 (AVCHD), H.265 (HEVC), VP8, VP9, and the AV1 video formats. 8K/60 decoding is also fully supported. In the future, NVIDIA is also working to enable high-quality video production using AI.
NVIDIA RTX IO - Blazing Fast Read Speeds With GPU Utilization
As storage sizes have grown, so has storage performance. Gamers are increasingly turning to SSDs to reduce game load times: while hard drives are limited to 50-100 MB/sec throughput, the latest M.2 PCIe Gen4 SSDs deliver up to 7 GB/sec. With the traditional storage model, game data is read from the disk and passed through system memory and the CPU before reaching the GPU.
Historically games have read files from the hard disk, using the CPU to decompress the game image. Developers have used lossless compression to reduce install sizes and improve I/O performance. However, as storage performance has increased, traditional file systems and storage APIs have become a bottleneck. For example, decompressing game data from a 100 MB/sec hard drive takes only a few CPU cores, but decompressing data from a 7 GB/sec PCIe Gen4 SSD can consume more than twenty AMD Ryzen Threadripper 3960X CPU cores!
Using the traditional storage model, game decompression can consume all 24 cores on a Threadripper CPU. Modern game engines have exceeded the capability of traditional storage APIs. A new generation of I/O architecture is needed. In NVIDIA's chart, data transfer rates are the gray bars, and the CPU cores required are the black/blue blocks.
NVIDIA RTX IO is a suite of technologies that enable rapid GPU-based loading and decompression of game assets, accelerating I/O performance by up to 100x compared to hard drives and traditional storage APIs. When used with Microsoft’s new DirectStorage for Windows API, RTX IO offloads dozens of CPU cores’ worth of work to your RTX GPU, improving frame rates, enabling near-instantaneous game loading, and opening the door to a new era of large, incredibly detailed open-world games.
Object pop-in and stutter can be reduced, and high-quality textures can be streamed at incredible rates, so even if you’re speeding through a world, everything runs and looks great. In addition, with lossless compression, game download and install sizes can be reduced, allowing gamers to store more games on their SSD while also improving their performance.
NVIDIA RTX IO plugs into Microsoft’s DirectStorage API, which is a next-generation storage architecture designed specifically for state-of-the-art NVMe SSD-equipped gaming PCs and the complex workloads that modern games require. Together, streamlined and parallelized APIs specifically tailored for games allow dramatically reduced IO overhead and maximize performance/bandwidth from NVMe SSDs to your RTX IO-enabled GPU.
Specifically, NVIDIA RTX IO brings GPU-based lossless decompression, allowing reads through DirectStorage to remain compressed and delivered to the GPU for decompression. This removes the load from the CPU, moving the data from storage to the GPU in a more efficient, compressed form, and improving I/O performance by a factor of two.
GeForce RTX GPUs will deliver decompression performance beyond the limits of even Gen4 SSDs, offloading potentially dozens of CPU cores’ worth of work to ensure maximum overall system performance for next-generation games. Lossless decompression is implemented with high-performance compute kernels, asynchronously scheduled. This functionality leverages the DMA and copy engines of Turing, Ampere, and Ada, as well as the advanced instruction set and architecture of these GPUs' SMs.
The advantage of this is that the enormous compute power of the GPU can be used for burst or bulk loading (at level load, for example), when GPU resources act as high-performance I/O processors, delivering decompression performance well beyond the limits of Gen4 NVMe. During streaming scenarios, bandwidths are a tiny fraction of the GPU's capability, further leveraging the advanced asynchronous compute capabilities of RTX GPUs. Microsoft released DirectStorage for Windows to game developers in 2022, and NVIDIA Turing, Ampere & Ada gamers will be able to take advantage of RTX IO-enhanced games as soon as they become available.
The NVIDIA GeForce RTX 4080 uses 76 of the 84 SMs for a total of 9,728 CUDA cores. The GPU comes packed with 64 MB of L2 cache and a total of 112 ROPs, which is simply insane. The clock speeds for the graphics card are rated at 2210 MHz base and 2510 MHz boost, and we have already seen speeds of over 3 GHz with overclocking.
As for memory specs, the GeForce RTX 4080 features 16 GB of GDDR6X memory running at 22.4 Gbps across a 256-bit bus interface. This provides up to 716.8 GB/s of bandwidth, which is still a tad slower than the 760 GB/s offered by the RTX 3080, since the latter uses a wider 320-bit interface, albeit with a lowly 10 GB capacity. NVIDIA could be relying on a next-gen memory compression suite to make up for the narrower 256-bit interface.
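Peak memory bandwidth is just the per-pin data rate times the bus width, divided by eight bits per byte. A minimal sketch using NVIDIA's rated 22.4 Gbps for the RTX 4080 (22.5 Gbps would give 720 GB/s) and 19 Gbps for the RTX 3080; the same formula with a single 32-bit GDDR6X device gives the per-die figures quoted earlier. `memory_bandwidth_gbs` is a hypothetical helper.

```python
def memory_bandwidth_gbs(data_rate_gbps, bus_width_bits):
    """Peak bandwidth in GB/s: per-pin data rate x number of pins / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8

rtx_4080 = memory_bandwidth_gbs(22.4, 256)   # 16 GB GDDR6X, 256-bit
rtx_3080 = memory_bandwidth_gbs(19.0, 320)   # 10 GB GDDR6X, 320-bit
die_16 = memory_bandwidth_gbs(16.0, 32)      # one x32 device at 16 Gbps
die_21 = memory_bandwidth_gbs(21.0, 32)      # one x32 device at 21 Gbps
print(f"RTX 4080: {rtx_4080:.1f} GB/s, RTX 3080: {rtx_3080:.1f} GB/s")
print(f"Per die: {die_16:.0f} GB/s -> {die_21:.0f} GB/s")
```

This is raw interface bandwidth only; it does not account for the large L2 cache or compression, which reduce how often the GPU has to go out to memory at all.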
- NVIDIA GeForce RTX 4080 16 GB "Official" TBP - 320W
- NVIDIA GeForce RTX 3080 12 GB "Official" TBP - 350W
As far as the power consumption is concerned, the TBP is rated at 320W. The card will be powered by a single 16-pin connector which delivers up to 600W of power. Custom models will be offering higher TBP targets.
NVIDIA GeForce RTX 4080 Graphics Cards Performance
As for the performance of these monster GPUs, NVIDIA shared computational and gaming performance figures, and they place the GeForce RTX 4080 ahead of the GeForce RTX 3090 Ti with around 49 TFLOPs of FP32 compute power.
Just for comparison's sake:
- NVIDIA GeForce RTX 4090: 83 TFLOPs (FP32) (2.5 GHz Boost clock)
- NVIDIA GeForce RTX 4080: 49 TFLOPs (FP32) (2.5 GHz Boost clock)
- NVIDIA GeForce RTX 3090 Ti: 40 TFLOPs (FP32) (1.86 GHz Boost clock)
- NVIDIA GeForce RTX 3090: 36 TFLOPs (FP32) (1.69 GHz Boost clock)
Based on a boost clock speed of 2.5 GHz, you get up to 49 TFLOPs of compute performance, and you can definitely squeeze out a lot more with an overclock, as we demonstrated with the RTX 4090. Bear in mind that compute performance doesn't necessarily indicate overall gaming performance. Even so, it will be a huge upgrade for gaming PCs and roughly a 4x increase over the current fastest console, the Xbox Series X (12.15 TFLOPs).
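Peak FP32 figures like the ones listed above are usually derived as CUDA cores × 2 FLOPs (one fused multiply-add) per clock × boost clock. A minimal sketch under that assumption follows; the Ampere core counts (10752 and 10496) and the Xbox Series X's 12.15 TFLOPs are published specs that this review does not list, and the 128-cores-per-SM constant is inferred from the 9728 cores / 76 SMs split quoted earlier.

```python
CUDA_CORES_PER_SM = 128        # inferred: 9728 cores / 76 SMs
XBOX_SERIES_X_TFLOPS = 12.15   # assumed published console figure

def fp32_tflops(cuda_cores: int, boost_clock_ghz: float) -> float:
    """Peak FP32 throughput, assuming each core retires one FMA (2 FLOPs) per clock."""
    return cuda_cores * 2 * boost_clock_ghz / 1000

cards = {
    "RTX 4090": (16384, 2.52),
    "RTX 4080": (76 * CUDA_CORES_PER_SM, 2.51),
    "RTX 3090 Ti": (10752, 1.86),   # core count assumed from public specs
    "RTX 3090": (10496, 1.69),      # core count assumed from public specs
}

for name, (cores, clock) in cards.items():
    print(f"{name}: {fp32_tflops(cores, clock):.1f} TFLOPs")

ratio = fp32_tflops(9728, 2.51) / XBOX_SERIES_X_TFLOPS
print(f"RTX 4080 vs Xbox Series X: {ratio:.1f}x")
```

Note that this is a theoretical peak; as the text says, it does not translate directly into gaming performance.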
FP32 Compute Horsepower Comparisons (Higher is Better)
NVIDIA claims roughly a 2x uplift in both compute and gaming performance for each graphics card versus its predecessor, and that is before factoring in the RT and Tensor cores, which are expected to get major lifts in their respective departments too. A 2-4x gain over the RTX 3090 & RTX 3090 Ti would be very disruptive.
Gamers should expect 4K gaming to be buttery smooth on these graphics cards and with DLSS, we might even see playable 60 FPS at 8K resolution which is something that NVIDIA has been trying to achieve with its RTX 3090 series BFGPUs for a while now.
NVIDIA GeForce RTX 4080 Graphics Cards Price & Availability
The NVIDIA GeForce RTX 4080 16 GB graphics card will be available starting tomorrow for a price of $1199 US. The card will be available in both Founders Edition and custom graphics card flavors at launch.
NVIDIA GeForce RTX 40 Series Official Specs:
| Graphics Card Name | NVIDIA GeForce RTX 4090 | NVIDIA GeForce RTX 4090 D | NVIDIA GeForce RTX 4080 | NVIDIA GeForce RTX 4070 Ti | NVIDIA GeForce RTX 4070 | NVIDIA GeForce RTX 4060 Ti | NVIDIA GeForce RTX 4060 |
|---|---|---|---|---|---|---|---|
| GPU Name | Ada Lovelace AD102-300 | Ada Lovelace AD102-250 | Ada Lovelace AD103-300 | Ada Lovelace AD104-400 | Ada Lovelace AD104-250 | Ada Lovelace AD106-350 | Ada Lovelace AD107-400 |
| Process Node | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N |
| Die Size | 608mm² | 608mm² | 378.6mm² | 294.5mm² | 294.5mm² | 190.0mm² | 146.0mm² |
| Transistors | 76 Billion | 76 Billion | 45.9 Billion | 35.8 Billion | 35.8 Billion | 22.9 Billion | TBD |
| CUDA Cores | 16384 | 14592 | 9728 | 7680 | 5888 | 4352 | 3072 |
| TMUs / ROPs | 512 / 176 | TBD | 320 / 112 | 240 / 80 | 184 / 64 | 136 / 48 | TBD |
| Tensor / RT Cores | 512 / 128 | 456 / 128 | 304 / 76 | 240 / 60 | 184 / 46 | 136 / 34 | TBD |
| L2 Cache | 72 MB | 72 MB | 64 MB | 48 MB | 36 MB | 32 MB | 24 MB |
| Base Clock | 2230 MHz | 2280 MHz | 2210 MHz | 2310 MHz | 1920 MHz | 2310 MHz | 1830 MHz |
| Boost Clock | 2520 MHz | 2520 MHz | 2510 MHz | 2610 MHz | 2475 MHz | 2535 MHz | 2460 MHz |
| FP32 Compute | 83 TFLOPs | TBD | 49 TFLOPs | 40 TFLOPs | 29 TFLOPs | 22 TFLOPs | 15 TFLOPs |
| RT TFLOPs | 191 TFLOPs | TBD | 113 TFLOPs | 82 TFLOPs | 67 TFLOPs | 51 TFLOPs | 35 TFLOPs |
| Tensor-TOPs | 1321 TOPs | TBD | 780 TOPs | 641 TOPs | 466 TOPs | 353 TOPs | 242 TOPs |
| Memory Capacity | 24 GB GDDR6X | 24 GB GDDR6X | 16 GB GDDR6X | 12 GB GDDR6X | 12 GB GDDR6X | 8-16 GB GDDR6 | 8 GB GDDR6 |
| Memory Bus | 384-bit | 384-bit | 256-bit | 192-bit | 192-bit | 128-bit | 128-bit |
| Memory Speed | 21.0 Gbps | 21.0 Gbps | 23.0 Gbps | 21.0 Gbps | 21.0 Gbps | 18.0 Gbps | 17.0 Gbps |
| Bandwidth | 1008 GB/s | 1008 GB/s | 736 GB/s | 504 GB/s | 504 GB/s | 288 GB/s (554 GB/s Effective) | 272 GB/s (453 GB/s Effective) |
| TBP | 450W | 425W | 320W | 285W | 200W | 160-165W | 115W |
| Price (MSRP / FE) | $1599 US / 1949 EU | 12,999 RMB (China-Only) | $1199 US / 1469 EU | $799 US | $599 US | $399-$499 US | $299 US |
| Price (Current) | $1599 US / 1859 EU | 12,999 RMB (China-Only) | $1199 US / 1399 EU | $799 US | $599 US | $399-$499 US | $299 US |
| Launch (Availability) | 12th October 2022 | 28th December 2023 | 16th November 2022 | 5th January 2023 | 13th April 2023 | 24th May / 18th July 2023 | 29th June 2023 |
NVIDIA Founders Edition is Designed To Utilize Up To 600W of Power For Higher Overclocking
As for its brand-new Founders Edition cards, the GeForce RTX 4090 24 GB and RTX 4080 16 GB, NVIDIA has produced a compact PCB similar to the ones we saw on the previous generation, a layout that helps improve airflow and cooling performance.
NVIDIA says that it has further optimized the Dual Axial Flow Through system, increasing fan size and fin volume by 10%, offering 20% higher airflow, and upgrading to a 23-phase power supply (20+3 phases on the RTX 4090). Memory temperatures are reduced, and the new, substantially more powerful Ada GPUs are kept cool in ventilated cases, giving gamers excellent overclocking headroom. NVIDIA went through a rigorous testing procedure and is said to have evaluated as many as 50 fan designs before finalizing the one on the new cards. The fans dissipate heat from a heatsink assembly built around a vapor chamber, which is also a big jump over the previous design.
The NVIDIA GeForce RTX 4080 also uses the same cooler as the RTX 4090 Founders Edition and since it has a lower TDP, it should deliver even better thermal performance.
Each GeForce RTX 40 Series Founders Edition graphics card reduces cable clutter by leveraging the new standard GPU power input of next-gen ATX 3.0 power supplies, the PCIe Gen 5 16-pin connector. This lets you power GeForce RTX 40 Series graphics cards with just a single cable, improving the aesthetics of your build. If you are using a previous-gen power supply, an adapter cable is included in the box, allowing you to plug in three 8-pin power connectors, with an optional fourth connector on the RTX 4090's adapter for more overclocking headroom. ATX 3.0 power supplies will be available in October from ASUS, Cooler Master, FSP, Gigabyte, iBuyPower, MSI, and Thermaltake, with more models to come.
One advantage of the new 16-pin connector is that while the RTX 4090 and RTX 4080 Founders Edition cards are designed for 450W and 320W, respectively, they can tap into the extra headroom the connector provides for extreme overclocking, with the RTX 4090 going up to the full 600W mark. The new power delivery also gives the RTX 40 series a 10x improvement in power-transient response time compared to the previous generation.
The new cards also feature DP 1.4a (4K 12-bit HDR @ 240Hz) and HDMI 2.1 (4K 120Hz HDR / 8K 60Hz HDR) outputs. All cards use the PCIe Gen 4 interface supported by existing motherboards and fully support Resizable BAR.
NVIDIA GeForce RTX 4080 Founders Edition PCB:
Next-Gen Micron GDDR6X Dies Run 10°C Cooler Thanks To New Process Node
NVIDIA has also leveraged Micron's latest GDDR6X memory chips for its GeForce RTX 40 graphics cards. They run 10°C cooler and are more power efficient, and since they are all 16Gb DRAM dies, they can all be placed on one side of the PCB, where they are cooled better than in a dual-sided memory layout.
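The single-sided layout mentioned above falls out of simple arithmetic: 16 GB built from 16Gb (2 GB) dies needs eight packages, and with each GDDR6X package assumed to expose a 32-bit interface, eight of them add up to exactly the 256-bit bus. The sketch assumes a non-clamshell layout (one die per 32-bit channel); `memory_layout` is a hypothetical helper for illustration only.

```python
DIE_DENSITY_GBIT = 16     # each GDDR6X die: 16 Gb = 2 GB
DIE_INTERFACE_BITS = 32   # assumed 32-bit interface per GDDR6X package

def memory_layout(total_gb: int) -> tuple[int, int]:
    """Return (number of dies, total bus width in bits) for a single-sided layout."""
    dies = total_gb * 8 // DIE_DENSITY_GBIT   # GB -> Gb, then divide by die density
    return dies, dies * DIE_INTERFACE_BITS

dies, bus = memory_layout(16)
print(f"16 GB -> {dies} dies on one side, {bus}-bit bus")
```

Applied to the RTX 4090's 24 GB, the same helper yields twelve dies and a 384-bit bus, matching the spec table.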
So let's get started by unboxing this behemoth of a graphics card and begin by taking a look at the packing first.
The NVIDIA GeForce RTX 4080 Founders Edition comes in a large, rectangular box that weighs around 6 kg. The whole box features a matte black finish with the NVIDIA logo in the top left corner and the GeForce RTX 4080 logo below it.
If you flip the box vertically, you will see that it resembles an Xbox Series X console and it is also the same height. The side features an outline of the Founders Edition cooler.
This box comes standard with all NVIDIA RTX 4090 and RTX 4080 16 GB Founders Edition cards.
The top and bottom of the box are two separate compartments. The top opens up and the end result looks like a rectangle.
The NVIDIA GeForce RTX 4080 Founders Edition graphics card rests at the center of the packaging & you can see the creative take from NVIDIA in designing this package.
Once the box is open, you finally get to lay your eyes on the NVIDIA GeForce RTX 4080 graphics card which looks as spectacular as ever.
The card may look like it is the same design as the RTX 3090 Ti Founders Edition but it is a slightly updated version which we will explain in a bit.
A cover that exposes a lid sits underneath the card and can be easily pulled to reveal another package.
This package contains a few manuals and also one of the most important accessories that NVIDIA ships with its Founders Edition card.
If you guessed the 16-pin (12VHPWR) connector, then you guessed right. This NVIDIA-branded adapter converts a single 16-pin plug into three 8-pin connectors and is rated to deliver up to 450 watts of power to the card.
Following is what the 16-pin connector looks like. You may have already seen and read about it in several of our articles, but notice that the connector has 12 standard power pins and four smaller sideband (sense) pins.
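The adapter's 450W rating lines up with the PCIe convention of 150W per 8-pin auxiliary connector. A tiny sketch, assuming that 150W figure and ignoring the extra power available through the PCIe slot itself:

```python
PCIE_8PIN_RATING_W = 150   # assumed rating per 8-pin PCIe auxiliary connector

def adapter_capacity_w(num_8pin: int) -> int:
    """Total power the 16-pin adapter can source from its 8-pin inputs."""
    return num_8pin * PCIE_8PIN_RATING_W

capacity = adapter_capacity_w(3)   # three 8-pin inputs -> 450 W
tbp = 320                          # RTX 4080 Founders Edition board power
print(f"Adapter: {capacity} W, headroom over TBP: {capacity - tbp} W")
```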
Out of the box, we can finally start taking a much better look at the Ada Lovelace powerhouse.
NVIDIA GeForce RTX 4080 "Ada Lovelace" Founders Edition Graphics Card Close Up
The NVIDIA GeForce RTX 4080 Founders Edition is a true BFGPU: a big, chunky card that will take up a good amount of room in your PC.
The card comes in a triple-slot design and you can see the several exhaust vents that are there to push air out of the chassis.
On the bottom of the shroud, you can see three plastic panels that cover the main heatsink and the largest one features the "RTX 4080" branding on it.
The back of the card is encircled by a large die-cast aluminum piece, forming an "X" shape in the center.
The card features a dual-axial flow-through design, which places two fans on opposite faces of the shroud (one on the front and one on the back).
The fan at the bottom pushes air out of the aluminum fins on the backplate.
As you can see, the card is super thick in design, and all of this thickness is there to hold the massive aluminum fins and heat pipes running through the shroud.
You can find a nice "GeForce RTX" logo on the card which features LEDs. A similar LED can also be found within the shroud on the back.
The card comes with a single 16-pin power connector that accepts the aforementioned 12VHPWR adapter bundled with the card.
The new Founders Edition cooler comes with 10% larger fan sizes and 10% larger fin volume. This is all to help the card run super cool and also super quiet.
Once again, you can lay your eyes on the RTX 4080 logo which comes with a new font style. This will be applicable across all RTX 40 series cards.
NVIDIA has taken away some of that aluminum frame room and cut out the corners to make space for the larger fans.
The front side of the card is an aluminum heatsink. These large heatsink blocks show that there's some serious cooling involved to keep the card running.
There's a nice little "RTX 4080" logo carved out on one of the four aluminum arms on the front of the shroud.
Lastly, we can tell you that the card feels very premium, both in the hand and when running in a PC.
We used the following test system to compare the different graphics cards. We used the latest AMD and NVIDIA drivers available at the time of testing on an updated version of Windows 11. All tested games were patched to the latest version for better performance optimization on NVIDIA and AMD GPUs.
NVIDIA GeForce RTX 4090 Test Setup
| CPU | Intel Core i9-12900K @ 5.0 GHz |
|---|---|
| Motherboard | AORUS Z690 Master (DDR5) |
| Video Cards | Colorful GeForce RTX 4090 Vulcan OC-V, MSI GeForce RTX 4090 SUPRIM Liquid X, MSI GeForce RTX 4090 SUPRIM X, NVIDIA GeForce RTX 4090 FE, NVIDIA GeForce RTX 3090 FE, NVIDIA GeForce RTX 3080 Ti FE, NVIDIA GeForce RTX 3080 FE, MSI Radeon RX 6950 XT Gaming X Trio, MSI GeForce RTX 3090 Ti SUPRIM X, MSI GeForce RTX 3090 SUPRIM X, MSI Radeon RX 6900 XT Gaming Z Trio, MSI GeForce RTX 3080 Ti SUPRIM X, MSI Radeon RX 6800 XT Gaming X Trio, MSI GeForce RTX 3080 SUPRIM X, MSI GeForce RTX 3070 Ti SUPRIM X, MSI GeForce RTX 2080 Ti Lightning, MSI GeForce RTX 3070 Gaming X Trio |
| Memory | G.SKILL Trident Z5 RGB Series 32GB (2 X 16GB) CL36 6000 MHz |
| Storage | Teamgroup T-Force A440 Pro 2 TB Gen 4 |
| Power Supply | ASUS ROG THOR 1200W PSU |
| OS | Windows 11 64-bit |
| Drivers | AMD Radeon Adrenalin Edition 22.9.2, NVIDIA GeForce 521.90 WHQL |
- All games were tested at 3840x2160 (4K) resolution.
- Image Quality and graphics configurations are provided with each game description.
- The "reference" cards are the stock configs except where mentioned otherwise.
Firestrike
Firestrike runs on the DX11 API and is still a good measure of GPU scaling performance. In this test, we ran the Extreme and Ultra versions of Firestrike, which run at 1440p and 4K respectively, and we recorded only the Graphics Score since the Physics and Combined scores are not pertinent to this review.
3DMark Firestrike Extreme Graphics
3DMark Firestrike Ultra Graphics
Time Spy
Time Spy runs on the DX12 API, and we used it in the same manner as Firestrike Extreme, recording only the Graphics Score since the CPU score measures processor performance and isn't relevant to the testing we are doing here.
3DMark Time Spy Graphics
3DMark Time Spy Extreme Graphics
Port Royal
Port Royal is another great tool in the 3DMark suite, but this one is 100% targeted at ray tracing performance. It loads up ray-traced shadows, reflections, and global illumination to really tax graphics cards that have either hardware-based or software-based ray-tracing support.
3DMark Port Royal Score
3DMark Pure Ray Tracing Feature Test
Crysis Remastered (DXVK RT)
Crysis is back with a vengeance to reclaim the graphics crown. The remastered version of the game uses the DX11 API but layers Vulkan extensions on top that enable Vulkan ray tracing, something the original game didn't offer. DXVK, along with improved textures and visual effects, leads to a higher performance demand, making us ask once again: "Can It Run Crysis?"
Crysis Remastered (4K Native RT SMAA2TX)
Doom Eternal
DOOM Eternal brings hell to earth with the Vulkan-powered id Tech 7. We test this game using the Ultra Nightmare preset and follow the same in-game benchmark path each time to stay as consistent as possible.
DOOM Eternal
Red Dead Redemption 2
Developed by Rockstar San Diego, Red Dead Redemption 2 is one of the most visually stunning open-world games I've played to date, backed by a rich story set around the protagonist, Arthur Morgan. The game is based on the RAGE engine, which features an insane amount of graphical fidelity but also requires a lot of power to run maxed out. For the purpose of this test, we set the graphics settings to Ultra with AA disabled.
Red Dead Redemption 2
Wolfenstein: Youngblood
Wolfenstein is back in The New Colossus and features some of the most fast-paced, gory, and brutal FPS action ever! The game once again puts us back in the Nazi-controlled world as BJ Blazkowicz. Set in an alternate history where the Nazis won World War II, the game shows that it can be fun while being brutal to both the player and the enemy. Powering the new title is, once again, id Tech 6, which has been widely acclaimed since the success of DOOM. In a way, id has regained its glorious FPS roots and is slaying with every new title.
Wolfenstein
Battlefield V
Battlefield V brings back the action of the World War 2 shooter genre. Using the latest Frostbite tech, the game does a good job of looking gorgeous in all ways possible. From the open-world environments to the intense and gun-blazing action, this multiplayer and single-player FPS title is one of the best-looking Battlefields to date.
Battlefield V
Battlefield V Raytracing DLSS (Quality)
Cyberpunk 2077
Cyberpunk 2077 is an action role-playing video game developed by CD Projekt Red and published by CD Projekt. The story takes place in Night City, an open world set in the Cyberpunk universe. Players assume the first-person perspective of a customizable mercenary known as V, who can acquire skills in hacking and machinery with options for melee and ranged combat. The game uses CD Projekt Red's in-house REDengine, which is one of the most visually breathtaking and also one of the most graphics-intensive engines designed to date.
Cyberpunk 2077 (4K Native RT)
Death Stranding
Sam Porter Bridges has delivered one of PS4's most anticipated games to the PC community and opened a whole new world of possibilities. This was the first game to feature the Decima Engine on PC and unarguably did it the best. Death Stranding may not feature ray tracing effects, but it does showcase that DLSS can be used effectively even when RT isn't around. We tested this one just like we did in our launch coverage with DLSS enabled.
Death Stranding DLSS/FSR (Quality)
Forza Horizon 5
Forza Horizon 5 carries on the open-world racing tradition of the Horizon series. The latest DX12-powered entry is beautifully crafted, amazingly well executed, and a great showcase of DX12 games. We use the benchmark run while having all of the settings set to non-dynamic with an uncapped framerate to gather these results.
Forza Horizon 5
Halo Infinite (DX12 Highest)
Next up, we have the latest entry in the Halo franchise, Halo Infinite, which uses the brand-new Slipspace engine (although there are rumors it will be ditched in the future for Unreal Engine) based on the DX12 API. The game rocks some incredible environments for Master Chief to explore on the Halo ring.
Halo Infinite
Hitman III (DX12 Highest Settings)
Hitman III is the highly acclaimed sequel to 2016's Hitman and 2018's Hitman II, the former being a redesign and reimagining of the series from the ground up. With a focus on stealth gameplay across various missions, the game once again lets you play as Agent 47. The game runs on IO Interactive's Glacier 2 engine, which has been updated to deliver amazing visuals and environments on each level while making use of the DirectX 12 API.
Hitman III
Shadow of The Tomb Raider
The sequel to Rise of the Tomb Raider, Shadow of The Tomb Raider is visually enhanced with an updated Foundation Engine that delivers realistic facial animations and the most gorgeous environments ever seen in a Tomb Raider game. The game is a technical marvel and really shows off the power of its graphics engine.
Shadow of The Tomb Raider
Shadow of The Tomb Raider Raytracing DLSS/FSR (Quality)
Metro Exodus
Metro Exodus continues Artyom's journey through Russia's nuclear wasteland and its surroundings. This time, you venture beyond the Metro, traveling through various regions and different environments. The game is one of the premier titles to feature NVIDIA’s RTX technology and does well in showcasing ray-tracing effects in every corner.
Metro Exodus Extreme Preset
Metro Exodus Raytracing DLSS (Quality)
Resident Evil Village
Resident Evil Village is the latest entry in the horror franchise that was wonderfully rekindled with RE7 and carried on with the RE2 Remake. The RE Engine is back and better than ever, with ray-traced reflections and lighting that make the world come to life. The game was tested in the center of the village itself with all graphical settings maxed out and ray tracing enabled.
Resident Evil Village (Maxed)
Resident Evil Village Raytracing FSR (Quality)
Stray (That Cat Game)
Stray is a 2022 adventure game developed by BlueTwelve Studio and published by Annapurna Interactive. The story follows a stray cat who falls into a walled city populated by robots, machines, and mutant bacteria, and sets out to return to the surface with the help of a drone companion, B-12. The game uses Unreal Engine 4, but DX12 ray tracing can be enabled by adding the "-dx12" launch argument to the game.
Stray (Maxed With DXR)
To test DLSS 3, we were sent press builds that allowed us to enable Frame Generation in the following titles. For now, we are focusing on the Quality preset at 4K since it easily delivers over 100 FPS; unless you want even higher frame rates (144+ FPS), the "Performance" and "Ultra Performance" modes don't make sense.
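For context on what the Quality preset means at 4K, here is a sketch of the internal render resolution for each DLSS mode, plus an idealized view of Frame Generation. The per-axis scale factors are the commonly cited defaults (an assumption, since individual games can override them), and the hypothetical `frame_generation_fps` helper ignores generation overhead and simply assumes one generated frame per rendered frame.

```python
from fractions import Fraction

# Commonly cited per-axis DLSS render scales (assumed defaults)
DLSS_SCALE = {
    "Quality": Fraction(2, 3),
    "Balanced": Fraction(29, 50),        # ~0.58
    "Performance": Fraction(1, 2),
    "Ultra Performance": Fraction(1, 3),
}

def render_resolution(width: int, height: int, mode: str) -> tuple[int, int]:
    """Internal resolution the GPU renders before DLSS upscales to the output."""
    scale = DLSS_SCALE[mode]
    return round(width * scale), round(height * scale)

def frame_generation_fps(rendered_fps: float) -> float:
    """Idealized DLSS 3 output: one generated frame per rendered frame, overhead ignored."""
    return rendered_fps * 2

for mode in DLSS_SCALE:
    w, h = render_resolution(3840, 2160, mode)
    print(f"4K {mode}: renders at {w}x{h}")
```

Under these assumptions, 4K Quality renders internally at 1440p, which is why it already leaves plenty of headroom before Frame Generation is added on top.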
Cyberpunk 2077 (Quality)
A Plague Tale Requiem (Quality)
Unity Engine (Enemies) Demo 4K
No graphics card review is complete without evaluating its temperatures and thermal load. NVIDIA uses an updated vapor chamber and fan design on the brand-new Founders Edition variant, which offers 10% larger fans and fin volume along with up to 15% higher airflow.
Temperatures
I compiled the power consumption results by testing each card at idle and under full stress while running games. Each graphics card manufacturer sets a default TDP for the card, which can vary from vendor to vendor depending on the extra clocks or board features they add to their custom cards. The default TDP for the GeForce RTX 4090 Founders Edition is rated at 450W, and the peak power limit is rated at 600W.
Power Consumption
Efficiency Test Across 10 Games (Non RTX)
Efficiency Test Across 10 Games (RTX)
V-Ray Next
Chaos Group's V-RAY is a commercial plugin used across various 3D Modeling suites from Maya to Cinema 4D allowing for path tracing, photon mapping, irradiance maps, and directly computed global illumination. The plugin is used for video game creation all the way to film and industrial design.
Thankfully they have their own benchmark utility, so you don't have to go into 10+ different suites to find out how well CPUs or graphics cards perform. We used just the GPU portion for our results and according to Chaos Group, there should be a linear performance improvement based on the power and scaling of the GPU.
V-Ray Next Benchmark
OctaneRender
OctaneRender is touted as the world's first and fastest unbiased, spectrally correct GPU render engine and is RTX accelerated to bring 2-5x faster render speeds to NVIDIA's ray tracing GPUs. So we put OctaneBench to the test with and without RTX acceleration and then followed up with an actual workload (provided by NVIDIA) to see how everything responds outside of the benchmark in a heavily loaded scene.
OctaneBench
Overall Score RTX Enabled
Overall Score RTX Disabled
Box Path Tracing
The road to Ada was certainly an exciting one. We saw various rumors, leaks, and speculation, and now we finally have the final product in our hands. There was certainly a lot of hype surrounding the RTX 40 series cards, and we will see whether this card lives up to the expectations or not.
What Does Ada Bring To The Table?
Ada brings a revamped architecture on three major fronts: the CUDA cores have been given an update, the ray tracing cores have been given an update, and the tensor cores have been given an update too. Not only are these three key areas upgraded to a new core design, but they also introduce brand-new features. The new ray-tracing cores come with optimized ways to handle BVH processing, and this can be seen in both games & applications that use ray-traced render paths. The new tensor cores not only boost existing DLSS performance, but with DLSS 3 and its frame-generation algorithm, we get to see a quantum leap in performance versus native resolution. And finally, we have raster performance, where we saw a jump of anywhere from 50-80% (depending on the title). The average mostly settles around 60%, but that itself is a big jump.
Ada is simply a quantum leap in all regards, and it's only going to get better from here!
So here's the thing. If you are running an RTX 3090 series graphics card, you will immediately get a 50-80% performance boost without any DLSS or RT applied. With RT applied, the card takes a smaller performance hit than the Ampere cores do. The Ada cores just love RT, and in ray tracing-heavy titles, you will notice that performance gets closer to the 2x claim more often than in rasterization. With that said, ray tracing is still one of the most costly effects to enable in games and taxes the GPU a lot. That's where DLSS 3 comes in. While Ada can definitely handle most ray tracing games at 4K 60 FPS natively, you can get a further boost by leveraging DLSS 3, and that boost takes performance to the next level. We saw anywhere from 3-4x gains, and in a few cases, our gains were above 4x, which is simply impressive.
Where the GeForce RTX 4090 & RTX 4080 truly shine is in the world of creative professionals who know they can put that massive VRAM pool to use for more than just gaming. Working with 8K RAW footage in DaVinci Resolve was a breeze on the RTX 4080 as well as the RTX 3090, but the RTX 4080 delivered that experience for less money. V-Ray performance on the new Ada architecture is through the roof. The same goes for OctaneRender, so long as you have the VRAM to support it; otherwise, you'll find yourself stumbling. Blender really benefits from the architectural improvements and shows quite the speedup over the RTX 3080, where the other cards just don't have the VRAM to keep up.
"But I heard it consumes 900W & requires a new PSU"
OK, so let's get a few things straight, because the power consumption and temperature discussion going around the Ada Lovelace GPUs has been baseless. I would like to make this as loud & clear as possible!
No, It Doesn't Consume Anywhere Close To 500W In Gaming, And If You Are Already Running A PSU That Is Rated At 750W Or Above, You Are Good To Upgrade!
The NVIDIA GeForce RTX 4080 Founders Edition has a TBP of 320W. The only applications that made it draw that much power were synthetic benchmarks or power viruses such as FurMark and similar stress tests. I was playing Forza Horizon 5 at 4K with everything maxed out and got over 100 FPS while the GPU consumed 220W of power. The graphics card offered 20-30% higher performance than the RTX 3090 Ti, which consumed around 400W of power in the same game. That's not all; the graphics card also never went past 60°C in this particular test, while the RTX 3090 Ti was running at around 68°C.
The GeForce RTX 4080 is a very efficient card. No game was able to push the card past 300 watts, no matter how hard I tried. With every setting cranked up to the max, including ray tracing, and the game running at native resolution, the card stayed within a 300W power budget while delivering better performance than an RTX 3090 Ti. That's a major leap in efficiency. We even managed to undervolt our card to a fixed 225W, and it still delivered 90% of its overall performance, which means it offers 20% better performance than an RTX 3090 Ti at half the power budget.
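The efficiency claims above can be expressed as relative performance-per-watt. This sketch uses the figures quoted in this section, taking the 25% midpoint of the 20-30% Forza Horizon 5 lead; the 450W RTX 3090 Ti board power used for the undervolt case is that card's official TBP, which the text only implies with "half the power budget", so treat it as an assumption.

```python
def relative_perf_per_watt(perf_ratio: float, watts_a: float, watts_b: float) -> float:
    """Perf-per-watt of card A divided by that of card B.

    perf_ratio is A's performance relative to B (1.25 means 25% faster).
    """
    return perf_ratio * watts_b / watts_a

# Forza Horizon 5 at 4K: RTX 4080 at ~220 W vs RTX 3090 Ti at ~400 W (midpoint of 20-30% lead)
forza = relative_perf_per_watt(1.25, 220, 400)

# Undervolted RTX 4080 at 225 W, 20% faster than an RTX 3090 Ti assumed at 450 W
undervolt = relative_perf_per_watt(1.20, 225, 450)

print(f"Forza: {forza:.2f}x, undervolted: {undervolt:.2f}x the efficiency of the RTX 3090 Ti")
```

Either way, the RTX 4080 lands at better than twice the performance-per-watt of the previous-gen flagship in these scenarios.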
NVIDIA's GeForce RTX 4080 runs cool, draws much less power than the previous gen and offers up to 40% better performance natively versus the previous-gen flagship. It is one hell of a card!
And to answer the question of whether you really need one of those fancy ATX 3.0 PSUs: well, you really don't. As I said, if you have a good 750W+ PSU, that's more than enough, but if you are planning to build an entirely new PC, then investing in the latest ATX 3.0 standard will be a good choice, though not a required one.
I would suggest that anyone who plans on getting an RTX 4080 make sure they have a fast CPU. The RTX 4080 demands a lot of processing power alongside it, and even the Core i9-12900K can become a bit of a bottleneck in some cases. Overclocking the 12900K is the way to go, and it's even better if you plan on getting a Ryzen 7000 or Intel 13th Gen PC.
Is It Worth The Price?
The RTX 3080 was priced at $699 US and the RTX 4080 costs $1199 US. That's a $500 US or 71.5% increase. Even compared to the $799 US RTX 3080 12 GB, this is a 50% increase, and we aren't even taking the custom models into the equation yet, which are priced at a $100-$200 US premium. This is a hard pill to swallow, and to be honest, this card should've been called the 4080 Ti, but that title is reserved for a future graphics card.
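The percentages in this paragraph are plain percent-increase arithmetic. The sketch below uses the $699 US and $1199 US launch prices quoted here and the $799 US RTX 3080 12 GB price listed in the tier table further down.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative price change from old to new, in percent."""
    return (new - old) / old * 100

RTX_4080_PRICE = 1199

for name, old in [("RTX 3080 10 GB", 699), ("RTX 3080 12 GB", 799)]:
    pct = percent_increase(old, RTX_4080_PRICE)
    print(f"{name} (${old}) -> RTX 4080 (${RTX_4080_PRICE}): +{pct:.1f}%")
```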
The NVIDIA GeForce RTX 4080 is a great graphics card which is hurt by its own pricing.
Compared to the RTX 3090 Ti, which can be found for around $1000 US (new), paying $200 US more for 20-40% better performance, more features, and reduced power consumption sounds like a very good deal. It's an all-around package that comes with the latest RT and DLSS enhancements and the support of NVIDIA's extensive GeForce feature suite, which alone is one big reason to stick with the brand regardless of the value or performance offered by the competition.
With that said, there's a compelling graphics card lineup on the horizon from competitor AMD. The new Radeon RX 7000 series graphics cards have already been revealed with prices $200-$300 US lower than the RTX 4080 and what seems to be more memory and possibly even higher performance. We have to wait to confirm the latter, but it is one reason why we would recommend you wait a few more weeks before grabbing an RTX 4080.
NVIDIA GeForce GPU Segment/Tier Prices
| Graphics Segment | 2023-2024 | 2022-2023 | 2021-2022 | 2020-2021 | 2019-2020 | 2018-2019 | 2017-2018 | 2016-2017 | 2014-2016 |
|---|---|---|---|---|---|---|---|---|---|
| Titan Tier | GeForce RTX 4090 | GeForce RTX 4090 | GeForce RTX 3090 Ti / GeForce RTX 3090 | GeForce RTX 3090 | Titan RTX (Turing) | Titan V (Volta) | Titan Xp (Pascal) | Titan X (Pascal) | Titan X (Maxwell) |
| Price | $1599 US | $1599 US | $1999 US / $1499 US | $1499 US | $2499 US | $2999 US | $1199 US | $1199 US | $999 US |
| Ultra Enthusiast Tier | GeForce RTX 4080 SUPER | GeForce RTX 4080 | GeForce RTX 3080 Ti | GeForce RTX 3080 Ti | GeForce RTX 2080 Ti | GeForce RTX 2080 Ti | GeForce GTX 1080 Ti | GeForce GTX 980 Ti | GeForce GTX 980 Ti |
| Price | $999 US | $1199 US | $1199 US | $1199 US | $999 US | $999 US | $699 US | $649 US | $649 US |
| Enthusiast Tier | GeForce RTX 4070 Ti SUPER | GeForce RTX 4070 Ti | GeForce RTX 3080 12 GB | GeForce RTX 3080 10 GB | GeForce RTX 2080 SUPER | GeForce RTX 2080 | GeForce GTX 1080 | GeForce GTX 1080 | GeForce GTX 980 |
| Price | $799 US | $799 US | $799 US | $699 US | $699 US | $699 US | $549 US | $549 US | $549 US |
| High-End Tier | GeForce RTX 4070 SUPER / GeForce RTX 4070 | GeForce RTX 4070 / GeForce RTX 4060 Ti 16 GB | GeForce RTX 3070 Ti / GeForce RTX 3070 | GeForce RTX 3070 Ti / GeForce RTX 3070 | GeForce RTX 2070 SUPER | GeForce RTX 2070 | GeForce GTX 1070 | GeForce GTX 1070 | GeForce GTX 970 |
| Price | $599 / $549 | $599 US / $499 US | $599 / $499 | $599 / $499 | $499 US | $499 US | $379 US | $379 US | $329 US |
| Mainstream Tier | GeForce RTX 4060 Ti / GeForce RTX 4060 | GeForce RTX 4060 Ti / GeForce RTX 4060 | GeForce RTX 3060 Ti / GeForce RTX 3060 12 GB | GeForce RTX 3060 Ti / GeForce RTX 3060 12 GB | GeForce RTX 2060 SUPER / GeForce RTX 2060 / GeForce GTX 1660 Ti / GeForce GTX 1660 SUPER / GeForce GTX 1660 | GeForce GTX 1060 | GeForce GTX 1060 | GeForce GTX 1060 | GeForce GTX 960 |
| Price | $449 / $299 | $399 US / $299 US | $399 US / $329 US | $399 US / $329 US | $399 US / $349 US / $279 US / $229 US / $219 US | $249 US | $249 US | $249 US | $199 US |
| Entry Tier | RTX 3050 8 GB / RTX 3050 6 GB | RTX 3050 | RTX 3050 | GTX 1650 SUPER / GTX 1650 | GTX 1650 SUPER / GTX 1650 | GTX 1050 Ti / GTX 1050 | GTX 1050 Ti / GTX 1050 | GTX 950 | GTX 750 Ti / GTX 750 |
| Price | $229 / $179 | $249 US | $249 US | $159 US / $149 US | $159 US / $149 US | $139 US / $109 US | $139 US / $109 US | $149 US | $149 US / $119 US |
Conclusion
NVIDIA's GeForce RTX 4080 graphics cards deliver an absolutely massive upgrade over the RTX 3080 and RTX 3080 Ti while taking GPU efficiency to new heights. Coupled with features such as superior ray tracing performance and DLSS 3, the graphics card could've been a definitive upgrade over its predecessor, but unfortunately, for everything that is great about this card, it is plagued by one of the worst price bumps we have seen to date. All we can hope is that NVIDIA revises its pricing strategy with the upcoming mainstream cards.