NVIDIA GeForce RTX 20 Series Review Ft. RTX 2080 Ti & RTX 2080 Founders Edition Graphics Cards – Turing Ray Traces The Gaming Industry

Hassan Mujtaba & Keith May

Sep 19, 2018 at 09:10am EDT

NVIDIA Turing GPU - Turing Advanced Shading Techniques

Keeping their tradition alive of launching a new graphics architecture every two years, this year, NVIDIA announced the Turing GPU. Primarily aimed at the consumer sector which includes both Quadro and GeForce segments, the Turing GPU is a big departure from traditional GPU designs.

The Turing GPU architecture has a lot to be talked about in this review, but so does the new RTX lineup. One of the key and most disruptive features being talked about regarding Turing is that it will support real-time ray tracing, which has long been considered the holy grail of computer graphics. RTX means a lot to NVIDIA which is why they have decided to change the name of their most historic consumer brands. Following is what the new consumer lineup will be known as from this day onwards:

NVIDIA GeForce GTX ---> NVIDIA GeForce RTX
NVIDIA Quadro ---> NVIDIA Quadro RTX

With Turing, NVIDIA is targeting several markets, but the primary driver for mass adoption of the new RTX features will be gamers. The Turing core is more than just new features: it is based on a refined process node, provides much better efficiency than its predecessor, has new dedicated hardware accelerators, and comes with a new streaming multiprocessor that pairs traditional cores with Tensor Cores for AI, helping deliver better visuals without a large performance cost in graphics-intensive gaming titles.

It took NVIDIA 10 years' worth of time and money to turn this dream into a reality. With the first RTX products soon to be in consumers' hands, we dive into the latest GeForce graphics cards to find out not only how they perform, but also how worthwhile the new features are for traditional gamers.

Today, we will be taking a look at the NVIDIA GeForce RTX 2080 Ti and GeForce RTX 2080 Founders Edition graphics cards. Both cards were provided by NVIDIA for this review and we will be taking a look at their technology, design and performance metrics in detail.

NVIDIA Turing GPU - A Major Technical Leap For Gamers and Content Creators

Turing isn't just any graphics core, it is the graphics core that will be the foundation of future GPUs. The reason is that Turing does so many things which will be important to gamers and developers in the coming years, and GPU makers will eventually have to adopt some of the new tricks that the Turing GPU does to keep up with applications' demand for faster and more efficient GPU performance.

The Turing GPU does many traditional things which we would expect from a GPU, but at the same time, also breaks the barrier when it comes to untraditional GPU operations. Just to sum up some features:

New Streaming Multiprocessor (SM)
New Turing Tensor Cores
New Real-Time Ray Tracing Acceleration
New Shading Enhancements
New Deep Learning Features For Graphics & Inference
New GDDR6 High-Performance Memory Subsystem
New 2nd Generation NVLINK Interconnect
New USB Type-C and VirtualLink Connectors

The technologies mentioned above are some of the main building blocks of the Turing GPU, but there's more within the graphics core itself which we will talk about in detail. Before moving forward, it should be pointed out that NVIDIA's Turing GPUs and NVIDIA's Volta GPUs have a lot in common, but while Volta is primarily focused on the High-Performance Computing sector, the Turing GPU is designed for the consumer segment. There are also many ways in which Turing has been refined over Volta to offer better gaming performance and higher efficiency.

Let's take a trip down the road to Turing. In 2016, NVIDIA announced their Pascal GPUs, which would soon be featured in their top-to-bottom GeForce 10 series lineup. By the launch of Maxwell, NVIDIA had gained a lot of experience in the efficiency department, which they have focused on since their Kepler GPUs.

Now, with an enhanced FinFET process available, NVIDIA is taking the efficiency lead beyond where it was previously possible, which is completely unrivaled by the competition. With Volta, NVIDIA focused on the AI and HPC market, but most of the features that Volta supported aren’t necessarily needed in the gaming department. Take for instance the double precision floating point execution units. With Pascal, NVIDIA diversified their consumer and HPC GPUs and this time, they are going with a more aggressive approach, completely classifying the consumer GPU in a category of its own. This is where Turing comes in, a GPU designed solely for the consumer segment.

Starting with the most significant part of the Turing GPU architecture, the Turing SM, we are seeing an entirely new graphics core. The Turing SM is made up of a combination of INT32, FP32, and the new Tensor cores.

Coming to the new execution units or cores, Turing has both INT32 and FP32 units which can execute concurrently. This new architectural design allows Turing to execute floating point and non-floating point operations in parallel which allows for up to 36% higher throughput in standard floating point operations.

The Turing SM is partitioned into four processing blocks, each with 16 FP32 Cores, 16 INT32 Cores, two Tensor Cores, one warp scheduler, and one dispatch unit. This adds up to 64 FP32 Cores, 64 INT32 Cores, 8 Tensor Cores, 4 Warp Schedulers and 4 Dispatch Units on a single Turing SM. Each block also includes a new L0 instruction cache and a 64 KB register file.
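The per-SM totals follow directly from the per-block counts. A minimal Python sketch of that arithmetic is shown here; the dictionary keys are our own illustrative names, not NVIDIA identifiers.

```python
# Per-processing-block resources of a Turing SM, as listed in the text.
BLOCK = {"fp32": 16, "int32": 16, "tensor": 2, "warp_scheduler": 1, "dispatch": 1}
BLOCKS_PER_SM = 4
REGISTER_FILE_KB_PER_BLOCK = 64

def sm_totals(block=BLOCK, blocks=BLOCKS_PER_SM):
    """Multiply each per-block count by the number of blocks in one SM."""
    return {name: count * blocks for name, count in block.items()}

def sm_register_file_kb(blocks=BLOCKS_PER_SM, per_block=REGISTER_FILE_KB_PER_BLOCK):
    """Total register file capacity of one SM in KB."""
    return blocks * per_block

print(sm_totals())
print(sm_register_file_kb(), "KB of registers per SM")
```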

The four processing blocks share a combined 96 KB L1 data cache/shared memory. Traditional graphics workloads partition the 96 KB L1/shared memory as 64 KB of dedicated graphics shader RAM and 32 KB for texture cache and register file spill area. Compute workloads can divide the 96 KB into 32 KB shared memory and 64 KB L1 cache, or 64 KB shared memory and 32 KB L1 cache.

The entire SM works in harmony by using different blocks to deliver high performance and better texture caching, enabling up to 50% better CUDA core performance when compared to the previous generation.

Many of these Turing SMs combine to form the Turing GPU. Each TPC inside the Turing GPU houses 2 Turing SMs which are linked to the raster engine. There are a total of 6 TPCs or 12 Turing SM that are arranged inside the GPC or Graphics Processing Cluster. The top configured TU102 GPU comes with 6 GPCs that are connected to 6 MB of L2 cache, ROPs, TMUs, memory controllers and NVLINK HighSpeed I/O hub. All of this combines to form the massive Turing GPU. Following are some perf figures for the top Turing graphics cards.

NVIDIA GeForce RTX 2080 Ti

14.2 TFLOPS of peak single precision (FP32) performance
28.5 TFLOPS of peak half-precision (FP16) performance
14.2 TIPS concurrent with FP, through independent integer execution units
113.8 Tensor TFLOPS
10 Giga Rays/sec
78 Tera RTX-OPS

NVIDIA Quadro RTX 8000

16.3 TFLOPS of peak single precision (FP32) performance
32.6 TFLOPS of peak half-precision (FP16) performance
16.3 TIPS concurrent with FP, through independent integer execution units
130.5 Tensor TFLOPS
10 Giga Rays/sec
84 Tera RTX-OPS
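For readers who want to see where figures like these come from, here is a rough Python sketch. It rests on several assumptions that are not stated in this section: 4,352 CUDA cores, 544 Tensor Cores and a 1,635 MHz boost clock for the RTX 2080 Ti Founders Edition, one fused multiply-add counted as two operations, and 64 FP16 multiply-adds per Tensor Core per clock. The RTX-OPS weighting (80% FP32, 28% INT32, 40% ray tracing, 20% Tensor) reflects our understanding of NVIDIA's published description of the metric, with each Giga Ray treated as roughly 10 TFLOPS of equivalent work.

```python
def peak_tflops(cores, boost_mhz, ops_per_core_per_clock=2):
    """Peak throughput in tera-operations per second."""
    return cores * ops_per_core_per_clock * boost_mhz * 1e6 / 1e12

def rtx_ops(fp32_tflops, int32_tips, giga_rays, tensor_tflops):
    """Weighted blend of the four workloads (weights are assumed, see lead-in)."""
    rt_tera_ops = giga_rays * 10  # ~10 TFLOPS of equivalent work per Giga Ray
    return (0.80 * fp32_tflops + 0.28 * int32_tips
            + 0.40 * rt_tera_ops + 0.20 * tensor_tflops)

# RTX 2080 Ti Founders Edition (assumed core counts and clock)
fp32 = peak_tflops(4352, 1635)                                # FP32, 2 ops per FMA
tensor = peak_tflops(544, 1635, ops_per_core_per_clock=128)   # 64 FMAs * 2 ops
print(round(fp32, 1), round(tensor, 1))

# Blend the listed peak figures for both cards
print(round(rtx_ops(14.2, 14.2, 10, 113.8)))   # RTX 2080 Ti
print(round(rtx_ops(16.3, 16.3, 10, 130.5)))   # Quadro RTX 8000
```

With the listed peak numbers as inputs, the blend lands on the quoted 78 and 84 Tera RTX-OPS, which suggests the metric is a weighted estimate rather than a directly measured quantity.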

In terms of shading performance, which is the direct result of the enhanced core design and GPU architecture revamp, the Turing GPU offers an average uplift of 50% better performance per core compared to Pascal GPUs. In VR games, the shading performance would be a good 2x ahead of what Pascal achieved, while many modern gaming titles show a ~50% lead over Pascal with Turing’s enhanced core design.

It should be pointed out that these are just per core performance gains at the same clock speeds without adding the benefits of other technologies that Turing comes with. That would further increase the performance in a wide variety of gaming applications, since we have already seen the gaming performance of a GeForce RTX 2080 to be 50% faster than the GTX 1080 on average and twice as fast with the new DLSS technology.

The other significant part of the Turing GPU and the most talked about feature of the whole Turing family is the support of Tensor Cores. Now Tensor cores have been available since Volta, but not on consumer cards, let alone gaming products. With Turing, Tensor cores add INT8 and INT4 precision in addition to FP16 which is still fully supported. NVIDIA has been at the helm of the deep learning revolution by supporting it since their Kepler generation of graphics cards. Today, NVIDIA has some of the most powerful AI graphics accelerators and a software stack that is widely adopted by this fast-growing industry.

There's a whole software stack that leverages the Tensor Cores, and that is known as NVIDIA NGX. These software-based technologies will help enhance graphics fidelity with features such as Deep Learning Super Sampling (DLSS), AI InPainting, AI Super Rez, and AI Slow-Mo.

As mentioned earlier, there are 8 Tensor Cores per SM and 16 in a single TPC. The flagship TU102 GPU contains 576 Tensor Cores. Per clock cycle, a single SM can perform a total of 512 FP16 multiply-accumulate operations (counted as 1024 floating-point operations), 2048 INT8 operations, or 4096 INT4 operations. This level of performance is utilized in both deep learning training and inferencing operations. The recent announcement of the Tesla T4 at GTC Japan 2018 shows that the Turing GPU has use cases in the Tesla market too.
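The per-clock rates and the 576 figure can be checked with a few lines of Python. This sketch assumes each Tensor Core performs 64 FP16 fused multiply-adds per clock (which is what 512 per SM across eight cores implies) and uses the 6 GPC x 6 TPC x 2 SM layout of the full TU102 described earlier.

```python
FMA_PER_TENSOR_CORE_FP16 = 64   # assumed per-core rate, FP16 multiply-adds per clock
TENSOR_CORES_PER_SM = 8
FP32_CORES_PER_SM = 64

def tensor_ops_per_sm_clock():
    """Per-SM, per-clock Tensor Core throughput at each precision."""
    fp16_fma = FMA_PER_TENSOR_CORE_FP16 * TENSOR_CORES_PER_SM
    return {
        "fp16_fma": fp16_fma,
        "fp16_flops": fp16_fma * 2,  # one FMA = one multiply + one add
        "int8_ops": fp16_fma * 4,    # INT8 runs at twice the FP16 operation rate
        "int4_ops": fp16_fma * 8,    # INT4 doubles it again
    }

def tu102_counts(gpcs=6, tpcs_per_gpc=6, sms_per_tpc=2):
    """Unit counts for a fully enabled TU102."""
    sms = gpcs * tpcs_per_gpc * sms_per_tpc
    return {
        "sms": sms,
        "cuda_cores": sms * FP32_CORES_PER_SM,
        "tensor_cores": sms * TENSOR_CORES_PER_SM,
        "rt_cores": sms,             # one RT Core per SM
    }

print(tensor_ops_per_sm_clock())
print(tu102_counts())
```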

RT Cores, RTX and Real-Time Ray Tracing Dissected

Next up, we have the RT Cores, which are what will power real-time ray tracing. NVIDIA isn't going to distance themselves from traditional rasterization-based rendering, but is instead following a hybrid rendering model. The reason is that while GeForce RTX cards are miles ahead in ray tracing performance compared to previous generation cards, they still lack the horsepower to fully ray-trace an entire screen, so developers would have to use a very small number of rays per pixel, around 1 at the low end and 10 at the high end.

There's one RT core per SM and all of them combined accelerate Bounding Volume Hierarchy (BVH) traversal and ray/triangle intersection testing (ray casting) functions. RT Cores work together with advanced denoising filtering, a highly-efficient BVH acceleration structure developed by NVIDIA Research, and RTX compatible APIs to achieve real-time ray tracing on a single Turing GPU.

RT Cores traverse the BVH autonomously, and by accelerating traversal and ray/triangle intersection tests, they offload the SM, allowing it to handle other vertex, pixel, and compute shading work. Functions such as BVH building and refitting are handled by the driver, and ray generation and shading are managed by the application through new types of shaders.

To better understand the function of RT Cores, and what exactly they accelerate, we should first explain how ray tracing is performed on GPUs or CPUs without a dedicated hardware ray tracing engine. Essentially, the process of BVH traversal would need to be performed by shader operations and take thousands of instruction slots per ray cast to test against bounding box intersections in the BVH until finally hitting a triangle and the color at the point of intersection contributes to final pixel color (or if no triangle is hit, background color may be used to shade a pixel).

Ray tracing without hardware acceleration requires thousands of software instruction slots per ray to test successively smaller bounding boxes in the BVH structure until possibly hitting a triangle. It’s a computationally intensive process making it impossible to do on GPUs in real-time without hardware-based ray tracing acceleration.
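To make the software path concrete, below is a minimal Python sketch of BVH traversal without any hardware help. It is an illustration under our own assumptions, not NVIDIA's implementation: the `Node` layout and function names are hypothetical, the box test is the standard slab method, and the triangle test is the well-known Möller-Trumbore algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec = Tuple[float, float, float]

def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

@dataclass
class Node:
    lo: Vec                                   # bounding box minimum corner
    hi: Vec                                   # bounding box maximum corner
    children: List["Node"] = field(default_factory=list)
    triangles: List[Tuple[Vec, Vec, Vec]] = field(default_factory=list)

def hits_box(origin, direction, lo, hi):
    """Slab test: the ray must overlap the box on all three axes at once."""
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if abs(d) < 1e-12:                    # ray parallel to this pair of slabs
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def hit_triangle(origin, direction, tri, eps=1e-9):
    """Moller-Trumbore intersection; returns the hit distance or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:                        # ray parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0 or u > 1:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0 or u + v > 1:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def trace(root, origin, direction):
    """Walk the BVH with an explicit stack; return the nearest hit distance or None."""
    nearest, stack = None, [root]
    while stack:
        node = stack.pop()
        if not hits_box(origin, direction, node.lo, node.hi):
            continue                          # skip this whole subtree
        for tri in node.triangles:
            t = hit_triangle(origin, direction, tri)
            if t is not None and (nearest is None or t < nearest):
                nearest = t
        stack.extend(node.children)
    return nearest
```

Every loop iteration in `trace` is work that a shader core would otherwise have to execute instruction by instruction for each ray, which is the cost the dedicated hardware is meant to remove.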

The RT Cores in Turing can process all the BVH traversal and ray-triangle intersection testing, saving the SM from spending the thousands of instruction slots per ray, which could be an enormous amount of instructions for an entire scene. The RT Core includes two specialized units. The first unit does bounding box tests, and the second unit does ray-triangle intersection tests. The SM only has to launch a ray probe, and the RT Core does the BVH traversal and ray-triangle tests and returns a hit or no hit to the SM. The SM is largely freed up to do other graphics or compute work.

Turing ray tracing performance with RT Cores is significantly faster than ray tracing in Pascal GPUs. Turing can deliver far more Giga Rays/Sec than Pascal on different workloads, as shown in Figure 19. Pascal is spending approximately 1.1 Giga Rays/Sec, or 10 TFLOPS / Giga Ray to do ray tracing in software, whereas Turing can do 10+ Giga Rays/Sec using RT Cores, and run ray tracing 10 times faster.
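A quick back-of-the-envelope calculation shows what these rates mean per pixel. The Python sketch below divides the peak ray rate by the number of pixels drawn per second; it is a theoretical ceiling that ignores shading, denoising and everything else the GPU does in a frame, so practical budgets are much lower.

```python
def rays_per_pixel(giga_rays_per_sec, width, height, fps):
    """Upper bound on rays available per pixel per frame at a given peak ray rate."""
    return giga_rays_per_sec * 1e9 / (width * height * fps)

for name, rate in (("Turing, 10 Giga Rays/sec", 10.0), ("Pascal, 1.1 Giga Rays/sec", 1.1)):
    at_4k = rays_per_pixel(rate, 3840, 2160, 60)
    at_1080p = rays_per_pixel(rate, 1920, 1080, 60)
    print(f"{name}: {at_4k:.1f} rays/pixel at 4K60, {at_1080p:.1f} at 1080p60")
```

At 4K and 60 FPS the peak rate works out to roughly 20 rays per pixel for Turing against about 2 for Pascal, which helps explain why hybrid rendering with only a handful of rays per pixel is the practical approach.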

Turing is a very powerful core of its generation and there's nothing like it. High-performance GPUs of this caliber need to be fed with lots of bandwidth. NVIDIA's Volta GPUs are fed by the fastest memory standard in the industry, HBM2, but while it makes sense from an HPC standpoint, it is nowhere near affordable enough to be featured on consumer-level products. And GDDR5 has already exceeded its maximum potential with G5X. So this is where GDDR6 enters the industry.

While GDDR6 follows an evolutionary path over GDDR5 and GDDR5X memory, there are still some significant changes in the underlying architecture to boost memory bandwidth while saving power. This makes the VRAM a viable option for next-generation consumer graphics cards such as NVIDIA's upcoming line of GeForce products. Furthermore, unlike GDDR5X, which was only supported and produced by Micron, GDDR6 has the backing of all three players: Samsung, SK Hynix, and Micron. NVIDIA is continuing their partnership with Micron and featuring their memory on the GeForce RTX cards, but if they were ever to run into a shortage, there won't be an issue as NVIDIA can select from other manufacturers too.

For those who would like to know what the difference is between GDDR5 and GDDR6, we know from the official specifications published by JEDEC that both memory standards are not a whole lot different from each other, but they aren’t the same thing either. The GDDR6 solution is built upon the DNA of GDDR5X and has been updated to deliver twice the data rate and denser die capacities.

While the new memory technology would be very similar to GDDR5X, there are a few differences of which the major ones include:

The introduction of an FBGA180 ball package with increased pitch
A dual channel architecture

There are a lot of design changes that went into developing GDDR6 to achieve faster transfer speeds and higher bandwidth in a package that consumes around the same power or even less. Samsung states that GDDR6 has 35% lower power input than GDDR5 DRAM.

Coming to the specifications in detail, the Samsung 16 Gb GDDR6 memory die will be built on the 10nm process node, which Samsung calls the most advanced memory node to date. It will double the density of their GDDR5 solution, which was composed of a 20nm 8 Gb die. According to Samsung, their solution will operate at up to 18 Gbps per pin against a previous standard speed of 16 Gbps, and that is a big deal here. Each die will be able to deliver a data transfer rate of 72 GB/s and hold a capacity of 2 GB of VRAM. The solution will be able to do all of this with 35% lower power input at just 1.35V compared to 1.55V.

This means that a solution based on a 384-bit interface and surrounded by 12 DRAM dies could feature up to 24 GB of VRAM while a 256-bit solution can house up to 16 GB of VRAM. That’s twice the VRAM capacity as current generation cards. While VRAM is one thing, the maximum bandwidth output on a 384-bit card can reach a blistering fast 672 GB/s while the 256-bit solution can reach a stunning 448 GB/s transfer rate on existing 14 Gbps dies which are in full production.
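The bandwidth and capacity figures above come from simple arithmetic: bus width times per-pin data rate, divided by eight to convert bits to bytes. The Python sketch below assumes a 32-bit interface per GDDR6 die (which is what 12 dies on a 384-bit bus implies) and 2 GB per die.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s from bus width and per-pin data rate."""
    return bus_width_bits * data_rate_gbps / 8

def capacity_gb(bus_width_bits, die_capacity_gb=2, die_interface_bits=32):
    """Total VRAM when every 32-bit slice of the bus is populated with one die."""
    return (bus_width_bits // die_interface_bits) * die_capacity_gb

print(bandwidth_gb_s(384, 14), "GB/s on a 384-bit bus at 14 Gbps")
print(bandwidth_gb_s(256, 14), "GB/s on a 256-bit bus at 14 Gbps")
print(bandwidth_gb_s(32, 18), "GB/s from a single 18 Gbps die")
print(capacity_gb(384), "GB on 384-bit,", capacity_gb(256), "GB on 256-bit")
```

The same formula gives 616 GB/s for a 352-bit bus at 14 Gbps, which is the raw figure quoted for the RTX 2080 Ti elsewhere in this review.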

GPU Memory Technology Updates

Graphics Card Name	Memory Technology	Memory Speed	Memory Bus	Memory Bandwidth	Release
AMD Radeon R9 Fury X	HBM1	1.0 Gbps	4096-bit	512 GB/s	2015
NVIDIA GTX 1080	GDDR5X	10.0 Gbps	256-bit	320 GB/s	2016
NVIDIA Tesla P100	HBM2	1.4 Gbps	4096-bit	720 GB/s	2016
NVIDIA Titan Xp	GDDR5X	11.4 Gbps	384-bit	547 GB/s	2017
AMD RX Vega 64	HBM2	1.9 Gbps	2048-bit	483 GB/s	2017
NVIDIA Titan V	HBM2	1.7 Gbps	3072-bit	652 GB/s	2017
NVIDIA Tesla V100	HBM2	1.7 Gbps	4096-bit	901 GB/s	2017
NVIDIA RTX 2080 Ti	GDDR6	14.0 Gbps	352-bit	616 GB/s	2018
AMD Instinct MI100	HBM2	2.4 Gbps	4096-bit	1229 GB/s	2020
NVIDIA A100 80 GB	HBM2e	3.2 Gbps	5120-bit	2039 GB/s	2020
NVIDIA RTX 3090	GDDR6X	19.5 Gbps	384-bit	936.2 GB/s	2020
AMD Instinct MI200	HBM2e	3.2 Gbps	8192-bit	3200 GB/s	2021
NVIDIA RTX 3090 Ti	GDDR6X	21.0 Gbps	384-bit	1008 GB/s	2022
NVIDIA H100 80 GB	HBM3/E	2.6 Gbps	5120-bit	1681 GB/s	2022

NVIDIA Turing GPUs With Better Memory Compression – Effective Memory Bandwidth Increased Up To 50% Over Pascal GPUs, Over 1.5 TB/s

One of the key improvements of Pascal over Maxwell was the faster memory compression algorithms which delivered very high bandwidth by using various compression and caching techniques.

With Turing, we are looking at the third generation of memory compression architecture which is said to effectively deliver up to 50% boost in effective bandwidth when compared to Pascal GPUs. We know that the Pascal GeForce GTX 1080 Ti memory bandwidth was boosted to 1.2 TB/s over the raw 484.4 GB/s bandwidth when using these algorithms and with Turing, NVIDIA is saying that we should expect 50% more effective bandwidth with Memory Compression 3.0.

Since Turing GPUs already have higher raw bandwidth compared to Pascal GPUs (RTX 2080 Ti with 616 GB/s), we can expect the effective bandwidth using the new algorithm to reach past 1.5 TB/s, which is very good considering it would help games deliver even better performance at the higher resolutions these graphics cards are aiming at.

With each new generation of graphics cards, NVIDIA delivers a new range of display technologies. This generation is no different and we see some significant updates to not only the display engine but also the graphics interconnect. With the adoption of faster GDDR6 memory which provides higher bandwidth, faster compression, and more cache, gaming applications can now run at higher resolutions, supporting more details on the display.

The Turing Display Engine supports two new display technologies, DisplayPort 1.4a and VirtualLink. DisplayPort 1.4a allows for up to 8K resolutions with 60Hz refresh rates and includes VESA's Display Stream Compression (DSC) 1.2 technology with visually lossless compression. You can run up to two 8K displays at 60 Hz using two cables, one for each display. In addition to that, Turing also supports HDR processing natively with tone mapping added to the HDR pipeline.
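Why compression matters for 8K at 60 Hz becomes clear from the raw numbers. The Python sketch below compares the uncompressed pixel data rate with the payload of a four-lane HBR3 link, assuming 8.1 Gbps per lane and 8b/10b line coding, and ignoring blanking intervals (which only make the uncompressed requirement larger).

```python
HBR3_LANE_GBPS = 8.1          # raw line rate per DisplayPort HBR3 lane
LANES = 4
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b line coding overhead

def uncompressed_gbps(width, height, hz, bits_per_pixel):
    """Pixel data rate in Gbps, ignoring blanking intervals."""
    return width * height * hz * bits_per_pixel / 1e9

def dp_payload_gbps():
    """Usable payload of a four-lane HBR3 link in Gbps."""
    return HBR3_LANE_GBPS * LANES * ENCODING_EFFICIENCY

need = uncompressed_gbps(7680, 4320, 60, 24)   # 8K60 at 8 bits per colour channel
have = dp_payload_gbps()
print(f"8K60 needs about {need:.2f} Gbps, the link carries {have:.2f} Gbps")
print(f"minimum compression ratio: {need / have:.2f}:1")
```

Even at 8 bits per channel the stream is nearly twice what a single cable can carry uncompressed, so DSC is what makes one 8K60 display per cable feasible.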

Turing GPUs also ship with an enhanced NVENC encoder unit that adds support for H.265 (HEVC) 8K encode at 30 fps. The new NVENC encoder provides up to 25% bitrate savings for HEVC and up to 15% bitrate savings for H.264.

Turing’s new NVDEC decoder has also been updated to support decoding of HEVC YUV444 10/12b HDR at 30 fps, H.264 8K, and VP9 10/12b HDR.

Turing improves encoding quality compared to prior generation Pascal GPUs and compared to software encoders. Figure 11 shows that on common Twitch and YouTube streaming settings, Turing’s video encoder exceeds the quality of the x264 software-based encoder using the fast encode settings, with dramatically lower CPU utilization. 4K streaming is too heavy a workload for encoding on typical CPU setups, but Turing’s encoder makes 4K streaming possible.

VirtualLink and USB Type-C - First on GeForce RTX

NVIDIA is also moving to make Virtual Reality less of a hassle for many users. Their solution is the new VirtualLink connector that uses a USB Type-C interface to seamlessly connect Virtual Reality headsets to your PC.

VirtualLink is a new open industry standard that includes leading silicon, software, and headset manufacturers and is led by NVIDIA, Oculus, Valve, Microsoft, and AMD. VirtualLink has been developed to meet the connectivity requirements of current and next-generation VR headsets. VirtualLink employs a new alternate mode of USB-C, designed to deliver the power, display, and data required to power VR headsets through a single USB-C connector.

VirtualLink simultaneously supports four lanes of High Bit Rate 3 (HBR3) DisplayPort along with the SuperSpeed USB 3 link to the headset for motion tracking. In comparison, USB-C only supports four lanes of HBR3 DisplayPort OR two lanes of HBR3 DisplayPort + two lanes SuperSpeed USB 3.
In addition to easing the setup hassles present in today’s VR headsets, VirtualLink will bring VR to more devices.

A single connector solution brings VR to small form factor devices that can accommodate a single, small footprint USB-C connector (such as a thin and light notebook) rather than today’s VR infrastructure which requires a PC that can accommodate multiple connectors.

Say Hello To NVLINK, The Permanent SLI Replacement For Next-Gen NVIDIA Graphics Cards - Supports 2-Way GPU Configurations

NVIDIA has said farewell to their SLI (Scalable Link Interface) interconnect for consumer graphics cards. They will now be using the NVLINK interconnect which has already been featured on their HPC GPUs. The reason is that SLI simply could not provide enough bandwidth for Turing GPUs.

A single x8 NVLINK channel provides 25 GB/s peak bandwidth in each direction. There are two x8 links on the TU102 GPU and a single x8 link on the Turing TU104 GPU. The TU102 GPU features 50 GB/s of bandwidth in each direction and 100 GB/s of bandwidth bi-directionally. Using NVLINK on high-end cards would be beneficial in high-resolution gaming, but there's a reason NVIDIA still restricts users from doing 3 and 4 way SLI.
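The link arithmetic is easy to verify. This short Python sketch reads the 25 GB/s figure as a per-direction rate for each x8 link, which is our interpretation of the numbers quoted above.

```python
LINK_GB_S_PER_DIRECTION = 25   # peak rate of one x8 NVLINK link, one direction

def nvlink_bandwidth(links):
    """Return (per-direction, bidirectional) peak bandwidth in GB/s."""
    per_direction = links * LINK_GB_S_PER_DIRECTION
    return per_direction, per_direction * 2

print("TU102 (2 links):", nvlink_bandwidth(2))
print("TU104 (1 link): ", nvlink_bandwidth(1))
```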

Multi-GPU still isn't well optimized, so you won't see much benefit unless you are running the highest-end graphics cards. That's another reason why the RTX 2070 is deprived of NVLINK connectors. The NVLINK connectors cost $79 US each and are sold separately. Currently, only NVIDIA is selling them as the AIB cards don't include any such connectors, but that may change once the standard is adopted widely.

When the GeForce GTX 1080 Ti launched, NVIDIA showed how it was their fastest Ti yet, eclipsing the non-Ti GTX 1080 with up to 35% better performance. The RTX 2080 Ti seems to be continuing this trend as the fastest Ti model to date. The other important thing to consider is graphics performance with DLSS (Deep Learning Super Sampling) enabled. While the RTX 2080 and RTX 2080 Ti would be capable of delivering 60 FPS at 4K in many modern titles, games with DLSS technology would see further major performance boosts while also enhancing image quality and offering much better anti-aliasing.

Here we see the true potential of the tensor cores and their massive multi-TFLOPs of Compute power being put to good use. But while all this sounds really great, we have to consider a few things which are very important for the average consumer. First of all, there is no mention of the settings used for testing the games and what gaming titles were used. They could be titles that generally favor NVIDIA GPUs and we know for a fact that titles with DLSS enabled are optimized for GeForce RTX series cards.

It's also quite astonishing to see the GeForce GTX 1080 Ti being placed under the 60 FPS bar on the 4K resolution. Sure it's not capable of delivering an average 60 FPS in all modern titles at 4K but it is a very capable gaming graphics card and the performance king of its generation which made 4K 60 FPS a reality. Even some factory overclocked cards can reach the 4K 60 FPS mark with ease so that should be taken into consideration.

The list of new titles includes titles like Darksiders III and OVERKILL’s The Walking Dead, both due in November, as well as others that are already out like Hellblade: Senua’s Sacrifice or SCUM and Fear the Wolves, both of which are available on Steam Early Access. The total count of games featuring NVIDIA DLSS support is now twenty-five.

Newly Announced DLSS Titles

Darksiders III from Gunfire Games / THQ Nordic
Deliver Us The Moon: Fortuna from KeokeN Interactive
Fear the Wolves from Vostok Games / Focus Home Interactive
Hellblade: Senua’s Sacrifice from Ninja Theory
KINETIK from Hero Machine Studios
Outpost Zero from Symmetric Games / tinyBuild Games
Overkill’s The Walking Dead from Overkill Software / Starbreeze Studios
SCUM from Gamepires / Devolver Digital
Stormdivers from Housemarque

Other Titles Implementing DLSS

Ark: Survival Evolved from Studio Wildcard
Atomic Heart from Mundfish
Dauntless from Phoenix Labs
Final Fantasy XV from Square Enix
Fractured Lands from Unbroken Studios
Hitman 2 from IO Interactive/Warner Bros.
Islands of Nyne from Define Human Studios
Justice from NetEase
JX3 from Kingsoft
Mechwarrior 5: Mercenaries from Piranha Games
PlayerUnknown’s Battlegrounds from PUBG Corp.
Remnant: From the Ashes from Arc Games
Serious Sam 4: Planet Badass from Croteam/Devolver Digital
Shadow of the Tomb Raider from Square Enix/Eidos-Montréal/Crystal Dynamics/Nixxes
The Forge Arena from Freezing Raccoon Studios
We Happy Few from Compulsion Games / Gearbox

NGX, which stands for Neural Graphics Acceleration, is a new deep-learning-based technology stack that is part of the RTX platform. Here’s a brief description from NVIDIA:

NGX utilizes deep neural networks (DNNs) and a set of Neural Services to perform AI-based functions that accelerate and enhance graphics, rendering, and other client-side applications. NGX employs the Turing Tensor Cores for deep learning-based operations and accelerates delivery of NVIDIA deep learning research directly to the end-user. Note that NGX does not work on GPU architectures before Turing.

The NVIDIA NGX is ‘tightly’ integrated with the drivers and hardware. There’s an NGX API (described as thin and easy for developers to use) which provides access to multiple AI-based features, pre-trained by NVIDIA.

All the NVIDIA NGX features will be managed via GeForce Experience if you own a GeForce GPU, or via Quadro Experience (now available in tech preview) if you have a Quadro GPU installed. The software will look for a Turing GPU and, upon finding it in the system, proceeds to download the NVIDIA NGX Core package as well as the deep neural network models available for the installed games and applications.

These DNN models interface with DirectX, Vulkan and CUDA 10, the latest version of NVIDIA’s SDK. Furthermore, the DNN models and services are accelerated with Turing’s Tensor Cores and take advantage of high-performance inference optimizer TensorRT, which delivers low latency and high throughput.

NVIDIA DLSS is the specific DNN model devised to solve the inherent issues, like blurring and transparency, with TAA (Temporal AntiAliasing). Here, NVIDIA leveraged the demonstrated image processing capabilities of a deep learning network. DLSS can deliver either much higher quality than TAA at a certain set of input samples, or much faster performance at a lower input sample count, all while inferring a visual result that’s of similar quality to TAA while using basically half the shading work.

For example, at 4K resolution, DLSS provided two times faster performance than TAA in Epic’s Unreal Engine 4 Infiltrator demo. Of course, the pre-requisite is a training process where the DNN learns how to produce the desired result thanks to a ‘large number of super high-quality examples’.

To train the network, we collect thousands of “ground truth” reference images rendered with the gold standard method for perfect image quality, 64x supersampling (64xSS). 64x supersampling means that instead of shading each pixel once, we shade at 64 different offsets within the pixel, and then combine the outputs, producing a resulting image with ideal detail and anti-aliasing quality. We also capture matching raw input images rendered normally.

Next, we start training the DLSS network to match the 64xSS output frames, by going through each input, asking DLSS to produce an output, measuring the difference between its output and the 64xSS target, and adjusting the weights in the network based on the differences, through a process called backpropagation.

After many iterations, DLSS learns on its own to produce results that closely approximate the quality of 64xSS, while also learning to avoid the problems with blurring, disocclusion, and transparency that affect classical approaches like TAA.
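The training procedure NVIDIA describes can be illustrated at toy scale. The Python sketch below is purely our own illustration and has nothing to do with the real DLSS network: the "scene" is a hypothetical vertical edge, the "network" is a single linear layer over three neighbouring pixels, and backpropagation therefore reduces to one gradient step per example. It does show the three ingredients from the description: raw one-sample inputs, 64x supersampled targets, and weights adjusted from the difference between output and target.

```python
import random

def shade(x, y, edge):
    """Hypothetical scene: white to the left of a vertical edge, black to the right."""
    return 1.0 if x < edge else 0.0

def render_pixel(px, edge, grid):
    """Average grid*grid shading samples at regular offsets inside pixel column px."""
    total = 0.0
    for i in range(grid):
        for j in range(grid):
            total += shade(px + (i + 0.5) / grid, (j + 0.5) / grid, edge)
    return total / (grid * grid)

def make_example(rng):
    """One training pair: three raw pixels and the 64xSS value of the centre pixel."""
    edge = rng.uniform(0.0, 3.0)
    raw = [render_pixel(p, edge, grid=1) for p in range(3)]   # shaded once per pixel
    target = render_pixel(1, edge, grid=8)                    # 8 x 8 = 64 offsets
    return raw, target

def train(steps=2000, lr=0.05, seed=0):
    """Stochastic gradient descent on squared error; returns weights, bias, losses."""
    rng = random.Random(seed)
    w, b, losses = [0.0, 0.0, 0.0], 0.0, []
    for _ in range(steps):
        x, y = make_example(rng)
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        err = pred - y                        # difference between output and target
        losses.append(err * err)
        # Gradient of the squared error with respect to each weight and the bias
        w = [wi - lr * 2 * err * xi for wi, xi in zip(w, x)]
        b -= lr * 2 * err
    return w, b, losses

w, b, losses = train()
print("mean loss, first 100 steps:", sum(losses[:100]) / 100)
print("mean loss, last 100 steps: ", sum(losses[-100:]) / 100)
```

The error drops as training proceeds, although a model this small can only approximate the supersampled result; the real network is far larger and works on full images.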

There’s also a DLSS 2X mode which is entirely focused on high-quality rather than performance. DLSS 2x provides ‘almost indistinguishable’ quality to a 64x supersampled image, which would be impossible to render in real time for obvious reasons.

As we can see in the image below, DLSS 2X delivers far superior image clarity when compared to TAA. That said, we suspect the ‘performance mode’ will be the main use of NVIDIA DLSS for the time being.

While Turing comes with a variety of performance-oriented shading improvements like Mesh Shading, Variable Rate Shading, and Texture-space Shading, so far DLSS is the one that’s seeing widespread adoption with 25 games already confirmed to adopt it and developers like Phoenix Labs talking positively of its benefits.

It’s indeed promising, to say the least. By cross-referencing NVIDIA’s own benchmarks, the GeForce RTX 2080 with DLSS enabled should jump to 57.6 FPS in Shadow of the Tomb Raider, almost catching up with the base 59 FPS registered by the RTX 2080 Ti. The RTX 2080 Ti, in turn, could soar to well over 70 FPS at 4K resolution with DLSS enabled, and Shadow of the Tomb Raider isn’t even the best game to demonstrate the technology according to NVIDIA’s benchmarks (other titles like Final Fantasy XV and ARK: Survival Evolved had much bigger gains with DLSS).

NVIDIA GeForce Experience and ANSEL RTX Highlights

NVIDIA is also incorporating new shading models which would significantly help games process vertex, tessellation, and geometry shading.

Mesh Shading: new shader model for vertex, tessellation, geometry shading (more objects per scene)
Variable Rate Shading (VRS): developer control over shading rates (to limit shading where it does not provide visual benefit)
Texture-Space Shading: storing shading results in memory (no need to duplicate shading work)
Multi-View Rendering (MVR): extends Pascal’s Single Pass Stereo to multi-views in a single pass

MESH SHADING

Mesh Shading introduces two new shader stages, Task Shaders and Mesh Shaders, that support the same functionality as the traditional vertex, tessellation, and geometry shader stages, but with much more flexibility. The mesh shader stage produces triangles for the rasterizer, but internally, instead of using a single-thread program model, it uses a cooperative thread model similar to compute shaders.

Ahead of the mesh shader in the pipeline is the task shader. The task shader operates similarly to the hull shader stage of tessellation, in that it is able to dynamically generate work. However, like the mesh shader, it uses a cooperative thread model and instead of having to take a patch as input and tessellation decisions as output, its input and output are user-defined.

VARIABLE RATE SHADING

Turing introduces a new and dramatically more flexible capability for controlling shading rate called Variable Rate Shading (VRS). With VRS, shading rate can now be adjusted dynamically at an extremely fine level—every 16-pixel x 16-pixel region of the screen can now have a different shading rate.

This fine-level of control enables developers to deploy new algorithms that were not previously possible for optimizing shading rate and increasing performance. The developer has up to seven options to choose from for each 16x16 pixel region, including having one shading result be used to color four pixels (2 x 2), or 16 pixels (4 x 4), or non-square footprints like 1 x 2 or 2 x 4.
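To put the savings in numbers, the Python sketch below counts pixel shader invocations for one 16 x 16 tile at each footprint. The list of seven footprints is our assumption, chosen to be consistent with the examples given in the text (width x height in pixels covered by one shading result).

```python
TILE = 16
# Assumed set of seven footprints: pixels covered by one shading result (w, h)
SHADING_RATES = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 4), (4, 2), (4, 4)]

def invocations_per_tile(rate, tile=TILE):
    """Number of shading results needed to cover one tile at a given footprint."""
    w, h = rate
    return (tile // w) * (tile // h)

full = invocations_per_tile((1, 1))
for rate in SHADING_RATES:
    n = invocations_per_tile(rate)
    print(f"{rate[0]} x {rate[1]}: {n:3d} invocations ({n / full:.0%} of full rate)")
```

A tile shaded at 4 x 4 needs only 16 shading results instead of 256, which is where the performance headroom comes from.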

Overall, with Turing’s VRS technology, a scene can be shaded with a mixture of rates varying between once per visibility sample (super-sampling) and once per sixteen visibility samples. The developer can specify shading rate spatially (using a texture) and using a per-primitive shading rate attribute. As a result, a single triangle can be shaded using multiple rates, providing the developer with fine-grained control.

CONTENT ADAPTIVE SHADING

In Content Adaptive Shading, shading rate is simply lowered by considering factors like spatial and temporal (across frames) color coherence. The desired shading rates for different parts of the next frame to be rendered are computed in a post-processing step at the end of the current frame. If the amount of detail in a particular region was relatively low (sky, a flat wall, etc.), then the shading rate can be locally lowered in the next frame.

The output of the post-process analysis is a texture specifying a shading rate per 16 x 16 tile, and this texture is used to drive shading rate in the next frame. A developer can implement content-based shading rate reduction without modifying their existing pipeline, and with only small changes to their shaders.
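A minimal sketch of such a post-process pass is shown below. It assumes the current frame is available as a 2D list of luminance values in the 0 to 1 range and uses a simple max-minus-min contrast measure; the thresholds and function names are hypothetical, and a real implementation would run on the GPU and also consider temporal coherence.

```python
def tile_contrast(luma, x0, y0, tile=16):
    """Max minus min luminance inside one tile (a crude measure of detail)."""
    values = [luma[y][x]
              for y in range(y0, min(y0 + tile, len(luma)))
              for x in range(x0, min(x0 + tile, len(luma[0])))]
    return max(values) - min(values)


def content_adaptive_rates(luma, tile=16, low=0.02, mid=0.10):
    """Build the per-tile shading rate texture that drives the next frame."""
    rates = []
    for y0 in range(0, len(luma), tile):
        row = []
        for x0 in range(0, len(luma[0]), tile):
            contrast = tile_contrast(luma, x0, y0, tile)
            if contrast < low:
                row.append((4, 4))   # flat region: one result per 16 pixels
            elif contrast < mid:
                row.append((2, 2))   # some detail: one result per 4 pixels
            else:
                row.append((1, 1))   # rich detail: full rate
        rates.append(row)
    return rates
```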

MOTION ADAPTIVE SHADING

The second application of Variable Rate Shading exploits object motion. Our eyes are designed to track moving objects linearly so that we can see their details even when in motion. However, objects on LCD screens do not move smoothly or continuously. Rather, they jump from one location to the next with each 60 Hz frame update.

From the perspective of our eye, which is trying to smoothly track the object, it looks like it is wiggling back and forth on the retina as its location moves ahead of and behind the path the eye is tracking. The net result is that we cannot see the full detail of the object; instead, we see a somewhat lower resolution/blurred version.

The main implication of this phenomenon is that when objects are moving rapidly in the scene, it is wasteful to shade them at full resolution. It would be more efficient to shade at a reduced sampling rate, while still at a high enough rate to be visually equivalent. The savings from optimized shading can be used to deliver a higher frame rate so that the scene is easier to follow.

VRS gives the tools to do this optimization. In the simplest approach, devs can use the motion vectors from Temporal AA to understand motion. The direction and magnitude of motion can be used to directly select an appropriate shading rate per tile. A related approach would be to use VRS to take advantage of blur effects in applications, where both motion blur and depth of field (DOF) are sometimes explicitly rendered. An application can directly compute the degree and direction of blur of individual objects and use the extent of blur to set a per-triangle shading rate.

Note that the methods of these two examples (Content Adaptive Shading and Motion Adaptive Shading) can also be used in combination, with the final shading rate for a region/triangle computed as an application-specified function of the two rates.
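As a rough sketch of the motion-driven selection and of combining the two rates, the snippet below picks a rate from the magnitude of a per-tile motion vector and then keeps the coarser of the content and motion rates. The thresholds, the three-rate subset and the "take the coarser" rule are all assumptions; the combining function is application-specified, and direction is ignored here for brevity.

```python
import math

# Coarse subset of rates, ordered from finest to coarsest.
RATES = [(1, 1), (2, 2), (4, 4)]


def motion_rate(motion_vector, slow=2.0, fast=8.0):
    """Pick a rate from a per-tile motion vector in pixels per frame.

    The slow/fast thresholds are hypothetical tuning values.
    """
    speed = math.hypot(*motion_vector)
    if speed < slow:
        return (1, 1)
    if speed < fast:
        return (2, 2)
    return (4, 4)


def combine(content_rate, motion_rate_value):
    """One possible application-specified rule: keep the coarser rate."""
    return max(content_rate, motion_rate_value, key=RATES.index)
```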

MULTI-VIEW RENDERING

Multi-View Rendering (MVR) allows developers to efficiently draw a scene from multiple viewpoints or even draw multiple instances of a character in varying poses, all in a single pass. Turing hardware supports up to four views per pass, and up to 32 views are supported at the API level. By fetching and shading geometry only once, Turing optimally processes triangles and their associated vertex attributes while rendering multiple versions. When accessed via the D3D12 View Instancing API, the developer simply uses the variable SV_ViewID to index different transformation matrices, reference different blend weights, or control any shader behavior they like that varies depending on which view they are processing.

With multiple active views, each triangle can have a mix of view-dependent attributes and view-independent attributes (values that are shared across all views). A simple example of a view-dependent attribute is a reflection direction because it depends on the eye’s position, vertex position, and a normal vector. To improve efficiency, the NVIDIA compiler analyzes the input shader and produces a compiled output that executes view-independent code once, with the result shared across all output views, while view-dependent attributes are necessarily computed once per output view.
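The split between view-independent and view-dependent work can be illustrated on the CPU. This is a sketch of the idea only, not how the NVIDIA compiler is implemented; the function names are hypothetical and the list index plays the role of SV_ViewID.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def shade_vertex_all_views(position, normal, eye_positions):
    # View-independent: computed once and shared by every view.
    unit_normal = normalize(normal)

    # View-dependent: one reflection direction per view, because it
    # depends on that view's eye position.
    reflections = []
    for eye in eye_positions:
        incident = normalize(tuple(p - e for p, e in zip(position, eye)))
        scale = 2.0 * dot(incident, unit_normal)
        reflections.append(
            tuple(i - scale * n for i, n in zip(incident, unit_normal)))
    return unit_normal, reflections
```

With two eye positions, the normal is normalized once while the reflection direction is evaluated twice, which mirrors the once-per-view cost described above.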

Turing’s MVR is an expansion of the Simultaneous Multi-Projection (SMP) functionality introduced in the Pascal architecture. SMP was designed specifically to accelerate stereo and surround rendering cases. With SMP the developer can specify two views, where view dependent attributes are limited to the vertex X coordinate and viewport(s) used for rasterization. Each view can then be multicast to a set of up to 16 pre-configured projections (or viewports) to support use cases such as Lens Matched Shading. Turing removes the limitations on allowed view dependent attributes and increases the number of views supported while continuing to support up to 16 projections per view.

NVIDIA is for the first time not only launching the **80 and **70 cards along with the flagship **80 Ti model but they are also launching graphics cards with three different GPUs. While the GPUs are similar in design, the configurations are very different and one thing we can tell is that the configs leave a lot of room for NVIDIA to expand upon in the future if they want to.

What I mean to say is that the RTX 2080 Ti isn’t based on the full TU102 GPU and the RTX 2080 isn’t based on the full TU104 GPU either, while the RTX 2070 is the only card that utilizes the full config of the GPU it is based upon, the Turing TU106.

One more thing, these GPUs are really huge in terms of die size compared to the Pascal GPU, while using the 12nm process. The reason being the added INT32 execution units and Tensor cores which weren’t available on any previous consumer based GeForce graphics cards. Hence, the TU106 GPU which succeeds the GP106 GPU is over twice as large as its predecessor (445mm2 versus 200mm2).

NVIDIA Turing TU102 GPU

The TU102 is made up of 6 graphics processing clusters with 6 TPCs (12 SM units) on each cluster. That makes up 72 SM units for a total of 4608 cores in an 18.6 billion transistor package measuring 754mm2.

NVIDIA TU102 GPU

GPU Name	GM200	GP102	GV100	TU102
Flagship GPU Design	Titan X	Titan XP	Titan V	Quadro RTX 8000
Architecture	Maxwell	Pascal	Volta	Turing
Process Node	28nm	16nm FF	12nm FFN	12nm FFN
Transistors	8 Billion	12 Billion	21.1 Billion	18.6 Billion
Die Size	601mm2	471mm2	815mm2	754mm2
GPCs	6	6	6	6
TPCs	N/A	30	42	36
SMs	24	30	84	72
ROPs	96	96	96	96
TMUs	192	240	320	288
L2 Cache	3 MB	3 MB	4.5 MB	6 MB
CUDA Cores / SM	128	128	64	64
CUDA Cores / GPU	3072	3840	5120	4608
Tensor Cores / SM	N/A	N/A	8	8
Tensor Cores / GPU	N/A	N/A	640	576
RT Cores	N/A	N/A	N/A	72
GPU Base Clock	1000 MHz	1405 MHz	1200 MHz	1455 MHz
GPU Boost Clock	1089 MHz	1582 MHz	1455 MHz	1770 MHz
RTX-OPS	N/A	N/A	TBD	84
Rays Cast	N/A	1.1	TBD	10
Peak FP32 TFLOPs	6.6	12.1	13.8	16.3
Peak FP16 TFLOPs	N/A	N/A	27.6	32.6
Peak FP16 TOPs	N/A	N/A	110	130.5
Peak INT8 TOPs	N/A	N/A	N/A	261.0
Peak INT4 TOPs	N/A	N/A	N/A	522.0
VRAM	12 GB GDDR5	12 GB GDDR5X	12 GB HBM2	48 GB GDDR6
Memory Bus	384-bit	384-bit	3072-bit	384-bit
Memory Clock	7 Gbps	11.4 Gbps	1.7 Gbps	14 Gbps
Memory Bandwidth	336.6 GB/s	547.6 GB/s	652.8 GB/s	672 GB/s
Register File Size / SM	256 KB	256 KB	256 KB	256 KB
Register File / GPU	6 MB	7 MB	14 MB	18 MB
Texture Fill Rate	209.1 GT/s	380 GT/s	465.6 GT/s	510 GT/s
TDP	250W	250W	250W	280W
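Several of the TU102 rows in the table can be cross-checked from the unit counts and the boost clock. The sketch below assumes the usual convention of two FP32 operations (one fused multiply-add) per CUDA core per clock and one texel per TMU per clock; the function names are made up for this example.

```python
def peak_fp32_tflops(cuda_cores, boost_mhz):
    """Peak FP32 rate, assuming 2 FLOPs (one FMA) per core per clock."""
    return 2 * cuda_cores * boost_mhz * 1e6 / 1e12


def texture_fill_rate_gtexels(tmus, boost_mhz):
    """Texture fill rate, assuming one texel per TMU per clock."""
    return tmus * boost_mhz / 1000


def register_file_mb(sms, kb_per_sm):
    """Total register file across the GPU in MB."""
    return sms * kb_per_sm / 1024


# TU102 column: 4608 cores, 288 TMUs, 72 SMs, 1770 MHz boost.
tu102_tflops = peak_fp32_tflops(4608, 1770)             # about 16.3
tu102_fill_rate = texture_fill_rate_gtexels(288, 1770)  # about 510
tu102_registers = register_file_mb(72, 256)             # 18.0
```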

NVIDIA Turing TU104 GPU

The TU104 is made up of 6 graphics processing clusters with 4 TPCs (8 SM units) on each cluster. That makes up 48 SM units for a total of 3072 cores in a 13.6 billion transistor package measuring 545mm2.

NVIDIA Turing TU106 GPU

The TU106 is made up of 3 graphics processing clusters with 6 TPCs (12 SM units) on each cluster. That makes up 36 SM units for a total of 2304 cores in a 10.6 billion transistor package measuring 445mm2.
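The SM and core totals for the three dies follow from simple multiplication. The sketch below assumes each GPC groups its SMs in pairs (TPCs), as the TPC and SM rows of the TU102 table suggest, and uses the 64 CUDA cores per SM listed there. It also shows how many SMs are enabled on each shipping card, based on the 4352, 2944 and 2304 core counts quoted for the RTX 2080 Ti, RTX 2080 and RTX 2070.

```python
CORES_PER_SM = 64   # from the "CUDA Cores / SM" row of the TU102 table
SMS_PER_TPC = 2     # assumption: 36 TPCs and 72 SMs on TU102

# (GPCs, TPCs per GPC) for each full die.
FULL_CHIPS = {"TU102": (6, 6), "TU104": (6, 4), "TU106": (3, 6)}


def full_config(gpcs, tpcs_per_gpc):
    """Return (SM count, CUDA core count) for a fully enabled die."""
    sms = gpcs * tpcs_per_gpc * SMS_PER_TPC
    return sms, sms * CORES_PER_SM


def enabled_sms(cuda_cores):
    """SMs active on a card with the given CUDA core count."""
    return cuda_cores // CORES_PER_SM
```

By this arithmetic the RTX 2080 Ti runs 68 of TU102's 72 SMs and the RTX 2080 runs 46 of TU104's 48, while the RTX 2070 uses all 36 SMs of TU106.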

Here’s another thing: the GP106 was used in the GTX 1060, which is more of a mainstream graphics card. However, while the RTX 2070 rocks a TU106 GPU, which may make it look like a mainstream GPU with a much higher price tag, it does have better overall specifications than the GP104 based GTX 1070, with more cores, better memory, and more features. It also has around twice as many cores as the GTX 1060, so calling it a mainstream graphics card wouldn’t be a wise choice.

The GeForce RTX 2080 Ti is the flagship graphics card of 2018 in NVIDIA’s inventory. Featuring the latest Turing GPU architecture designed by NVIDIA, the GeForce RTX 2080 Ti will allow gamers to play new VR experiences, games with real-time raytracing and 4K HDR content at improved FPS compared to current generation graphics cards.

Coming to the GeForce RTX 2080 Ti, the graphics card is powered by the Turing TU102 GPU. The TU102 GPU is the successor to NVIDIA's GP102 GPU and sticks to the same principles which made the GTX 1080 Ti and Titan XP the best enthusiast cards of 2017, which is to offer gamers the best in class performance that no other competitor GPU can match. And the 2080 Ti is just going to crush everything next to it in the performance benchmarks.

GeForce RTX — New Family of Gaming GPUs
The new GeForce RTX 2080 Ti, 2080 and 2070 GPUs are packed with features never before seen in a gaming GPU, including:

New RT Cores to enable real-time ray tracing of objects and environments with physically accurate shadows, reflections, refractions, and global illumination.
Turing Tensor Cores to perform lightning-fast deep neural network processing.
New NGX neural graphics framework integrates AI into the overall graphics pipeline, enabling AI algorithms to perform amazing image enhancement and generation.
New Turing shader architecture with Variable Rate Shading allows shaders to focus processing power on areas of rich detail, boosting overall performance.
New memory system featuring ultra-fast GDDR6 with over 600GB/s of memory bandwidth for high-speed, high-resolution gaming.
NVIDIA NVLink, a high-speed interconnect that provides higher bandwidth (up to 100 GB/s) and improved scalability for multi-GPU configurations (SLI).
Hardware support for USB Type-C and VirtualLink, a new open industry standard being developed to meet the power, display and bandwidth demands of next-generation VR headsets through a single USB-C connector.
New and enhanced technologies to improve the performance of VR applications, including Variable Rate Shading, Multi-View Rendering, and VRWorks Audio.

NVIDIA GeForce RTX/GTX "Turing" Family:

Graphics Card Name	NVIDIA GeForce GTX 1650	NVIDIA GeForce GTX 1650 D6	NVIDIA GeForce GTX 1650 SUPER	NVIDIA GeForce GTX 1660	NVIDIA GeForce GTX 1660 SUPER	NVIDIA GeForce GTX 1660 Ti	NVIDIA GeForce RTX 2060	NVIDIA GeForce RTX 2070	NVIDIA GeForce RTX 2080	NVIDIA GeForce RTX 2080 Ti
GPU Architecture	Turing GPU (TU117)	Turing GPU (TU117)	Turing GPU (TU116)	Turing GPU (TU116)	Turing GPU (TU116)	Turing GPU (TU116)	Turing GPU (TU106)	Turing GPU (TU106)	Turing GPU (TU104)	Turing GPU (TU102)
Process	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN
Die Size	200mm2	200mm2	284mm2	284mm2	284mm2	284mm2	445mm2	445mm2	545mm2	754mm2
Transistors	4.7 Billion	4.7 Billion	6.6 Billion	6.6 Billion	6.6 Billion	6.6 Billion	10.6 Billion	10.6 Billion	13.6 Billion	18.6 Billion
CUDA Cores	896 Cores	896 Cores	1280 Cores	1408 Cores	1408 Cores	1536 Cores	1920 Cores	2304 Cores	2944 Cores	4352 Cores
TMUs/ROPs	56/32	56/32	80/32	88/48	88/48	96/48	120/48	144/64	192/64	288/96
GigaRays	N/A	N/A	N/A	N/A	N/A	N/A	5 Giga Rays/s	6 Giga Rays/s	8 Giga Rays/s	10 Giga Rays/s
Cache	1.5 MB L2 Cache	1.5 MB L2 Cache	1.5 MB L2 Cache	1.5 MB L2 Cache	1.5 MB L2 Cache	1.5 MB L2 Cache	4 MB L2 Cache	4 MB L2 Cache	4 MB L2 Cache	6 MB L2 Cache
Base Clock	1485 MHz	1410 MHz	1530 MHz	1530 MHz	1530 MHz	1500 MHz	1365 MHz	1410 MHz	1515 MHz	1350 MHz
Boost Clock	1665 MHz	1590 MHz	1725 MHz	1785 MHz	1785 MHz	1770 MHz	1680 MHz	1620 MHz 1710 MHz OC	1710 MHz 1800 MHz OC	1545 MHz 1635 MHz OC
Compute	3.0 TFLOPs	3.0 TFLOPs	4.4 TFLOPs	5.0 TFLOPs	5.0 TFLOPs	5.5 TFLOPs	6.5 TFLOPs	7.5 TFLOPs	10.1 TFLOPs	13.4 TFLOPs
Memory	Up To 4 GB GDDR5	Up To 4 GB GDDR6	Up To 4 GB GDDR6	Up To 6 GB GDDR5	Up To 6 GB GDDR6	Up To 6 GB GDDR6	Up To 6 GB GDDR6	Up To 8 GB GDDR6	Up To 8 GB GDDR6	Up To 11 GB GDDR6
Memory Speed	8.00 Gbps	12.00 Gbps	12.00 Gbps	8.00 Gbps	14.00 Gbps	12.00 Gbps	14.00 Gbps	14.00 Gbps	14.00 Gbps	14.00 Gbps
Memory Interface	128-bit	128-bit	128-bit	192-bit	192-bit	192-bit	192-bit	256-bit	256-bit	352-bit
Memory Bandwidth	128 GB/s	192 GB/s	192 GB/s	192 GB/s	336 GB/s	288 GB/s	336 GB/s	448 GB/s	448 GB/s	616 GB/s
Power Connectors	N/A	N/A	6 Pin	8 Pin	8 Pin	8 Pin	8 Pin	8 Pin	8+6 Pin	8+8 Pin
TDP	75W	75W	100W	120W	125W	120W	160W	185W (Founders) 175W (Reference)	225W (Founders) 215W (Reference)	260W (Founders) 250W (Reference)
Starting Price	$149 US	$149 US	$159 US	$219 US	$229 US	$279 US	$349 US	$499 US	$699 US	$999 US
Price (Founders Edition)	$149 US	$149 US	$159 US	$219 US	$229 US	$279 US	$349 US	$599 US	$799 US	$1,199 US
Launch	April 2019	April 2020	November 2019	March 2019	October 2019	February 2019	January 2019	October 2018	September 2018	September 2018

NVIDIA GeForce RTX 2080 Ti ($1199 USD) - The Flagship GeForce Turing Graphics Card – 260W TDP and 11 GB GDDR6 Memory

The NVIDIA GeForce RTX 2080 Ti features the TU102 GPU (TU102-300-A1) core which comprises 4352 CUDA cores. NVIDIA's 12nm FinFET process allows a higher core count while retaining the fast clock speeds we have already seen on Pascal cards. The chip houses 18.6 billion transistors, which is a huge jump compared to the 12 billion transistors on the Pascal GP102 GPU. The card delivers much higher performance due to an enhanced core design that adds incremental IPC gains.

The actual clock speeds are maintained at 1350 MHz base and 1545 MHz boost (1635 MHz OC on Founders Edition). The chip features 11 GB of GDDR6 (next-gen) memory featured across a 352-bit bus and clocked at 14 Gbps. This leads to a total bandwidth of 616 GB/s.
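The bandwidth figure follows directly from the bus width and the per-pin data rate. A quick sketch of that arithmetic (the function name is made up for this example):

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Bandwidth in GB/s: bus width in bits times Gbps per pin, over 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


rtx_2080_ti = memory_bandwidth_gbs(352, 14)  # 616.0 GB/s
rtx_2080 = memory_bandwidth_gbs(256, 14)     # 448.0 GB/s
```

The same formula reproduces the 448 GB/s listed for the 256-bit RTX 2080 and RTX 2070.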

The NVIDIA GeForce RTX 2080 Ti Founders Edition features a TDP of 260W (250W for the custom models), which is in line with previous flagship cards. Coupled with a very smooth power delivery system to avoid leakage, the chip is one of the most efficient GPU architectures ever designed for gamers. The display outputs for the card include 3 DisplayPort 1.4 (4K @ 120 Hz), 1 HDMI 2.0b (4K @ 60 Hz) and a USB Type-C connector for VirtualLink, which means that it is capable of supporting all next-gen displays with new standards. Power is fed through a dual 8 pin connector configuration.

NVIDIA GeForce RTX 20 Series Design, Next-Gen NVTTM With Dual Fan Cooler, Beefy Aluminum Fin Based Heatsink, 13+3 Phase PCB, NVLINK For Dual-Way Multi-GPU Functionality

When it comes to the cooler design, NVIDIA is taking a major departure from the blower styled cooler of previous reference designs and going for a strong dual fan cooling system which is said to deliver better cooling performance. The cooler comprises dual axial fans that push cool air towards a large heatsink block that is made up of several aluminum fins interconnected via heat pipes. The cooler has a high-performance vapor chamber underneath the hood which uses a copper base to effectively dissipate heat from the GPU and surroundings such as the VRAM.

All of this is packed beneath an elegant looking cooler which doesn't use the same design pattern as the GeForce 10 series founders edition cards. Those went with a more polygonal texture but this is more of a plain design featuring an aluminum shroud with a matte black finish in the center. The RTX 2080 / 2080 Ti logo is seen on the side and has LEDs to glow when the card is operational. The shroud engulfs the entire card, leading from the top and even the back which acts as a backplate and looks very neat.

The factory overclocked GeForce RTX 2080 Ti Founders Edition graphics card features a next-gen 13-phase power supply for maximum overclocking and dual-axial 13-blade fans coupled with a new vapor chamber for ultra-cool and quiet performance. via NVIDIA

In addition to the cooler, the RTX 2080 and RTX 2080 Ti cards rock a single NVLINK connector, capable of offering dual-way multi-GPU functionality. The RTX 2080 cards operate in (x8/x8) mode while the RTX 2080 Ti cards operate in (x16/x16) mode. The GeForce RTX 2070 doesn't offer an NVLINK connector.

NVIDIA GeForce RTX 2080 Ti Official Photo Gallery:

NVIDIA GeForce RTX 2080 Ti Official PCB Shot:

The NVIDIA GeForce RTX 2080 is the next chapter in high-performance gaming graphics cards. Featuring the latest Turing GPU architecture designed by NVIDIA, the GeForce RTX 2080 will allow gamers to play new VR experiences, games with real-time raytracing and 4K HDR content at improved FPS compared to the GeForce GTX 1080 with an expected performance jump of around 50%.

Coming to Turing, the GeForce RTX 2080 is powered by the Turing TU104 GPU. The TU104 GPU is the successor to NVIDIA's GP104 GPU and sticks to the same principles which made the GTX 1080 and GTX 1070 great, which is to offer gamers the best performance at the highest efficiency rate (perf/watt) and deliver products that are highly competitive in pricing and performance.

NVIDIA GeForce RTX 2080 ($799 USD) - The High-End GeForce Turing Graphics Card - 225W TDP and 8 GB GDDR6 Memory

The NVIDIA GeForce RTX 2080 features the TU104 GPU (TU104-400-A1) core which comprises 2944 CUDA cores. NVIDIA's 12nm FinFET process allows a higher core count while retaining the fast clock speeds we have already seen on Pascal cards. The chip houses 13.6 billion transistors, which is a nice jump compared to the 7.2 billion transistors on the Pascal GP104 GPU. The card delivers much higher performance due to an enhanced core design that adds incremental IPC gains.

The actual clock speeds are maintained at 1515 MHz base and 1710 MHz boost (1800 MHz OC on Founders Edition). The chip features 8 GB of GDDR6 (next-gen) memory featured across a 256-bit bus and clocked at 14 Gbps. This leads to a total bandwidth of 448 GB/s.

The NVIDIA GeForce RTX 2080 features just 225W TDP on Founders Edition and 215W TDP on non-Founders Edition cards. Coupled with a very smooth power delivery system to avoid leakage, the chip is one of the most efficient GPU architectures ever designed for gamers. The display outputs for the card include 3 DisplayPort 1.4 (4K @ 120 Hz), 1 HDMI 2.0b (4K @ 60 Hz) and a USB Type-C connector for VirtualLink, which means that it is capable of supporting all next-gen displays with new standards. Power is fed through an 8 and 6 pin connector configuration.

NVIDIA GeForce RTX 20 Series Design, Next-Gen NVTTM With Dual Fan Cooler, Beefy Aluminum Fin Based Heatsink, Heavy Overclock Ready 8 Phase PCB, NVLINK For Multi-GPU Functionality

The factory overclocked GeForce RTX 2080 Founders Edition graphics card features a next-gen 8-phase power supply for maximum overclocking and dual-axial 13-blade fans coupled with a new vapor chamber for ultra-cool and quiet performance. via NVIDIA

In addition to the cooler, the RTX 2080 and RTX 2080 Ti cards rock a single NVLINK connector, capable of offering dual-way multi-GPU functionality. The RTX 2080 cards operate in (x8/x8) mode while the RTX 2080 Ti cards operate in (x16/x16) mode.

NVIDIA GeForce RTX 2080 Official Photo Gallery:

NVIDIA GeForce RTX 2080 Official PCB Shot:

The NVIDIA GeForce RTX 2070 is designed to be the essential Turing GeForce graphics card for all gamers. It is the most affordable RTX card in NVIDIA's inventory so we can see it becoming a very popular solution amongst the gaming masses.

Coming to Turing, the GeForce RTX 2070 is powered by the Turing TU106 GPU. The TU106 GPU is the successor to NVIDIA's GP106 GPU and sticks to the same principles which made the GTX 1060 6GB and GTX 1060 3GB great, which is to offer gamers the best performance at the highest efficiency rate (perf/watt) and deliver products that are highly competitive in pricing and performance.

NVIDIA GeForce RTX 2070 ($599 USD) - The High-End GeForce Turing Graphics Card - 185W TDP and 8 GB GDDR6 Memory

The NVIDIA GeForce RTX 2070 features the TU106 GPU (TU106-400-A1) core which comprises 2304 CUDA cores. NVIDIA's 12nm FinFET process allows a higher core count while retaining the fast clock speeds we have already seen on Pascal cards. The chip houses 10.6 billion transistors, which is a nice jump compared to the 7.2 billion transistors on the Pascal GP104 GPU. The card delivers much higher performance due to an enhanced core design that adds incremental IPC gains.

The actual clock speeds are maintained at 1410 MHz base and 1620 MHz boost (up to 1710 MHz OC on Founders Edition). The chip features 8 GB of GDDR6 (next-gen) memory featured across a 256-bit bus and clocked at 14 Gbps. This leads to a total bandwidth of 448 GB/s.

The NVIDIA GeForce RTX 2070 features just 185W TDP on the Founders Edition and 175W TDP on the non-Founders Edition graphics cards. Coupled with a very smooth power delivery system to avoid leakage, the chip is one of the most efficient GPU architectures ever designed for gamers. The display outputs for the card include 3 DisplayPort 1.4 (4K @ 120 Hz), 1 HDMI 2.0b (4K @ 60 Hz) and a USB Type-C connector for VirtualLink, which means that it is capable of supporting all next-gen displays with new standards. Power is fed through a single 8 pin connector.

NVIDIA GeForce RTX 20 Series Design, Next-Gen NVTTM With Dual Fan Cooler, Beefy Aluminum Fin Based Heatsink, Heavy Overclock Ready PCB

When it comes to the cooler design, NVIDIA is taking a major departure from the blower styled cooler of previous reference designs and going for a strong dual fan cooling system which is said to deliver better cooling performance. The cooler comprises dual axial fans that push cool air towards a large heatsink block that is made up of several aluminum fins interconnected via heat pipes.

The factory overclocked GeForce RTX 2070 Founders Edition graphics card features a next-gen 6-phase power supply for maximum overclocking and dual-axial 13-blade fans coupled with a new vapor chamber for ultra-cool and quiet performance. via NVIDIA

The cooler has a high-performance vapor chamber underneath the hood which uses a copper base to effectively dissipate heat from the GPU and surroundings such as the VRAM.

NVIDIA GeForce RTX 2070 Official Photo Gallery:

The NVIDIA GeForce RTX 20 series would first be available in reference only models which are also known as Founders Edition. These variants adopt NVIDIA's reference cooler and PCB design which are set as the standard for AIB partners and their non-reference designs. This time, NVIDIA has taken a different approach with the Founders Edition cards.

We know that the Founders Edition cards rock a higher price compared to the non-reference models. This time, however, the RTX 2080 Ti Founders Edition costs $200 US more than the reference MSRP of $999 US while the RTX 2080 and RTX 2070 Founders Edition cost $100 US higher than the non-reference models.

This time, NVIDIA also gives a fresh new design and the best PCB design to date. Gone is the blower fan cooler and now we get dual axial fans. Gone is the aggressive design from GeForce 10 series cards and instead, we get an elegant aluminum shroud which covers the entire frame around the PCB. And one more thing, Founders Edition cards come with a factory overclock so those non-reference cards we talked about a bit earlier, yeah, those would be lower clocked than NVIDIA's reference FE cards.

NVIDIA Founders Edition Graphics Cards - Now With 90 MHz Out of Box Overclock!

All Founders Edition cards come with a 90 MHz overclock out of the box. This applies across the three GeForce RTX 20 series cards that were announced today, including the RTX 2080 Ti, RTX 2080 and the RTX 2070. All cards are tested by NVIDIA to ensure they run at the stated clock speeds and are backed by a 3-year warranty.
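
As a rough illustration of what a 90 MHz factory overclock means in practice, the short sketch below applies the offset to the reference boost clocks. The reference boost figures (1545, 1710 and 1620 MHz) are NVIDIA's published specifications rather than numbers taken from this section, and the helper name is made up for the example.

```python
# Reference boost clocks in MHz (NVIDIA's published specs; not stated in this section).
REFERENCE_BOOST_MHZ = {
    "RTX 2080 Ti": 1545,
    "RTX 2080": 1710,
    "RTX 2070": 1620,
}

FE_OFFSET_MHZ = 90  # Founders Edition factory overclock quoted in the article


def founders_edition_boost(reference_mhz: int, offset_mhz: int = FE_OFFSET_MHZ) -> int:
    """Return the Founders Edition boost clock for a given reference boost clock."""
    return reference_mhz + offset_mhz


for card, ref in REFERENCE_BOOST_MHZ.items():
    fe = founders_edition_boost(ref)
    gain_pct = 100 * (fe - ref) / ref
    print(f"{card}: {ref} MHz -> {fe} MHz (+{gain_pct:.1f}%)")
```

In percentage terms the bump is modest, at roughly 5 to 6 percent over the reference boost clock on each card.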

But if you want more performance, then manual overclocking is also available and there's much you can gain by configuring the cards yourself.

To deliver record-breaking performance, the factory-overclocked GeForce RTX 2080 uses 225 Watts of power out of the box and tops out around 280 Watts for enthusiasts chasing the best and most ultimate overclocking performance.

Dual-Axial Fans For Double The Cooling Performance and Quiet Operation? I'll Take it!

When it comes to the cooler design, NVIDIA is taking a major departure from the blower-style cooler of previous reference designs and going for a strong dual-axial fan cooling system which is said to deliver better cooling performance. The cooler comprises dual fans that push cool air towards a large heatsink block that is made up of several aluminum fins and interconnects via heat pipe technology. NVIDIA is using a 3-phase motor to limit vibrational noise, offering quieter GPU operation, even in intensive gaming loads.

The cooler has a high-performance vapor chamber underneath the hood which uses a copper base to effectively dissipate heat from the GPU and surroundings such as the VRAM. NVIDIA is stating that the cooler performs 10 degrees (Celsius) better than the previous Founders Edition cards while emitting 1/5th the noise.

The factory overclocked GeForce RTX 2080 Founders Edition graphics card features a next-gen 8-phase power supply for maximum overclocking and dual-axial 13-blade fans coupled with a new vapor chamber for ultra-cool and quiet performance. via NVIDIA

NVIDIA's GPU Boost 4.0 - User Editable Targets and More Control

Like every major GPU generation, NVIDIA continues to provide tools to users so they can get the most from each GPU. GPU Boost 4 is the fourth iteration, and it adds the ability for users to manually adjust the algorithms that GPU Boost uses to dial in the clock.

The algorithms used with GPU Boost 3.0 were completely inside the driver and were not exposed to users. However, GPU Boost 4.0 now exposes the algorithms to users so they can manually modify the various curves themselves to increase performance in the GPU. The biggest benefit is in the temp domain where new inflection points have been added.

Where before the clock followed a straight line that dropped directly down to the Base Clock, it now holds the Boost Clock and can be set to run longer at higher temperatures before a second temperature target (T2) is reached, at which point the clocks drop. This new plateau is where higher performance is gained, and it is a region where many apps tend to operate. If users can reduce heat in the system or add more cooling to the chip, they can take advantage of the lower temps by adjusting the curve to achieve higher levels of performance.
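
To make the difference between the two temperature curves easier to picture, here is a deliberately simplified model in Python. It is only a sketch of the behaviour described above, not NVIDIA's actual algorithm: the function names, the step-shaped curves and the temperature targets (75°C and 84°C) are assumptions chosen for illustration, while the 1515 MHz base and 1800 MHz boost clocks are the RTX 2080 Founders Edition's published figures.

```python
def boost3_clock(temp_c, boost_mhz, base_mhz, t1_c):
    """Simplified Boost 3.0 model: fall to the base clock once t1_c is reached."""
    return boost_mhz if temp_c < t1_c else base_mhz


def boost4_clock(temp_c, boost_mhz, base_mhz, t1_c, t2_c, plateau_mhz=None):
    """Simplified Boost 4.0 model: hold a plateau between t1_c and t2_c,
    then fall to the base clock once t2_c is reached."""
    if plateau_mhz is None:
        plateau_mhz = boost_mhz  # by default the plateau keeps the full boost clock
    if temp_c < t1_c:
        return boost_mhz
    if temp_c < t2_c:
        return plateau_mhz
    return base_mhz


# Illustrative values only: T1 and T2 are hypothetical targets.
BOOST, BASE, T1, T2 = 1800, 1515, 75, 84

for temp in (60, 75, 80, 84, 90):
    old = boost3_clock(temp, BOOST, BASE, T1)
    new = boost4_clock(temp, BOOST, BASE, T1, T2)
    print(f"{temp} C: Boost 3.0 model {old} MHz, Boost 4.0 model {new} MHz")
```

In this toy model the two curves only differ between T1 and T2, which is exactly the plateau region the article refers to. Moving T2 higher, or cooling the card so it stays below T2, keeps the card at the higher clock for longer.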

NVIDIA RTX 20 Series Reference PCB - Overclockers Rejoice! You Can Actually Overclock This Bad Boy

NVIDIA states that the GeForce RTX 2080 Founders Edition makes use of 225W of power out of the box and up to 280W when pushed to the max by overclockers. Furthermore, the entire power delivery system has been rebuilt for GeForce RTX Founders Edition graphics cards, starting with the all-new 13-phase iMON DrMOS power supply. Of particular note, there is a new ability to switch off phases, for drastically-reduced power consumption at low workloads, which greatly increases power efficiency.

Similarly, we’ve cranked up the number of capacitors on GeForce RTX graphics cards, enabling them to run at lower voltages by reducing voltage noise, or at high frequencies for improved performance.

Controlling all this is a new dynamic power management system, which adjusts power usage on a sub-millisecond basis, for a less variable current draw, which further improves efficiency and allows us to crank up the amount of power available for overclocking compared to previous-generation Pascal Founders Edition graphics cards. via NVIDIA

NVIDIA states that their new reference PCB allows for around 55W of additional power headroom. That is where the 280W figure comes from. The new electrical components also ensure much cleaner power delivery, allowing for better overclocks without wasting excess power.
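
The arithmetic behind the 280W figure is easy to check. The snippet below is just a worked calculation using the two numbers quoted above; the variable names are ours.

```python
STOCK_POWER_W = 225  # RTX 2080 Founders Edition out-of-box power, per NVIDIA
HEADROOM_W = 55      # extra headroom quoted for the new reference PCB

max_power_w = STOCK_POWER_W + HEADROOM_W
headroom_pct = 100 * HEADROOM_W / STOCK_POWER_W

print(f"Maximum board power: {max_power_w} W")
print(f"Headroom over stock: {headroom_pct:.1f}%")
```

That works out to a little under a quarter more power available for overclocking than the card draws at stock.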

NVIDIA's OC Scanner Utility Helps Push Massive Overclocks in Under 20 Minutes

While manual overclocking is fully supported, NVIDIA has also provided their latest OC Scanner application, which reduces the time it takes to configure your graphics card to hit the highest clock speeds. Traditionally, a user would increase the clock speeds gradually with different voltages applied, and that could take up to an hour once the stability testing process is factored in.

NVIDIA is streamlining the process by offering a simple one-click overclocking utility that runs on the NV Scanner API and features a sophisticated test algorithm designed to exercise various workloads during overclocking. It not only reduces the time taken to reach the best stable clocks but also ensures that the card runs stable in all applications by testing a known set of voltage points, so that different games or loads do not introduce variability.

NVIDIA has also promised that their OC Scanner application won't crash or TDR, since the utility collects data as it runs and detects errors or windows corruption far before they become a nuisance for the user. During their presentation, NVIDIA showed the OC Scanner pushing the cards to 2130 MHz at 1.06V.
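
The general idea of scanning a set of voltage points for the highest stable clock can be sketched in a few lines of Python. To be clear, this is not the NV Scanner API or NVIDIA's test algorithm: `scan_vf_curve`, the step size, the offset cap and the stand-in stability check with its made-up limits are all hypothetical, and a real tool would run an actual workload at each step.

```python
from typing import Callable, Dict, Iterable


def scan_vf_curve(
    voltage_points_mv: Iterable[int],
    is_stable: Callable[[int, int], bool],
    step_mhz: int = 15,
    max_offset_mhz: int = 300,
) -> Dict[int, int]:
    """Find the highest stable clock offset (MHz) at each voltage point (mV)."""
    best = {}
    for mv in voltage_points_mv:
        offset = 0
        # Step the offset up until the stability test fails or the cap is reached.
        while offset + step_mhz <= max_offset_mhz and is_stable(mv, offset + step_mhz):
            offset += step_mhz
        best[mv] = offset
    return best


# Stand-in stability model: made-up per-voltage limits, NOT measured data.
FAKE_LIMITS_MHZ = {800: 70, 900: 100, 1000: 130, 1060: 150}


def fake_is_stable(mv: int, offset_mhz: int) -> bool:
    """Pretend test: an offset passes if it does not exceed the made-up limit."""
    return offset_mhz <= FAKE_LIMITS_MHZ[mv]


curve = scan_vf_curve(FAKE_LIMITS_MHZ.keys(), fake_is_stable)
print(curve)
```

Testing each voltage point separately is what removes the guesswork of a single global offset: every point on the curve ends up with its own verified limit instead of sharing the limit of the weakest one.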

New RTX Brand With New RTX Coolers - Diecast Aluminum Shroud, Dual Anodization For Scratch Resistance and Durability

All of this is packed beneath an elegant looking cooler which doesn’t use the same design pattern as the GeForce 10 series founders edition cards. Those went with a more polygonal texture but this is more of a plain design featuring a die-cast aluminum shroud with a matte black finish in the center.

The edges of the diecast aluminum fan shroud are diamond-cut to create a precision reflective trim, dual anodization offers durability and scratch resistance, and the forged, stamped and machined cover provides a classy rigid-yet-lightweight frame for the open cooler design.

Then, to make the shroud look even better, we utilized Physical Vapor Deposition on the RTX 2080 name label, and LED-lit the GeForce RTX logo, creating a unique look that’s never before been seen on a graphics card.

You might also notice that the card is fully enveloped by a beautiful shroud, and even the NVLINK connector is protected by a matching removable cover, giving GeForce RTX GPUs a clean appearance. And to round things out, there’s a black anodized bracket, a stamped aluminum backplate with GeForce RTX branding, for a consistent black and silver appearance across the entire card.

via NVIDIA

Next Generation Graphics Cards With Next-Gen Display Connectivity

Since this is a next-generation launch, we were expecting to see new display connections on the cards and that was confirmed during the announcement a few hours ago.

The display outputs for the card include 3 DisplayPort 1.4 (4K @ 120 Hz), 1 HDMI 2.0b (4K @ 60 Hz / HDCP 2.2 support) and a USB Type-C connector for VirtualLink for next-generation VR headsets, which means that it is capable of supporting all next-gen displays with new standards. Power is fed through a dual 8 pin connector configuration.

In this section, we take a glance at both GeForce RTX 20 series graphics cards in all of their glory. We have already talked about their aesthetics and design in the previous section so this section is solely dedicated to the beauty of these next-generation graphics cards.

NVIDIA GeForce RTX 2080 Graphics Card:

NVIDIA GeForce RTX 2080 Ti Graphics Card:

You can also check out our unboxing video below:

We used the following test system for comparison between the different graphics cards. Latest drivers that were available at the time of testing were used from AMD and NVIDIA on an updated version of Windows 10. All games that were tested were patched to the latest version for better performance optimization for NVIDIA and AMD GPUs.

Wccftech Test Bench (GeForce RTX 20 Founders Edition)

CPU	Intel Core i5-8600K @ 5 GHz
GPU	NVIDIA GeForce RTX 2080 Ti Founders Edition, NVIDIA GeForce RTX 2080 Founders Edition, MSI GeForce GTX 1080 Ti Gaming X Trio, NVIDIA GeForce GTX 1080 Founders Edition, AMD Radeon RX Vega 64 LC
Motherboard	MSI Z370 Gaming Plus
Memory	16GB Geil EVO X DDR4 3200
PSU	Cooler Master V1200 Platinum
Drivers	GeForce 411.50, GeForce 399.24, Radeon 18.9.1

All games were tested at 2560×1440 (2K) and 3840×2160 (4K) resolutions.
Image Quality and graphics configurations have been provided in the screenshots below.
The "reference" cards are the stock configs while the "overclock" cards are factory overclocked configs provided to us by various AIB partners.
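
For a sense of how much heavier the 4K tests are, the quick calculation below compares the pixel counts of the two test resolutions. It only uses the resolutions listed above; the dictionary and variable names are ours.

```python
# The two test resolutions used in this review (width, height).
RESOLUTIONS = {
    "2560x1440": (2560, 1440),
    "3840x2160": (3840, 2160),
}

# Total pixels rendered per frame at each resolution.
pixels = {name: w * h for name, (w, h) in RESOLUTIONS.items()}
ratio = pixels["3840x2160"] / pixels["2560x1440"]

for name, count in pixels.items():
    print(f"{name}: {count:,} pixels per frame")
print(f"4K renders {ratio:.2f}x the pixels of 1440p")
```

Frame rates do not scale perfectly with pixel count, but the 2.25x jump in pixels per frame is a useful yardstick when reading the 4K results against the 1440p ones.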

3DMark Firestrike (Standard/Ultra/Extreme)

3DMark Firestrike is a widely popular video card benchmark for Windows that is designed to measure your PC’s gaming performance. We tested the cards in the standard, ultra, and extreme presets. You can find the total graphics score of each card listed below.

3DMark Timespy (Standard/Extreme)

3DMark Timespy is the reincarnation of the famous graphics test suite with DirectX 12 API. We tested the cards in the standard and extreme presets. You can find the total graphics score of each card listed below.

Unigine Superposition (1080P Extreme)

Unigine Superposition is an Extreme performance and stability test for PC hardware: video card, power supply, cooling system. It lets you evaluate your rig in stock and overclocking modes with a real-life load! Also includes interactive experience in a beautiful, detailed environment.

Assassin's Creed: Origins (Very High Settings)

Assassin's Creed Origins is built by the same team that made Assassin's Creed IV: Black Flag. They are known for reinventing the design and game philosophy of the Assassin's Creed saga and their latest title shows that. Based in Egypt, the open-world action RPG shows its graphics strength in all corners. It uses the AnvilNext 2.0 engine which boosts the draw distance range and delivers a very impressive graphics display.

DOOM (Vulkan API Ultra Settings)

In 2016, id Software finally released Doom. My testing wouldn’t be complete without including this title. It’s a hell fest featuring fast-paced FPS action and tons of demons to kill. The latest title supports both the Vulkan and OpenGL APIs and takes advantage of the latest multi-core and multi-GPU upgrades.

Deus Ex: Mankind Divided (DX12 High Settings)

Humanity is at war with itself and divided into factions. On one end, we have the pure and on the other, we have the augmented. That is the world Adam Jensen lives in and this is the world of Deus Ex: Mankind Divided. The game uses the next generation Dawn Engine that was made by Eidos Montreal on the foundation of IO Interactive's Glacier 2 engine. The game features support for the DirectX 12 API and is one of the most visually intensive titles, taxing the GPU really hard.

Far Cry 5 (Ultra Settings)

Far Cry 5 is a standalone successor to Far Cry 4 and takes place in Hope County, a fictional region of Montana. The main story revolves around the doomsday cult Project at Eden's Gate and its charismatic leader Joseph Seed. It uses a beefed up Dunia Engine, which itself is a modified version of CryEngine from Crytek.

Final Fantasy XV (Highest Settings)

Hitman 2016 (DX12 Highest Settings)

With the latest drivers, NVIDIA has managed to up the performance of their Pascal and Maxwell parts in Hitman (2016). The game has been a major win for AMD graphics cards when switching over from DX11 to DX12 but NVIDIA has recently caught up and their cards now deliver exceptional performance in the title.

Middle Earth: Shadow of War (Very High Settings)

The successor to 2014's epic Shadow of Mordor, Shadow of War continues the previous game's narrative, following the ranger Talion and the spirit of the elf lord Celebrimbor, who shares Talion's body, as they forge a new Ring of Power to amass an army to fight against Sauron. The game uses the latest Firebird Engine developed by Monolith Productions and is very intensive even for modern graphics cards.

Monster Hunter: World (High Settings)

Prey (Very High Settings)

Rainbow 6: Siege (Ultra Settings)

Shadow of The Tomb Raider (DX12 Highest Settings)

Strange Brigade (DX12 Ultra Settings)

Ghost Recon: Wildlands (Very High Settings)

Using the new AnvilNext engine that was developed by Ubisoft Montreal, Ghost Recon: Wildlands goes wild and grand with an open-world setting spanning all of Bolivia. This game is a tactical third person shooter which does seem an awful lot like Tom Clancy’s The Division. The game looks pretty and the wide scale region of Bolivia looks lovely at all times (Day/Night Cycle).

The Witcher 3 (Ultra & HairWorks Enabled)

Witcher 3 is the greatest fantasy RPG of our time. It has a great story, great gameplay mechanics and gorgeous graphics. This is the only game in which I actually wanted to get a stable FPS at 4K. With Gameworks disabled, I gave all high-end cards the ability to demonstrate their power.

Fast forward to 2017, and I can finally enjoy Witcher 3 in all its glory at over 60 FPS with everything turned to max and even Gameworks features enabled. Isn’t the technology cycle great?

We compiled the power consumption results by testing each card at idle and under full load while running games. Each graphics card manufacturer sets a default TDP for the card which can vary from vendor to vendor depending on the extra clocks or board features they plug in on their custom cards. Default TDP for the RTX 2080 Ti is set at 260W and 225W for the RTX 2080. Do note that the Founders Edition variants are clocked 90 MHz higher than the reference specs so power consumption would be higher.

The 12nm process which Turing cards are based on is a refinement of the 16nm process from TSMC. Both TU102 and TU104 receive a huge bump in core specifications, yet power consumption hasn't gone up by a lot. The RTX 2080 FE system on average was around 40W more power hungry than the GTX 1080 while the RTX 2080 Ti was less power hungry than a custom 1080 Ti graphics card.

No graphics card review is complete without evaluating its temperatures and thermal load. Both Founders Edition cards feature a similar design with a dual axial fan cooler, large aluminum fin stack with vapor chamber and a contact base.

When compared to the previous Founders Edition cards featured on the GeForce 10 series, we can see a good increase in cooling performance provided by the dual axial fan and the larger heatsink design. Since the blower fan is gone now, the acoustic levels have also been lowered tremendously, allowing for cooler and quiet operation which one should expect from a high-cost reference card.

The lower temperatures also mean that there's more headroom for overclocking available and Boost 4.0 would work extremely well on these cards, pushing them beyond the rated boost frequency with the new algorithms.

Overclocking the RTX 2080 Ti was something we really wanted to have ready in time for the review we posted last week, but limited time meant that the Founders Edition review was relegated to the default factory overclock results. It still feels weird to say that the GPU maker's video card is factory overclocked. Either way, we wanted to explore the OC Scanner utility implemented in EVGA's Precision X One and how well it works. There was a bit of a bug in the pre-release version of the software that strangely set our memory clock to -200 even though it shouldn't have, so we opted to wait for the public release, and that cured our ails.

Achieving the overclock we did was fairly simple with both Precision X One and manual overclocking. Precision X One simply required going to the Voltage Frequency Curve Scanner, upping the power target and clicking Scan. Once it settled on the +88MHz frequency, we hit apply and were done. The manual overclock required a bit more work since, unlike with the OC Scanner, we wanted to up the ante on the memory as well by adding +500 to it along with the +135MHz core offset. But did they translate to real world performance?

*Note: images are for reference only since they were taken with the RTX 2080 installed.

Clock Speeds

Rather than just listing the core clock speeds we achieved, we felt it more prudent to show you a visual representation of what the frequencies looked like over time. The chart below shows the frequency achieved over the course of a 5 minute stress test using Unigine Superposition at 4K High settings. While the lines for the stock frequency and the manual overclock are pretty far apart, I have to tip my hat to the OC Scanner function for coming very close.

Thermals and Power Draw

Of course, when you push something past its default specification, you can expect thermals and power consumption to grow as well. Turing is no different in this regard. We observed temps breaking the 80°C barrier, on the default fan curve mind you, since we didn't tweak that. Power consumption for the entire system went up by roughly 60 watts as well.

Deus Ex Mankind Divided

Monster Hunter World

Shadow of the Tomb Raider

Overclocking the RTX 2080 Ti did indeed bring us up and past the 4K 60FPS barrier in both Shadow of the Tomb Raider and Monster Hunter: World, making both just a bit more enjoyable for 4K gaming. Perhaps in the future we'll explore how much settings really impact 4K gaming rather than overclocking; maybe there's something in those really high settings in SotTR that could turn it into a locked 4K 60FPS experience.

Well, there is the current state of Turing. The time spent with the RTX 2080 and RTX 2080 Ti has been rather impressive. The RTX 2080 is ripping through 1440p high fidelity gaming at high frame rates, letting gamers take advantage of those 1440p 144Hz panels to the fullest. The RTX 2080 Ti has finally opened the door to high fidelity, next to no compromise 4K gaming, and the promise of DLSS on the horizon only makes that 4K gaming experience better and those high refresh rate 4K panels a bit more appealing.

The uptick in rasterized gaming performance is welcome, but does come at the expense of higher power consumption across both tiers. Temperatures have never been better on a vendor made card's air-cooled solution. That cooler may cause concern for those worried about heat pouring across their motherboards, but in a properly configured case, I don't imagine it being an issue.

There's a lot going on in Turing and that's going to cost you at the end of the day. The addition of RT cores and Tensor cores isn't a gimmick or a cheap tack-on, but the unfortunate part of both of those is the lack of use today. Ray tracing is coming, but we're still waiting for Windows to update to make DXR available, and then the wait on games to implement it. Until DXR and the game are ready there's quite a bit of unused real estate on these dies.

But for me, I'm most excited for DLSS. Seeing for myself the benefit it delivered in the Infiltrator demo with the RTX 2080 and RTX 2080 Ti gets me giddy. Knowing that this feature is much easier to inject into a title, and seeing the list of supported games grow quickly, I think this will be the feature that can really widen the gap between the 10 series and the 20 series.

The entire RTX package is put together well. The Founders Edition is really premium this time around, offering a proper high-end cooling solution that at least makes the additional price tag make sense. I can't wait to see and experience what the future holds for RTX. There's an old saying that goes "Price is what you pay, but value is what you get". There's value in the RTX lineup, but at the asking price you're going to have to ask if it's what you're looking for.

The still unknown ray tracing performance remains a concern for many, and those who are worried might be best served by waiting. But for those wanting to experience the newest in truly next-generation gaming, NVIDIA has put its cards on the table and they're ready to play.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
