Hardware

NVIDIA GeForce RTX 3080 10 GB “Ampere” Graphics Card Review

Hassan Mujtaba & Keith May • Sep 16, 2020 at 08:59am EDT

Product Info

NVIDIA GeForce RTX 3080

October, 2020

Type

Graphics Card

Price

$699.99 US

NVIDIA Ampere GPU - 2nd Gen RT and 3rd Gen Tensor Cores Deep Dive

NVIDIA has also introduced its 3rd Generation Tensor core architecture and 2nd Generation RT cores on Ampere GPUs. Now Tensor cores have been available since Volta and consumers got a taste of it with the Turing GPUs. One of the key areas where Tensor Cores are put to use for AAA games is DLSS. There's a whole software stack that leverages from Tensor cores and that is known as the NVIDIA NGX. These software-based technologies will help enhance graphics fidelity with features such as Deep Learning Super Sampling (DLSS), AI InPainting, AI Super Rez, RTX Voice, and AI Slow-Mo.

nvidia-geforce-rtx-30-series-deep-dive_rtx-3080_rtx-3090_rtx-3070_ampere-ga102_ampere-ga104_gpu_graphics-cards_12

nvidia-geforce-rtx-30-series-deep-dive_rtx-3080_rtx-3090_rtx-3070_ampere-ga102_ampere-ga104_gpu_graphics-cards_2

While its initial debut was a bit flawed, DLSS in its 2nd iteration (DLSS 2.0) has done wonders to not only improve gaming performance but also image quality. In titles such as Death Stranding and Control, games are shown to offer higher visual fidelity than at native resolution while running at much higher framerates. With Ampere, we can expect an even higher boost in terms of DLSS 2.0 (and DLSS Next-Gen) performance as the deep-learning model continues working its magic in DLSS supported titles. NVIDIA will also be adding 8K DLSS support to its Ampere GPU lineup which would be great to test out with the 24 GB RTX 3090 graphics card.

With Ampere, Tensor cores add INT8 and INT4 precision in addition to FP16 which is still fully supported. NVIDIA has been at the helm of the deep learning revolution by supporting it since its Kepler generation of graphics cards. Today, NVIDIA has some of the most powerful AI graphics accelerators and a software stack that is widely adopted by this fast-growing industry.

For its 3rd Gen Tensor cores, NVIDIA is using the same sparsity architecture that they've used on the Ampere HPC line of GPUs. While Ampere features 4 Tensor cores per SM compared to Turing's 8 tensor cores per SM, they are not only based on the new 3rd Generation design but also get an increased count with the larger SM array. The Ampere GPU can execute 128 FP16 FMA operations per tensor core utilizing its entire INT16 cores and with sparsity, it can do up to 256. The total FP16 FMA operations per SM are increased to 512 and 1024 with sparsity. That's a 2x increase over the Turing GPU in terms of inference performance with the updated Tensor design.

2nd Gen RT Cores, RTX, and Real-Time Ray Tracing Dissected

Next up, we have the RT Cores which are what will power Real-Time Raytracing. NVIDIA isn't going to distance themselves from traditional rasterization-based rendering, but instead following a hybrid rendering model. The new 2nd Generation RT cores offer increased performance and offer double the ray/triangle intersection testing rate over Turing RT cores.

There's one RT core per SM and all of them combined accelerate Bounding Volume Hierarchy (BVH) traversal and ray/triangle intersection testing (ray casting) functions. RT Cores work together with advanced denoising filtering, a highly-efficient BVH acceleration structure developed by NVIDIA Research, and RTX compatible APIs to achieve real-time ray tracing on a single Turing GPU.

RT Cores traverse the BVH autonomously, and by accelerating traversal and ray/triangle intersection tests, they offload the SM, allowing it to handle another vertex, pixel, and compute shading work. Functions such as BVH building and refitting are handled by the driver, and ray generation and shading are managed by the application through new types of shaders.

nvidia-geforce-rtx-30-series-deep-dive_rtx-3080_rtx-3090_rtx-3070_ampere-ga102_ampere-ga104_gpu_graphics-cards_5

nvidia-geforce-rtx-30-series-deep-dive_rtx-3080_rtx-3090_rtx-3070_ampere-ga102_ampere-ga104_gpu_graphics-cards_6

To better understand the function of RT Cores, and what exactly they accelerate, we should first explain how ray tracing is performed on GPUs or CPUs without a dedicated hardware ray tracing engine. Essentially, the process of BVH traversal would need to be performed by shader operations and take thousands of instruction slots per ray cast to test against bounding box intersections in the BVH until finally hitting a triangle and the color at the point of intersection contributes to the final pixel color (or if no triangle is hit, the background color may be used to shade a pixel).

Ray tracing without hardware acceleration requires thousands of software instruction slots per ray to test successively smaller bounding boxes in the BVH structure until possibly hitting a triangle. It’s a computationally-intensive process making it impossible to do on GPUs in real-time without hardware-based ray tracing acceleration.

The RT Cores in Ampere can process all the BVH traversal and ray-triangle intersection testing, saving the SM from spending the thousands of instruction slots per ray, which could be an enormous amount of instructions for an entire scene. The RT Core includes two specialized units. The first unit does bounding box tests, and the second unit does ray-triangle intersection tests.

The SM only has to launch a ray probe, and the RT core does the BVH traversal and ray-triangle tests, and return a hit or no hit to the SM. Also unlike the last generation, Ampere SM can process two compute workloads simultaneously, allowing ray-tracing & graphics/compute workloads to be done concurrently.

In a visual demonstration, NVIDIA has shown how RT and Tensor cores help speed up ray tracing and shader workloads significantly. A fully ray-traced frame from Wolfenstein Youngblood was taken as an example. The last-gen RTX 2080 SUPER will take 51ms to render the frame if it does it all with its shaders (CUDA Cores). With RT cores and shaders working in tandem, the processing times are reduced to just 20ms or less than half the time. Adding in Tensor cores to help reduce the rendering time even lower to just 12ms (~83 FPS).

However, with Ampere, each standard processing block receives a huge performance uplift. With an RTX 3080, the same frame can be rendered within 37ms on the Shader cores alone, 11ms with the RT+Shader cores, and 6.7ms (150 FPS) with all three core technologies working together. That's half the time of what Turing took to render the same scene.

You can find additional information about our hardware review process and ethics policy here.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.

Follow Wccftech on Google to get more of our news coverage in your feeds.

Read all comments on NVIDIA GeForce RTX 3080 10 GB “Ampere” Graphics Card Review

NVIDIA GeForce RTX 3080 10 GB “Ampere” Graphics Card Review

NVIDIA GeForce RTX 3080

Type

Price

NVIDIA Ampere GPU - 2nd Gen RT and 3rd Gen Tensor Cores Deep Dive

Related Story Get This RTX 5070 For Just $579 At A Time When The GPU Sells For Over $650

2nd Gen RT Cores, RTX, and Real-Time Ray Tracing Dissected

Contents

Further Reading

NVIDIA GeForce RTX 50 SUPER GPUs Have Reportedly Arrived At AIBs, But Are On Hold Due To Undecided Memory Prices

NVIDIA CEO Rips Apart The “Chip Delay” Narrative, Says “Giant Amounts” of Vera Rubin Coming & Unveils Japan’s First AI Factory With 27,500 Rubin GPUs

CPUID Rolls Out HWMonitor v1.65.1, Removing Additional Hot Spot Temperature Reading

Palit Officially Brings Back RTX 3060 12 GB As The Budget Segment Continues To Suffer