NVIDIA GeForce RTX Turing GPUs Detailed – GeForce RTX 2070 Features TU106, 50% Faster Per Core Performance, 50% Better Memory Compression Than Pascal

Hassan Mujtaba • Sep 12, 2018 at 11:52am EDT

New features of the NVIDIA Turing GPU architecture have been revealed and detailed by the folks over at Videocardz. The new details show how far the Turing GPUs depart from current GeForce graphics cards based on the Pascal GPU architecture, and which techniques NVIDIA is using to deliver the best performance to end users and gamers.

NVIDIA Turing GPUs For GeForce RTX Graphics Cards Detailed - More Core Performance, Better Memory Compression, and New Features For Gamers

Starting with the most significant part of the Turing GPU architecture, the Turing SM, we are seeing an entirely new graphics core. The Turing SM is made up of a combination of INT32, FP32, and the new Tensor cores. Each SM has 96 KB of combined L1 cache and shared memory. There are four warp schedulers and dispatchers inside each Turing SM and, similarly, there are four register file units.

Coming to the new execution units or cores, Turing has both INT32 and FP32 units. Each SM has 64 INT32 units, 64 FP32 units, and 8 Tensor cores. This new architectural design allows Turing to execute floating point and non-floating point operations in parallel, which allows for up to 36% higher throughput in standard floating point operations. The entire SM works in harmony by using different blocks to deliver high performance and better texture caching, enabling up to 50% better CUDA core performance when compared to the previous generation.
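To see where a figure like 36% can come from, here is a toy issue-slot model in Python. It is only a sketch, not a hardware simulation: it assumes a workload that mixes 36 integer instructions with every 100 floating point instructions (a ratio picked so the result lines up with the number quoted above), and the function names are made up for illustration.

```python
# Toy issue-slot model (not a hardware simulation).
# Assumption: 36 integer instructions accompany every 100 floating
# point instructions. The ratio is illustrative and chosen so the
# result matches the "up to 36%" figure quoted in the article.
FP_INSTRUCTIONS = 100
INT_INSTRUCTIONS = 36


def issue_slots_shared_pipe(fp_ops: int, int_ops: int) -> int:
    """One datapath handles both types, so every instruction costs a slot."""
    return fp_ops + int_ops


def issue_slots_separate_pipes(fp_ops: int, int_ops: int) -> int:
    """Independent FP32 and INT32 datapaths run side by side."""
    return max(fp_ops, int_ops)


def fp_throughput_gain(fp_ops: int, int_ops: int) -> float:
    """Relative gain from moving integer work off the floating point pipe."""
    shared = issue_slots_shared_pipe(fp_ops, int_ops)
    separate = issue_slots_separate_pipes(fp_ops, int_ops)
    return shared / separate - 1.0


gain = fp_throughput_gain(FP_INSTRUCTIONS, INT_INSTRUCTIONS)
print(f"Floating point throughput gain: {gain:.0%}")
```

With a different instruction mix the gain changes accordingly, which is why the figure is quoted as "up to".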

Following is a shot of the Turing SM by Videocardz:

NVIDIA GeForce RTX/GTX "Turing" Family:

Graphics Card Name	NVIDIA GeForce GTX 1650	NVIDIA GeForce GTX 1650 D6	NVIDIA GeForce GTX 1650 SUPER	NVIDIA GeForce GTX 1660	NVIDIA GeForce GTX 1660 SUPER	NVIDIA GeForce GTX 1660 Ti	NVIDIA GeForce RTX 2060	NVIDIA GeForce RTX 2070	NVIDIA GeForce RTX 2080	NVIDIA GeForce RTX 2080 Ti
GPU Architecture	Turing GPU (TU117)	Turing GPU (TU117)	Turing GPU (TU116)	Turing GPU (TU116)	Turing GPU (TU116)	Turing GPU (TU116)	Turing GPU (TU106)	Turing GPU (TU106)	Turing GPU (TU104)	Turing GPU (TU102)
Process	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN	12nm FFN
Die Size	200mm2	200mm2	284mm2	284mm2	284mm2	284mm2	445mm2	445mm2	545mm2	754mm2
Transistors	4.7 Billion	4.7 Billion	6.6 Billion	6.6 Billion	6.6 Billion	6.6 Billion	10.6 Billion	10.6 Billion	13.6 Billion	18.6 Billion
CUDA Cores	896 Cores	896 Cores	1280 Cores	1408 Cores	1408 Cores	1536 Cores	1920 Cores	2304 Cores	2944 Cores	4352 Cores
TMUs/ROPs	56/32	56/32	80/32	88/48	88/48	96/48	120/48	144/64	192/64	288/96
GigaRays	N/A	N/A	N/A	N/A	N/A	N/A	5 Giga Rays/s	6 Giga Rays/s	8 Giga Rays/s	10 Giga Rays/s
Cache	1.5 MB L2 Cache	1.5 MB L2 Cache	1.5 MB L2 Cache	1.5 MB L2 Cache	1.5 MB L2 Cache	1.5 MB L2 Cache	4 MB L2 Cache	4 MB L2 Cache	4 MB L2 Cache	6 MB L2 Cache
Base Clock	1485 MHz	1410 MHz	1530 MHz	1530 MHz	1530 MHz	1500 MHz	1365 MHz	1410 MHz	1515 MHz	1350 MHz
Boost Clock	1665 MHz	1590 MHz	1725 MHz	1785 MHz	1785 MHz	1770 MHz	1680 MHz	1620 MHz 1710 MHz OC	1710 MHz 1800 MHz OC	1545 MHz 1635 MHz OC
Compute	3.0 TFLOPs	3.0 TFLOPs	4.4 TFLOPs	5.0 TFLOPs	5.0 TFLOPs	5.5 TFLOPs	6.5 TFLOPs	7.5 TFLOPs	10.1 TFLOPs	13.4 TFLOPs
Memory	Up To 4 GB GDDR5	Up To 4 GB GDDR6	Up To 4 GB GDDR6	Up To 6 GB GDDR5	Up To 6 GB GDDR6	Up To 6 GB GDDR6	Up To 6 GB GDDR6	Up To 8 GB GDDR6	Up To 8 GB GDDR6	Up To 11 GB GDDR6
Memory Speed	8.00 Gbps	12.00 Gbps	12.00 Gbps	8.00 Gbps	14.00 Gbps	12.00 Gbps	14.00 Gbps	14.00 Gbps	14.00 Gbps	14.00 Gbps
Memory Interface	128-bit	128-bit	128-bit	192-bit	192-bit	192-bit	192-bit	256-bit	256-bit	352-bit
Memory Bandwidth	128 GB/s	192 GB/s	192 GB/s	192 GB/s	336 GB/s	288 GB/s	336 GB/s	448 GB/s	448 GB/s	616 GB/s
Power Connectors	N/A	N/A	6 Pin	8 Pin	8 Pin	8 Pin	8 Pin	8 Pin	8+8 Pin	8+8 Pin
TDP	75W	75W	100W	120W	125W	120W	160W	185W (Founders) 175W (Reference)	225W (Founders) 215W (Reference)	260W (Founders) 250W (Reference)
Starting Price	$149 US	$149 US	$159 US	$219 US	$229 US	$279 US	$349 US	$499 US	$699 US	$999 US
Price (Founders Edition)	$149 US	$149 US	$159 US	$219 US	$229 US	$279 US	$349 US	$599 US	$799 US	$1,199 US
Launch	April 2019	April 2020	November 2019	March 2019	October 2019	February 2019	January 2019	October 2018	September 2018	September 2018
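The Compute and Memory Bandwidth rows in the table can be cross-checked from the other rows. The Python sketch below uses the usual conventions, which the article does not spell out and which are therefore assumptions here: peak FP32 compute counts two operations (one fused multiply-add) per CUDA core per clock at the reference boost clock, and bandwidth is the per-pin memory speed times the bus width divided by 8 bits per byte. The function names are made up for illustration.

```python
def memory_bandwidth_gbs(speed_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s from per-pin speed and bus width."""
    return speed_gbps * bus_width_bits / 8


def fp32_tflops(cuda_cores: int, boost_clock_mhz: float) -> float:
    """Peak FP32 TFLOPs, assuming 2 operations (one FMA) per core per clock."""
    return 2 * cuda_cores * boost_clock_mhz * 1e6 / 1e12


# (cores, reference boost MHz, memory speed Gbps, bus width bits) from the table
cards = {
    "RTX 2070": (2304, 1620, 14.0, 256),
    "RTX 2080": (2944, 1710, 14.0, 256),
    "RTX 2080 Ti": (4352, 1545, 14.0, 352),
}

for name, (cores, boost_mhz, speed_gbps, bus_bits) in cards.items():
    tflops = fp32_tflops(cores, boost_mhz)
    bandwidth = memory_bandwidth_gbs(speed_gbps, bus_bits)
    print(f"{name}: {tflops:.1f} TFLOPs, {bandwidth:.0f} GB/s")
```

Both results line up with the table for these three cards once rounded to one decimal place.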

The Turing GPUs Dissected - TU102 For RTX 2080 Ti, TU104 For RTX 2080 and TU106 For RTX 2070

For the first time, NVIDIA is launching the x80 and x70 cards alongside the flagship x80 Ti model, and it is doing so with three different GPUs. While the GPUs are similar in design, the configurations are very different, and one thing we can tell is that the configs leave a lot of room for NVIDIA to expand upon in the future if they want to.

What I mean to say is that the RTX 2080 Ti isn't based on the full TU102 GPU and the RTX 2080 isn't based on the full TU104 GPU, while the RTX 2070 is the only card that utilizes the full config of the GPU it's based upon, the Turing TU106.

One more thing: these GPUs are really huge in terms of die size compared to the Pascal GPUs, despite using the 12nm process. The reason is the added INT32 execution units and Tensor cores, which weren't available on any previous consumer GeForce graphics cards. Hence the TU106 GPU, which succeeds the GP106 GPU, is over twice as large as its predecessor (445mm2 versus 200mm2).

Here's another thing: the GP106 was used in the GTX 1060, which is more of a mainstream graphics card. The RTX 2070 rocks a TU106 GPU, which may make it look like a mainstream GPU with a much higher price tag, but it does have better overall specifications than the GP104-based GTX 1070, with more cores, better memory, and more features. It also has around twice as many cores as the GTX 1060, so calling it a mainstream graphics card wouldn't be wise.

NVIDIA Turing TU102 GPU

So overall, the TU102 is made up of 6 graphics processing clusters with 6 TPCs (12 SM units) on each cluster. That makes 72 SM units for a total of 4608 cores in an 18.6 billion transistor package measuring 754mm2.

NVIDIA Turing TU104 GPU

The TU104 is made up of 6 graphics processing clusters with 4 TPCs (8 SM units) on each cluster. That makes 48 SM units for a total of 3072 cores in a 13.6 billion transistor package measuring 545mm2.

NVIDIA Turing TU106 GPU

The TU106 is made up of 3 graphics processing clusters with 6 TPCs (12 SM units) on each cluster. That makes 36 SM units for a total of 2304 cores in a 10.6 billion transistor package measuring 445mm2.
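The full-die core counts quoted for the three GPUs can be reproduced from the cluster layout. The Python sketch below assumes each of the per-cluster units counted above is a pair of SMs (what NVIDIA calls a TPC) and that each SM carries 64 FP32 cores, as described earlier for the Turing SM. The dictionary and function names are made up for illustration; the shipping core counts come from the table.

```python
FP32_CORES_PER_SM = 64
SMS_PER_CLUSTER_UNIT = 2  # assumption: each per-cluster unit is a pair of SMs

# GPU: (graphics processing clusters, units per cluster)
FULL_DIES = {"TU102": (6, 6), "TU104": (6, 4), "TU106": (3, 6)}

# Shipping card: (GPU it is based on, CUDA cores listed in the table)
CARDS = {
    "RTX 2080 Ti": ("TU102", 4352),
    "RTX 2080": ("TU104", 2944),
    "RTX 2070": ("TU106", 2304),
}


def full_die_cores(gpcs: int, units_per_gpc: int) -> int:
    """CUDA cores on a fully enabled die."""
    return gpcs * units_per_gpc * SMS_PER_CLUSTER_UNIT * FP32_CORES_PER_SM


for card, (gpu, shipping_cores) in CARDS.items():
    total = full_die_cores(*FULL_DIES[gpu])
    enabled_sms = shipping_cores // FP32_CORES_PER_SM
    print(f"{card}: {shipping_cores} of {total} cores on {gpu} "
          f"({enabled_sms} SMs, {shipping_cores / total:.1%} enabled)")
```

Only the RTX 2070 comes out fully enabled, which matches the point made earlier about the RTX 2080 Ti and RTX 2080 using cut-down dies.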

NVIDIA Turing GPU Packs 50% Better Performance Per Core Than Pascal GPUs

In terms of shading performance, which is the direct result of the enhanced core design and GPU architecture revamp, the Turing GPU offers an average uplift of 50% better performance per core compared to Pascal GPUs. In VR games, the shading performance would be a good 2x ahead of what Pascal achieved, while many modern gaming titles show a ~50% lead over Pascal with Turing’s enhanced core design.

It should be pointed out that these are just per core performance gains at the same clock speeds, without adding the benefits of the other technologies that Turing comes with. Those would further increase performance in a wide variety of gaming applications, as we have already seen the gaming performance of a GeForce RTX 2080 to be 50% faster than the GTX 1080 on average and twice as fast with the new DLSS technology.

NVIDIA is also incorporating new shading models, one of which is known as Mesh Shading, which would significantly help games process vertex, tessellation, and geometry shading:

Mesh Shading: new shader model for vertex, tessellation, geometry shading (more objects per scene)
Variable Rate Shading (VRS): developer control over shading rates (to limit shading where it does not provide visual benefit)
Texture-Space Shading: storing shading results in memory (no need to duplicate shading work for the processes)
Multi-View Rendering (MVR): extends Pascal’s Single Pass Stereo to multi-views in a single pass

NVIDIA Turing GPUs With Better Memory Compression - Effective Memory Bandwidth Increased Up To 50% Over Pascal GPUs, Over 1.5 TB/s

One of the key improvements of Pascal over Maxwell was the faster memory compression algorithms which delivered very high bandwidth by using various compression and caching techniques.

With Turing, we are looking at the third generation of memory compression architecture, which is said to deliver up to a 50% boost in effective bandwidth compared to Pascal GPUs. We know that the Pascal GeForce GTX 1080 Ti's memory bandwidth was boosted to an effective 1.2 TB/s over the raw 484.4 GB/s when using these algorithms, and with Turing, NVIDIA is saying that we should expect 50% more effective bandwidth with Memory Compression 3.0.

Since Turing GPUs already have higher raw bandwidth compared to Pascal GPUs (RTX 2080 Ti with 616 GB/s), we can expect the effective bandwidth using the new algorithm to reach past 1.5 TB/s, which is very good considering it would help games deliver even better performance at the higher resolutions these graphics cards are aiming at.
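One way to sanity-check the 1.5 TB/s figure is to take the effective-to-raw ratio implied by the GTX 1080 Ti numbers above and apply it, unchanged, to the RTX 2080 Ti's raw bandwidth. The Python sketch below does just that. Treating the Pascal ratio as a floor for Turing is an assumption made here for illustration, not something NVIDIA has stated, and the function names are made up.

```python
PASCAL_RAW_GBS = 484.4         # GTX 1080 Ti raw bandwidth quoted above
PASCAL_EFFECTIVE_GBS = 1200.0  # GTX 1080 Ti effective bandwidth (1.2 TB/s)
TURING_RAW_GBS = 616.0         # RTX 2080 Ti raw bandwidth


def effective_ratio(effective_gbs: float, raw_gbs: float) -> float:
    """How much compression and caching multiply the raw bandwidth."""
    return effective_gbs / raw_gbs


def estimate_effective_gbs(raw_gbs: float, ratio: float) -> float:
    """Effective bandwidth if the given ratio carries over to another GPU."""
    return raw_gbs * ratio


pascal_ratio = effective_ratio(PASCAL_EFFECTIVE_GBS, PASCAL_RAW_GBS)
turing_estimate = estimate_effective_gbs(TURING_RAW_GBS, pascal_ratio)

print(f"Pascal effective/raw ratio: {pascal_ratio:.2f}x")
print(f"RTX 2080 Ti at the same ratio: {turing_estimate / 1000:.2f} TB/s")
```

Even with no improvement in the compression ratio, the higher raw bandwidth alone lands just past 1.5 TB/s under this assumption, so any gain from the newer compression would push the figure higher.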

NVIDIA Turing GPUs With Display Port 1.4a, Enhanced NVENC Encoder/Decoder

The Turing GPUs featured on the GeForce RTX graphics cards also come with new display capabilities. The highlight of them may be the VirtualLink USB Type-C port but there's also DisplayPort 1.4a, both of which enable 8K at 60 Hz.

The cards will also be equipped with an enhanced NVENC encoder and decoder. The encoder can handle H.265 streams at 8K/30 FPS, while the decoder supports HEVC YUV444 10/12-bit HDR, H.264 at 8K, and VP9 10/12-bit HDR.

The GeForce RTX 20 Series Market Availability – Preorder and Shipping Today, On Shelves 20th September

The NVIDIA GeForce RTX 20 series launches today in reference variants first. This time, NVIDIA has already given the green light to their manufacturers to announce custom cards soon after the reference launch which are now available to pre-order on the official GeForce webpage. Or you can head over to this article and check out all the glorious non-reference models which you will be able to get very soon.

The one thing we should tell you is that the performance numbers are still under wraps till 19th September, which leaves little or no time for consumers to reconsider their pre-orders since availability is a day later, or less than 24 hours away. The reviews for the GeForce RTX 2080 Ti and RTX 2080 will go live on 19th September at the same time, so if you are planning to buy one, or have already pre-ordered one and want to reconsider your purchase, you will have little time to think.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
