NVIDIA GeForce RTX Turing GPUs Detailed – GeForce RTX 2070 Features TU106, 50% Faster Per Core Performance, 50% Better Memory Compression Than Pascal
New features of the NVIDIA Turing GPU architecture have been revealed and detailed by the folks over at Videocardz. The new details show how much of a departure the Turing GPUs are from current GeForce graphics cards based on the Pascal GPU architecture, and which techniques NVIDIA is using to deliver more performance to end users and gamers.
NVIDIA Turing GPUs For GeForce RTX Graphics Cards Detailed – More Core Performance, Better Memory Compression, and New Features For Gamers
Starting with the most significant part of the Turing GPU architecture, the Turing SM, we are seeing an entirely new graphics core. The Turing SM is made up of a combination of INT32, FP32, and the new Tensor cores. Each SM has 96 KB of combined L1 cache and shared memory, which is shared by the units inside that SM. The SM is divided into four processing blocks, each with its own warp scheduler, dispatch unit, and register file.
Coming to the new execution units, each Turing SM has 64 INT32 units, 64 FP32 units, and 8 Tensor cores. Because integer and floating point instructions run on separate units, Turing can execute them in parallel, which allows for up to 36% higher throughput in standard floating point workloads. The blocks of the SM work together, along with improved texture caching, to enable up to 50% better performance per CUDA core compared to the previous generation.
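To see why separate integer and floating point pipes help, here is a deliberately idealized Python sketch. It assumes one instruction issues per pipe per cycle with no dependencies or stalls, and it models the older design as a single shared pipe. The workload mix of 36 integer instructions per 100 floating point instructions is an assumption chosen to illustrate the 36% figure, not a measurement.

```python
def issue_cycles(fp_ops: int, int_ops: int, concurrent: bool) -> int:
    """Idealized cycles needed to issue a mix of FP32 and INT32 instructions.

    Assumes one instruction per pipe per cycle and no dependencies or stalls.
    """
    if concurrent:
        # Separate FP32 and INT32 pipes (Turing-style): the two streams overlap.
        return max(fp_ops, int_ops)
    # Single shared pipe (simplified older design): integer work uses FP issue slots.
    return fp_ops + int_ops


def speedup(fp_ops: int, int_ops: int) -> float:
    """Throughput gain of concurrent issue over a single shared pipe."""
    return issue_cycles(fp_ops, int_ops, False) / issue_cycles(fp_ops, int_ops, True)


# Assumed mix: 36 integer instructions for every 100 floating point instructions.
print(f"{speedup(100, 36):.2f}x")  # 1.36x
```

Real shaders have dependencies and memory stalls, so this is an upper bound for the assumed mix rather than a prediction for any particular game.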
Following is a shot of the Turing SM by Videocardz:
NVIDIA GeForce RTX/GTX "Turing" Family:
|Graphics Card Name|NVIDIA GeForce GTX 1650|NVIDIA GeForce GTX 1660|NVIDIA GeForce GTX 1660 Ti|NVIDIA GeForce RTX 2060|NVIDIA GeForce RTX 2070|NVIDIA GeForce RTX 2080|NVIDIA GeForce RTX 2080 Ti|
|---|---|---|---|---|---|---|---|
|GPU Architecture|Turing GPU (TU117)|Turing GPU (TU116)|Turing GPU (TU116)|Turing GPU (TU106)|Turing GPU (TU106)|Turing GPU (TU104)|Turing GPU (TU102)|
|Process|12nm FFN|12nm FFN|12nm FFN|12nm FFN|12nm FFN|12nm FFN|12nm FFN|
|Transistors|4.7 Billion|6.6 Billion|6.6 Billion|10.8 Billion|10.8 Billion|13.6 Billion|18.6 Billion|
|CUDA Cores|896 Cores|1408 Cores|1536 Cores|1920 Cores|2304 Cores|2944 Cores|4352 Cores|
|GigaRays|N/A|N/A|N/A|5 Giga Rays/s|6 Giga Rays/s|8 Giga Rays/s|10 Giga Rays/s|
|Cache|1 MB L2 Cache|1.5 MB L2 Cache|1.5 MB L2 Cache|3 MB L2 Cache|4 MB L2 Cache|4 MB L2 Cache|5.5 MB L2 Cache|
|Base Clock|1485 MHz|1530 MHz|1500 MHz|1365 MHz|1410 MHz|1515 MHz|1350 MHz|
|Boost Clock|1665 MHz|1785 MHz|1770 MHz|1680 MHz|1620 MHz (1710 MHz OC)|1710 MHz (1800 MHz OC)|1545 MHz (1635 MHz OC)|
|Compute|3.0 TFLOPs|5.0 TFLOPs|5.5 TFLOPs|6.5 TFLOPs|7.5 TFLOPs|10.1 TFLOPs|13.4 TFLOPs|
|Memory|Up To 4 GB GDDR5|Up To 6 GB GDDR5|Up To 6 GB GDDR6|Up To 6 GB GDDR6|Up To 8 GB GDDR6|Up To 8 GB GDDR6|Up To 11 GB GDDR6|
|Memory Speed|8.00 Gbps|8.00 Gbps|12.00 Gbps|14.00 Gbps|14.00 Gbps|14.00 Gbps|14.00 Gbps|
|Memory Bandwidth|128 GB/s|192 GB/s|288 GB/s|336 GB/s|448 GB/s|448 GB/s|616 GB/s|
|Power Connectors|N/A|8 Pin|8 Pin|8 Pin|8 Pin|8+6 Pin|8+8 Pin|
|Starting Price|$149 US|$219 US|$279 US|$349 US|$499 US|$699 US|$999 US|
|Price (Founders Edition)|$149 US|$219 US|$279 US|$349 US|$599 US|$799 US|$1,199 US|
|Launch|April 2019|March 2019|February 2019|January 2019|October 2018|September 2018|September 2018|
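The Compute and Memory Bandwidth rows follow from the other specifications. A minimal sketch, assuming peak FP32 throughput counts one fused multiply-add (2 FLOPs) per CUDA core per clock at the reference boost clock (1710 MHz for the RTX 2080 and 1545 MHz for the RTX 2080 Ti). The memory bus widths used below are not listed in the table and are supplied here as an assumption.

```python
def peak_tflops(cuda_cores: int, boost_mhz: int) -> float:
    """Peak FP32 rate: one FMA (2 FLOPs) per CUDA core per clock."""
    return cuda_cores * 2 * boost_mhz * 1e6 / 1e12


def bandwidth_gbs(gbps_per_pin: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth: per-pin data rate x bus width / 8 bits per byte."""
    return gbps_per_pin * bus_width_bits / 8


# (card, CUDA cores, reference boost MHz, memory Gbps, bus width in bits)
# Bus widths are an assumption; they are not part of the spec table.
CARDS = [
    ("GTX 1650", 896, 1665, 8.0, 128),
    ("GTX 1660", 1408, 1785, 8.0, 192),
    ("GTX 1660 Ti", 1536, 1770, 12.0, 192),
    ("RTX 2060", 1920, 1680, 14.0, 192),
    ("RTX 2070", 2304, 1620, 14.0, 256),
    ("RTX 2080", 2944, 1710, 14.0, 256),
    ("RTX 2080 Ti", 4352, 1545, 14.0, 352),
]

for name, cores, boost, gbps, bus in CARDS:
    print(f"{name}: {peak_tflops(cores, boost):.1f} TFLOPs, "
          f"{bandwidth_gbs(gbps, bus):.0f} GB/s")
```

The bandwidth figures match the table exactly and the compute figures match to within rounding; the GTX 1660 Ti works out to about 5.44 TFLOPs against the listed 5.5.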
The Turing GPUs Dissected – TU102 For RTX 2080 Ti, TU104 For RTX 2080 and TU106 For RTX 2070
For the first time, NVIDIA is launching the x80 and x70 cards alongside the flagship x80 Ti model, and it is basing these graphics cards on three different GPUs. While the GPUs are similar in design, their configurations are very different, and one thing we can tell is that they leave a lot of room for NVIDIA to expand the lineup in the future if it wants to.
What I mean to say is that the RTX 2080 Ti isn't based on the full TU102 GPU, and the RTX 2080 isn't based on the full TU104 GPU either. The RTX 2070 is the only card that uses the full configuration of the GPU it's based upon, the Turing TU106.
One more thing: these GPUs are much larger in die size than their Pascal counterparts, even on the 12nm process. The reason is the added separate INT32 execution units, RT cores, and Tensor cores, none of which were available on any previous consumer GeForce graphics card. Hence, the TU106 GPU, which succeeds the GP106, is over twice as large as its predecessor (445 mm² versus 200 mm²).
Here's another thing: the GP106 was used in the GTX 1060, which is more of a mainstream graphics card. While the TU106 inside the RTX 2070 may make it look like a mainstream card with a much higher price tag, it actually has better overall specifications than the GP104-based GTX 1070, with more cores, faster memory, and more features. It also has nearly twice as many cores as the GTX 1060, so calling it a mainstream graphics card would be a stretch.
NVIDIA Turing TU102 GPU
So overall, the full TU102 is made up of 6 graphics processing clusters (GPCs) with 6 texture processing clusters (TPCs) in each GPC, and each TPC holds two SM units. That makes 72 SM units with 64 CUDA cores each, for a total of 4608 CUDA cores in an 18.6 billion transistor package measuring 754 mm².
NVIDIA Turing TU104 GPU
The full TU104 is made up of 6 graphics processing clusters with 4 texture processing clusters (two SM units each) per cluster. That makes 48 SM units for a total of 3072 CUDA cores in a 13.6 billion transistor package measuring 545 mm².
NVIDIA Turing TU106 GPU
The TU106 is made up of 3 graphics processing clusters with 6 texture processing clusters (two SM units each) per cluster. That makes 36 SM units for a total of 2304 CUDA cores in a 10.8 billion transistor package measuring 445 mm².
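The core totals follow directly from the cluster layout. A quick sketch, assuming each per-cluster unit is a TPC holding two SMs and each SM has 64 CUDA cores; it also compares the full dies with the retail core counts from the spec table to show how many SMs each card enables. The helper name `full_die` is hypothetical.

```python
SMS_PER_TPC = 2     # each texture processing cluster pairs two SMs
CORES_PER_SM = 64   # FP32 CUDA cores per Turing SM

# (GPCs, TPCs per GPC) for each fully enabled die
FULL_DIES = {
    "TU102": (6, 6),
    "TU104": (6, 4),
    "TU106": (3, 6),
}

# Retail cards: (GPU, enabled CUDA cores from the spec table)
RETAIL = {
    "RTX 2080 Ti": ("TU102", 4352),
    "RTX 2080": ("TU104", 2944),
    "RTX 2070": ("TU106", 2304),
}


def full_die(gpcs: int, tpcs_per_gpc: int) -> tuple:
    """Return (SM count, CUDA core count) for a fully enabled die."""
    sms = gpcs * tpcs_per_gpc * SMS_PER_TPC
    return sms, sms * CORES_PER_SM


for card, (die, cores) in RETAIL.items():
    full_sms, full_cores = full_die(*FULL_DIES[die])
    enabled_sms = cores // CORES_PER_SM
    print(f"{card}: {enabled_sms}/{full_sms} SMs enabled ({cores}/{full_cores} cores)")
```

This prints 68 of 72 SMs for the RTX 2080 Ti, 46 of 48 for the RTX 2080, and 36 of 36 for the RTX 2070, which is why only the RTX 2070 uses its full GPU.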
NVIDIA Turing GPU Packs 50% Better Performance Per Core Than Pascal GPUs
In terms of shading performance, which is the direct result of the enhanced core design and the GPU architecture revamp, Turing offers an average of 50% better performance per core than Pascal GPUs. In VR games, shading performance is about 2x that of Pascal, while many modern gaming titles show a ~50% lead over Pascal thanks to Turing's enhanced core design.
It should be pointed out that these are just per-core performance gains at the same clock speeds, without the benefits of the other technologies that Turing comes with. Those would further increase performance in a wide variety of games: we have already seen the GeForce RTX 2080 deliver 50% higher gaming performance than the GTX 1080 on average, and twice the performance with the new DLSS technology.
NVIDIA is also introducing new shading techniques, one of which, known as Mesh Shading, should significantly help games process vertex, tessellation, and geometry shading:
- Mesh Shading: a new shader model for vertex, tessellation, and geometry shading (more objects per scene)
- Variable Rate Shading (VRS): developer control over shading rates (to limit shading where it does not provide a visual benefit)
- Texture-Space Shading: storing shading results in memory (so shading work does not need to be duplicated)
- Multi-View Rendering (MVR): extends Pascal's Single Pass Stereo to multiple views in a single pass
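Variable Rate Shading is easiest to picture as a shading rate assigned per screen region. The sketch below counts pixel-shader invocations when each tile is shaded once per rx × ry block of pixels. The 16×16 tile size, the available rates, and the example rate map are assumptions for illustration; this is not NVIDIA's API.

```python
TILE = 16  # assumption: one shading rate per 16x16-pixel screen tile


def tile_invocations(rate: tuple) -> int:
    """Pixel-shader invocations for one tile shaded once per (rx, ry) pixel block."""
    rx, ry = rate
    return (TILE // rx) * (TILE // ry)


def frame_invocations(rate_map: list) -> int:
    """Sum invocations over a 2D grid of per-tile shading rates."""
    return sum(tile_invocations(rate) for row in rate_map for rate in row)


# Hypothetical 4x4-tile (64x64-pixel) region: full rate in the center, where
# detail matters, and one shading sample per 2x2 pixels around the edges.
FULL, COARSE = (1, 1), (2, 2)
rate_map = [
    [COARSE, COARSE, COARSE, COARSE],
    [COARSE, FULL,   FULL,   COARSE],
    [COARSE, FULL,   FULL,   COARSE],
    [COARSE, COARSE, COARSE, COARSE],
]

full_rate = frame_invocations([[FULL] * 4 for _ in range(4)])
vrs = frame_invocations(rate_map)
print(full_rate, vrs, f"{vrs / full_rate:.0%}")  # 4096 1792 44%
```

In this example, coarse shading around the edges cuts the shading work to roughly 44% of full rate while keeping full detail in the center tiles.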
NVIDIA Turing GPUs With Better Memory Compression – Effective Memory Bandwidth Increased Up To 50% Over Pascal GPUs, Over 1.5 TB/s
One of the key improvements of Pascal over Maxwell was faster memory compression, which delivered much higher effective bandwidth by combining various compression and caching techniques.
With Turing, we are looking at the third generation of this memory compression architecture, which is said to deliver up to a 50% boost in effective bandwidth over Pascal GPUs. We know that, using these algorithms, the effective memory bandwidth of the Pascal-based GeForce GTX 1080 Ti rose to 1.2 TB/s from its raw 484.4 GB/s, and with Memory Compression 3.0, NVIDIA says we should expect 50% more effective bandwidth on Turing.
Since Turing GPUs already have higher raw bandwidth than Pascal GPUs (616 GB/s on the RTX 2080 Ti), we can expect effective bandwidth with the new algorithm to exceed 1.5 TB/s. That is very good news, since it should help games perform even better at the higher resolutions these graphics cards are aimed at.
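The 1.5 TB/s estimate can be reproduced with simple arithmetic. This sketch treats effective bandwidth as raw bandwidth multiplied by a compression factor and, as an assumption, reuses the factor implied by the GTX 1080 Ti figures (1.2 TB/s effective from 484.4 GB/s raw) rather than any Turing-specific value.

```python
def effective_bandwidth(raw_gbs: float, compression_factor: float) -> float:
    """Effective bandwidth in GB/s: raw bandwidth scaled by a compression factor."""
    return raw_gbs * compression_factor


# Factor implied by the GTX 1080 Ti figures: 1200 GB/s effective / 484.4 GB/s raw.
pascal_factor = 1200 / 484.4

# Assumption: apply the same factor to the RTX 2080 Ti's 616 GB/s raw bandwidth.
turing_estimate = effective_bandwidth(616, pascal_factor)
print(f"{pascal_factor:.2f}x -> {turing_estimate:.0f} GB/s")  # 2.48x -> 1526 GB/s
```

Applying Pascal's implied ratio to 616 GB/s already lands just above 1.5 TB/s, so any gain from the newer compression would come on top of that.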
NVIDIA Turing GPUs With DisplayPort 1.4a, Enhanced NVENC Encoder/Decoder
The Turing GPUs featured on the GeForce RTX graphics cards also come with new display capabilities. The highlight of them may be the VirtualLink USB Type-C port but there’s also DisplayPort 1.4a, both of which enable 8K at 60 Hz.
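For context on why 8K at 60 Hz is demanding, here is a rough sketch of the uncompressed data rate. It ignores blanking intervals and assumes 8-bit RGB (24 bits per pixel). The link figures are the DisplayPort HBR3 rate of 8.1 Gbit/s per lane across four lanes with 8b/10b encoding; on a single cable, DisplayPort 1.4 reaches 8K at 60 Hz by using Display Stream Compression (DSC).

```python
def video_gbps(width: int, height: int, refresh_hz: int, bits_per_pixel: int) -> float:
    """Uncompressed active-pixel data rate in Gbit/s (blanking ignored)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


# DisplayPort HBR3: 4 lanes x 8.1 Gbit/s, with 8b/10b encoding overhead removed.
DP_HBR3_PAYLOAD_GBPS = 4 * 8.1 * 8 / 10  # about 25.92 Gbit/s

need = video_gbps(7680, 4320, 60, 24)  # 8K, 60 Hz, 8-bit RGB
print(f"{need:.1f} Gbit/s needed vs {DP_HBR3_PAYLOAD_GBPS:.2f} available")
```

The uncompressed stream needs roughly 1.8 times the link's payload capacity, which is why compression is required for 8K at 60 Hz over one DisplayPort 1.4 connection.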
The cards will also be equipped with an enhanced NVENC encoder that can encode H.265 (HEVC) streams at 8K/30 FPS, along with a decoder that supports HEVC YUV444 10/12-bit HDR, H.264 at 8K, and VP9 10/12-bit HDR.
The GeForce RTX 20 Series Market Availability – Preorder and Shipping Today, On Shelves 20th September
The NVIDIA GeForce RTX 20 series launches today, starting with the reference variants. This time, NVIDIA has also given its manufacturers the green light to announce custom cards soon after the reference launch, and pre-orders are now open on the official GeForce webpage. You can also head over to this article to check out all the non-reference models, which should be available very soon.
One thing worth noting is that performance numbers remain under wraps until 19th September, when the reviews for the GeForce RTX 2080 Ti and RTX 2080 go live at the same time. Since retail availability follows less than 24 hours later, consumers have little or no time to reconsider their pre-orders. If you are planning to buy one, or have already pre-ordered one and want to reconsider your purchase, you will have very little time to think it over.
Check out the other cards in the links below:
- GeForce RTX 2080 Ti ($999 US) Graphics Card
- GeForce RTX 2080 ($699 US) Graphics Card
- GeForce RTX 2070 ($499 US) Graphics Card