NVIDIA GeForce GTX 1650 SUPER 4 GB GDDR6 Graphics Card Review – The Best SUPER Card Yet!
MSI GeForce GTX 1650 SUPER Gaming X, November 2019
NVIDIA GeForce GTX 1650 SUPER and Turing TU116 GPU
While we have already detailed the Turing GPU architecture, it should be pointed out that the TU116 GPU, while it shares the same DNA as the rest of the Turing family, has some big changes compared to what we've seen on the GeForce RTX 20 series cards.
Based on the 12th Generation Turing GPU architecture, the TU116 GPU found on the GeForce GTX 1660 Ti, GeForce GTX 1660 SUPER, GeForce GTX 1660 and the GeForce GTX 1650 SUPER features the same shader innovations that were introduced on Turing, but a few adjustments had to be made to balance power, cost and performance. The main adjustment is the exclusion of RT cores and Tensor cores on the GeForce GTX cards with Turing architecture. Even so, the Turing architecture on GeForce GTX still delivers improved performance and better efficiency compared to its predecessor while supporting concurrent floating-point and integer operations.
So let's talk about the balanced architecture design of the Turing TU116 and how it still manages to improve upon its Pascal-based predecessors. The first thing to mention is the set of big changes in the Turing SM. The revamped structure of the Turing TU116 SM enables FP32 and INT operations to be processed concurrently through the use of dedicated cores within the SM. The list of features that the Turing TU116 GPU adds over the Pascal GP106 includes:
- Concurrent FP and INT operations
- Variable Rate Shading
- Unified Cache Architecture
- GDDR6 Memory Subsystem
- Dedicated FP16 Cores
- Turing NVENC Support
The Turing SM can also perform FP16 operations at double the rate of FP32. The Turing TU116 GPU is rated at 11 TOPs (FP+INT) and 11 TFLOPs FP16, and it benefits from improved effective bandwidth as a result of its larger 1.5 MB cache, compared to just 480 KB on the Pascal GP106 GPU.
If we look at some modern gaming titles, we can see that developers are widely mixing floating-point operations with integer instructions. For every 100 instructions in Shadow of the Tomb Raider, for example, 62 are floating-point and 38 are integer, on average. In previous GPUs, the floating-point math datapath in the SM would sit idle whenever one of these non-FP-math instructions ran.
Turing adds a second, parallel integer execution unit next to every CUDA core that executes these instructions in parallel with floating-point math. This allows the GeForce GTX 1650 SUPER graphics card to deliver up to a 2.0x performance improvement over the GeForce GTX 1050 4 GB.
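To put the 62/38 instruction mix in perspective, here is a minimal sketch of an idealized issue-slot model. It is our own back-of-the-envelope illustration, not NVIDIA's performance model: the function name `issue_slots` is hypothetical, and the model assumes perfect overlap with no dependencies, memory stalls or scheduling limits, so the result is only an upper bound on what the extra integer datapath can contribute.

```python
def issue_slots(fp_instr, int_instr, concurrent):
    """Idealized number of issue slots for a mix of FP and INT instructions.

    Hypothetical helper for illustration only; real SM scheduling is
    far more complicated than this.
    """
    if concurrent:
        # Separate FP and INT datapaths: the longer stream sets the time.
        return max(fp_instr, int_instr)
    # Shared datapath: every instruction needs its own slot.
    return fp_instr + int_instr


# Instruction mix quoted above for Shadow of the Tomb Raider (per 100).
fp, integer = 62, 38

serial = issue_slots(fp, integer, concurrent=False)
parallel = issue_slots(fp, integer, concurrent=True)
speedup = serial / parallel

print(f"{serial} slots vs {parallel} slots, ideal speedup {speedup:.2f}x")
# prints: 100 slots vs 62 slots, ideal speedup 1.61x
```

Under these assumptions the concurrent datapath alone tops out at roughly 1.6x for this mix. The up to 2.0x figure is a comparison against a different card, so it also reflects differences in core count, clocks and memory.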
Now, coming to the raw specifications of the GeForce GTX 1650 SUPER graphics card: the TU116 GPU is fabricated on TSMC's 12nm FFN (FinFET NVIDIA) process node. It features 3 GPCs, 12 TPCs, and 20 Turing SMs. Each SM contains 64 cores, for a total of 1280 CUDA Cores. There are also 80 Texture Units and 32 Raster Operation Units on the card. The base clock is rated at 1530 MHz and the boost clock at 1725 MHz. That's a massive GPU configuration difference versus the GeForce GTX 1650, which is based on the TU117 GPU core.
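The core count and the theoretical FP32 throughput follow directly from these numbers. The sketch below is a rough calculation under the usual assumption that each CUDA core retires one fused multiply-add (counted as 2 FLOPs) per clock at the rated boost frequency. The function names are ours, and real-world clocks will vary with the card and its cooling.

```python
CORES_PER_SM = 64             # FP32 CUDA cores per Turing SM, as stated above
FLOPS_PER_CORE_PER_CLOCK = 2  # assumption: one FMA per clock = 2 FLOPs


def cuda_cores(sm_count):
    """Total CUDA cores for a given number of enabled SMs."""
    return sm_count * CORES_PER_SM


def fp32_tflops(cores, clock_mhz):
    """Theoretical peak FP32 throughput in TFLOPs at a given clock."""
    return cores * FLOPS_PER_CORE_PER_CLOCK * clock_mhz * 1e6 / 1e12


cores = cuda_cores(20)            # 20 SMs on the GTX 1650 SUPER
peak = fp32_tflops(cores, 1725)   # rated boost clock in MHz

print(f"{cores} CUDA cores, about {peak:.2f} TFLOPs FP32 at boost")
# prints: 1280 CUDA cores, about 4.42 TFLOPs FP32 at boost
```

Plugging the GeForce GTX 1650 row of the table below into the same formula (896 cores at 1665 MHz) gives roughly 3.0 TFLOPs, which matches the Compute figure listed there.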
The card features 4 GB of GDDR6 VRAM running along a 128-bit bus interface. The memory is clocked at 12.0 Gbps, delivering an effective bandwidth of 192 GB/s. The card draws power from a single 6-pin connector and has a TDP of 100W. Display outputs include a single DisplayPort, a single DVI-D, and an HDMI connector.
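The 192 GB/s figure can be checked with the standard bandwidth formula: bus width in bits times the per-pin data rate, divided by 8 to convert bits to bytes. A minimal sketch follows (the function name is ours, for illustration):

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s.

    bus_width_bits: width of the memory interface in bits
    data_rate_gbps: effective per-pin data rate in Gbps
    """
    return bus_width_bits * data_rate_gbps / 8  # 8 bits per byte


bandwidth = memory_bandwidth_gbs(128, 12.0)  # GTX 1650 SUPER: 128-bit, 12 Gbps
print(f"{bandwidth:.0f} GB/s")
# prints: 192 GB/s
```

This is a theoretical peak; sustained bandwidth in games will be lower.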
NVIDIA GeForce RTX/GTX "Turing" Family:
|Graphics Card Name||NVIDIA GeForce GTX 1650||NVIDIA GeForce GTX 1660||NVIDIA GeForce GTX 1660 SUPER||NVIDIA GeForce GTX 1660 Ti||NVIDIA GeForce RTX 2060||NVIDIA GeForce RTX 2070||NVIDIA GeForce RTX 2080||NVIDIA GeForce RTX 2080 Ti|
|GPU Architecture||Turing GPU (TU117)||Turing GPU (TU116)||Turing GPU (TU116)||Turing GPU (TU116)||Turing GPU (TU106)||Turing GPU (TU106)||Turing GPU (TU104)||Turing GPU (TU102)|
|Process||12nm FFN||12nm FFN||12nm FFN||12nm FFN||12nm FFN||12nm FFN||12nm FFN||12nm FFN|
|Transistors||4.7 Billion||6.6 Billion||6.6 Billion||6.6 Billion||10.6 Billion||10.6 Billion||13.6 Billion||18.6 Billion|
|CUDA Cores||896 Cores||1408 Cores||1408 Cores||1536 Cores||1920 Cores||2304 Cores||2944 Cores||4352 Cores|
|GigaRays||N/A||N/A||N/A||N/A||5 Giga Rays/s||6 Giga Rays/s||8 Giga Rays/s||10 Giga Rays/s|
|Cache||1.5 MB L2 Cache||1.5 MB L2 Cache||1.5 MB L2 Cache||1.5 MB L2 Cache||4 MB L2 Cache||4 MB L2 Cache||4 MB L2 Cache||6 MB L2 Cache|
|Base Clock||1485 MHz||1530 MHz||1530 MHz||1500 MHz||1365 MHz||1410 MHz||1515 MHz||1350 MHz|
|Boost Clock||1665 MHz||1785 MHz||1785 MHz||1770 MHz||1680 MHz||1620 MHz (1710 MHz OC)||1710 MHz (1800 MHz OC)||1545 MHz (1635 MHz OC)|
|Compute||3.0 TFLOPs||5.0 TFLOPs||5.0 TFLOPs||5.5 TFLOPs||6.5 TFLOPs||7.5 TFLOPs||10.1 TFLOPs||13.4 TFLOPs|
|Memory||Up To 4 GB GDDR5||Up To 6 GB GDDR5||Up To 6 GB GDDR6||Up To 6 GB GDDR6||Up To 6 GB GDDR6||Up To 8 GB GDDR6||Up To 8 GB GDDR6||Up To 11 GB GDDR6|
|Memory Speed||8.00 Gbps||8.00 Gbps||14.00 Gbps||12.00 Gbps||14.00 Gbps||14.00 Gbps||14.00 Gbps||14.00 Gbps|
|Memory Bandwidth||128 GB/s||192 GB/s||336 GB/s||288 GB/s||336 GB/s||448 GB/s||448 GB/s||616 GB/s|
|Power Connectors||N/A||8 Pin||8 Pin||8 Pin||8 Pin||8 Pin||8+8 Pin||8+8 Pin|
|Starting Price||$149 US||$219 US||$229 US||$279 US||$349 US||$499 US||$699 US||$999 US|
|Price (Founders Edition)||$149 US||$219 US||$229 US||$279 US||$349 US||$599 US||$799 US||$1,199 US|
|Launch||April 2019||March 2019||October 2019||February 2019||January 2019||October 2018||September 2018||September 2018|