Today, NVIDIA finally unleashes its high-performance GeForce GTX 980 and GeForce GTX 970 graphics cards. NVIDIA's latest graphics cards feature the second generation Maxwell architecture, the most advanced GPU design NVIDIA has built, offering great performance and higher power efficiency. The Maxwell architecture sets new standards for NVIDIA gamers and fans.
NVIDIA Unleashes Maxwell GM204 Based GeForce GTX 980 and GeForce GTX 970 Graphics Cards
We have waited several years for Maxwell to hit the market. While NVIDIA initially released the first core as the GM107, which was based on the first generation architecture design, the GM204 is based on the improved second generation Maxwell core architecture, which adopts some new technologies. So before we talk about the cards, let's recap the architectural details we have come to know about GM204.
NVIDIA GM204 GPU Architecture
The GM204 is the heart of the next generation GeForce GTX 980 and GeForce GTX 970 graphics cards. The chip makes use of the second generation Maxwell core architecture, which has faster per core performance than the first generation Maxwell based chips (GM107) released with the GeForce GTX 750 and GeForce GTX 750 Ti graphics cards. It also has several new features which deliver better performance and great power efficiency, making the GeForce GTX 980 one of the most efficient flagship offerings in history. NVIDIA has changed since their Kepler generation of cards. Before Kepler, NVIDIA was known to release cards which ran hot and consumed a ton of power, and the failure rates of the previous generation cards were pretty high. Still, NVIDIA did manage to release some great cards over that period: the G80 based GeForce 8800 GTX and the price/performance king, the GTX 460, are still considered among the greatest NVIDIA cards that came to market.
Kepler changed certain things. NVIDIA moved away from their branding scheme where users were able to buy HPC chips rebranded for the GeForce audience. The first Kepler GPU, the GK104, was branded as the GeForce card and while it was fast, it wasn't the fastest compared to another chip which NVIDIA had in their hands for over a year. I am talking about the GK110, which was geared towards the professional market such as Tesla supercomputer parts. The GK110 did launch a year later, but since then NVIDIA has configured their core lineup to span two generations: first the gaming minded chip, followed by the full fledged HPC chip. There's a reason for NVIDIA to advertise the GeForce GTX Titan Z as a professional and gaming card even though it clearly has a GeForce name in its branding. The Titan Z makes use of two GK110 chips, which are compute crunching beasts compared to the GK104, which focused on gaming by excluding non essential features such as compute. So while the GM204 based GTX 980 will obviously replace the GK110 based GTX 780 Ti in branding, the real comparison should be GM204 versus GK104. Regardless of this, the GTX 980 is a superb card which beats GK110 while staying on the existing process node.
This is all achieved on the 28nm process node, so one can imagine the numbers we can expect when NVIDIA hops to a smaller process in the future. The GM204 has two variants: the GM204-400, which is used on the GeForce GTX 980, and the GM204-200, which is used on the GeForce GTX 970. The fully enabled GM204 chip features 4 GPCs (Graphics Processing Clusters) with four SMM blocks each. Each SMM block includes four logic units of 32 cores, so a single SMM unit has 128 cores and the 16 blocks available on the GM204-400 chip equate to 2048 CUDA cores. The GM204-200 has three fewer SMM units, which results in a lower core count of 1664, making it around as fast as the GeForce GTX 780, while the GTX 980 will tackle the GeForce GTX 780 Ti with a good 15-20% performance lead.
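The core counts quoted above follow directly from the block layout. Here is a minimal Python sketch of that arithmetic; the function and constant names are made up for illustration and are not NVIDIA terminology.

```python
# Figures taken from the paragraph above; all names are illustrative only.
CORES_PER_LOGIC_UNIT = 32
LOGIC_UNITS_PER_SMM = 4
GPCS = 4
SMM_PER_GPC = 4


def cuda_cores(smm_count: int) -> int:
    """Total CUDA cores for a chip with the given number of enabled SMM units."""
    return smm_count * LOGIC_UNITS_PER_SMM * CORES_PER_LOGIC_UNIT


full_gm204_smms = GPCS * SMM_PER_GPC   # 16 SMMs on the GM204-400 (GTX 980)
gtx_970_smms = full_gm204_smms - 3     # 13 SMMs on the GM204-200 (GTX 970)

print(cuda_cores(full_gm204_smms))  # 2048
print(cuda_cores(gtx_970_smms))     # 1664
```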
The most critical detail of the chip is the transistor count. We all remember that the GK110 chip was a performance and computing beast at 7.08 billion transistors while the GK104 included 3.54 billion transistors. The GM204 includes 5.2 billion transistors crammed inside a die that measures around 398 mm2, just 2 mm2 shy of 400 mm2. The GK104 and GK110 measure around 294 mm2 and 581 mm2 respectively. The die size has increased a lot compared to GK104, which is the generational predecessor of this chip. The GK110 will be replaced by GM200, but that is far from launch at the moment. Even so, NVIDIA has managed to fit more onto the 28nm process while keeping power consumption at just 165W on the GTX 980 and 148W on the GTX 970, which is simply mind boggling.
The GM204 GPU features 128 texture mapping units, which was the standard amount featured on the GK104, but the raster operation units have been upped from 32 on the GTX 680 and 48 on the GTX 780 Ti to 64 on the GTX 980 graphics card. That is actually a higher ROP count than GK110, though GK110 does come with a very high TMU count of 240. NVIDIA compensates for this by clocking the GM204 chip higher, which raises the texture fill rate delivered per TMU. Maxwell was also meant to improve the way the GPU handles bandwidth, and NVIDIA is limiting the bandwidth dependency of their cards by increasing the L2 cache to 2 MB, which is 512 KB more than GK110. The GK104 had just 256 KB of L2 cache, so that is a major update.
The theoretical single precision compute of the chip is rated at around 4.6 TFLOPs, which is really close to the GK110, which pumps out 5.1 TFLOPs. The 144 GT/s texture fill rate is a bit low, but the pixel fill rate is considerably higher at 72.1 GP/s compared to 53.3 GP/s on the GTX 780 Ti.
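These theoretical figures can be reproduced from the unit counts and the 1126 MHz base clock. The sketch below assumes one fused multiply-add (two FLOPs) per CUDA core per clock and one texel or pixel per TMU or ROP per clock, which is the usual way such peak numbers are derived; the function names are illustrative.

```python
def single_precision_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Peak TFLOPs, assuming one FMA (2 FLOPs) per core per clock."""
    return cuda_cores * 2 * clock_mhz * 1e6 / 1e12


def fill_rate_giga(units: int, clock_mhz: float) -> float:
    """Peak fill rate in billions per second, assuming one texel/pixel per unit per clock."""
    return units * clock_mhz * 1e6 / 1e9


BASE_CLOCK_MHZ = 1126  # GTX 980 base clock

tflops = single_precision_tflops(2048, BASE_CLOCK_MHZ)
texture_rate = fill_rate_giga(128, BASE_CLOCK_MHZ)  # 128 TMUs
pixel_rate = fill_rate_giga(64, BASE_CLOCK_MHZ)     # 64 ROPs

print(f"{tflops:.2f} TFLOPs")        # 4.61 TFLOPs
print(f"{texture_rate:.1f} GT/s")    # 144.1 GT/s
print(f"{pixel_rate:.1f} GP/s")      # 72.1 GP/s
```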
NVIDIA has some new software side enhancements enabled by the hardware in Maxwell. These include Dynamic Super Resolution, which is basically a second take on down sampling that renders at up to 4K resolution to improve image quality on a 1080P display. There's also Delta Color Compression, similar to the color compression we saw on AMD's Tonga, which compresses image data losslessly so that overall quality is maintained while the GPU core reads and writes the compressed data easily. Maxwell uses a more refined version which keeps images in local memory for later use to increase memory efficiency.
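The general idea behind delta-based lossless compression can be sketched in a few lines of Python. This is only an illustration of the concept: the article does not describe NVIDIA's actual hardware scheme, and the function names, the 8-bit anchor and the sample tile values are all hypothetical.

```python
from typing import List, Tuple


def delta_encode(tile: List[int]) -> Tuple[int, List[int]]:
    """Keep the first value as an anchor; store the rest as differences
    from the preceding value. Expects a non-empty tile."""
    anchor = tile[0]
    deltas = [b - a for a, b in zip(tile, tile[1:])]
    return anchor, deltas


def delta_decode(anchor: int, deltas: List[int]) -> List[int]:
    """Rebuild the original values exactly (the scheme is lossless)."""
    values = [anchor]
    for d in deltas:
        values.append(values[-1] + d)
    return values


def compressed_bits(deltas: List[int], anchor_bits: int = 8) -> int:
    """Bits needed if every delta is stored as a signed fixed-width integer."""
    if not deltas:
        return anchor_bits
    largest = max(abs(d) for d in deltas)
    bits_per_delta = largest.bit_length() + 1  # +1 for the sign
    return anchor_bits + bits_per_delta * len(deltas)


# Hypothetical 8-pixel strip of one 8-bit color channel with smooth variation.
tile = [200, 201, 201, 203, 202, 202, 204, 205]
anchor, deltas = delta_encode(tile)

print(deltas)                                # [1, 0, 2, -1, 0, 2, 1]
print(delta_decode(anchor, deltas) == tile)  # True
print(compressed_bits(deltas), "bits vs", 8 * len(tile), "bits raw")  # 29 bits vs 64 bits raw
```

Neighbouring pixels tend to have similar colors, so the deltas are small and need fewer bits than the raw values, which is why this kind of scheme reduces memory traffic without losing quality.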
Then there's Multi-Pixel Programmable Sampling technology, which improves randomization of each sample and reduces quantization artifacts for better geometry processing and anti-aliasing filtering. An update on the display side is that the GeForce GTX 980 adopts the HDMI 2.0 standard, which fits in well with the new display configuration of three DisplayPort 1.2, one DVI and one HDMI output set by NVIDIA for their flagship offering.
NVIDIA GM204 SMM Unit Block Diagram:
The NVIDIA SMM (Streaming Multiprocessor Maxwell) units are an update over the Kepler SMX. Each SMM is split into four logic units of 32 cores each, so each SMM houses 128 cores. The 128 core count is lower than the 192 cores featured on the SMX unit on Kepler, but do note that the second generation Maxwell cores are a good 40%+ faster than Kepler cores. The new design also simplifies the architecture and the overall scheduling, resulting in a considerable drop in power consumption and delays.
NVIDIA Maxwell GM204 GPU Energy and Memory Bandwidth Efficiency
Probably one of the biggest talking points surrounding the Maxwell cards was their narrow memory bus compared to their GK110 based predecessors. The slide posted below shows that, thanks to a new and improved memory architecture, NVIDIA has enhanced bandwidth efficiency to the point where 7.0 Gbps DRAM can deliver an effective throughput of 9.3 Gbps in gaming. Hence even with lower raw bandwidth, the amount of bandwidth the GPU actually needs has gone down considerably, which results in better performance throughput and utilization.
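To put those per-pin rates into bandwidth terms, the usual formula is data rate times bus width divided by eight. The sketch below applies it to the 256-bit bus of the GM204 cards. Treating the 9.3 Gbps figure as an equivalent per-pin rate is an assumption made here for illustration, and the function name is hypothetical.

```python
def bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Memory bandwidth in GB/s from per-pin data rate (Gbps) and bus width (bits)."""
    return data_rate_gbps * bus_width_bits / 8


BUS_WIDTH_BITS = 256  # GTX 980 / GTX 970 memory interface

raw = bandwidth_gb_s(7.0, BUS_WIDTH_BITS)        # physical bandwidth
effective = bandwidth_gb_s(9.3, BUS_WIDTH_BITS)  # equivalent bandwidth after efficiency gains

print(f"raw: {raw:.1f} GB/s")              # raw: 224.0 GB/s
print(f"effective: {effective:.1f} GB/s")  # effective: 297.6 GB/s
print(f"gain: {effective / raw - 1:.0%}")  # gain: 33%
```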
On the other hand, the performance numbers of Maxwell just keep getting better, with up to 3 times the energy efficiency of Kepler. While some will think that NVIDIA should have traded the power savings for more performance, the fact is that the card performs well, and the lower TDP results in higher stability and better overclocking numbers from non-reference and custom variants. The Maxwell architecture will also scale from Tegra chips all the way up to the top end GM200 based HPC parts, so energy efficiency does matter.
NVIDIA Maxwell Technology and Features
NVIDIA is not only introducing a new core architecture but several new technologies along with it. There are six key updates to Maxwell that enable new algorithms and superior image quality compared to previously released cards.
The NVIDIA Maxwell core architecture adds new tiled resources and multi-projection technology for voxel grids (future VXGI), which enhances global illumination. The DirectX 11.2 API makes use of 3D Tiled Resources, which allow hardware managed virtual memory for the graphics processing unit. Several Tier-2 features are supported, such as shader LOD clamp and mapped status feedback, min/max reduction filtering, and reads from non-mapped tiles returning 0.
Conservative Raster Technology:
First up, we have conservative raster technology, which improves voxelization by increasing the accuracy of voxel coverage calculation. With conservative raster enabled, a pixel is marked as covered if a triangle touches any part of it, so both the orange and purple pixels in NVIDIA's illustration count as covered, which saves calculation time. This enables new rendering algorithms, and voxelization runs three times faster with the new hardware acceleration available on Maxwell.
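The difference between standard and conservative coverage can be shown with a small software sketch in Python. Standard rasterization tests the pixel center, while the conservative test below marks a pixel whenever any corner of the pixel square lies on the inner side of every triangle edge and the pixel overlaps the triangle's bounding box. This is a common overestimating test written for illustration, not NVIDIA's hardware implementation, and all names and the sample triangle are hypothetical.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def edge(a: Point, b: Point, p: Point) -> float:
    """Edge function: >= 0 when p is on the inner side of edge a->b
    of a triangle whose vertices give a positive signed area."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _oriented(tri: Triangle) -> Triangle:
    """Reorder vertices so the signed area is non-negative."""
    a, b, c = tri
    return tri if edge(a, b, c) >= 0 else (a, c, b)


def center_covered(tri: Triangle, x: int, y: int) -> bool:
    """Standard rule: the pixel is covered only if its center is inside."""
    a, b, c = _oriented(tri)
    p = (x + 0.5, y + 0.5)
    return all(edge(u, v, p) >= 0 for u, v in ((a, b), (b, c), (c, a)))


def conservatively_covered(tri: Triangle, x: int, y: int) -> bool:
    """Conservative rule: covered if the pixel square may touch the triangle.
    May slightly overestimate near sharp vertices, but never misses a pixel."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    if x + 1 < min(xs) or x > max(xs) or y + 1 < min(ys) or y > max(ys):
        return False  # pixel lies outside the triangle's bounding box
    a, b, c = _oriented(tri)
    corners = [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]
    return all(
        max(edge(u, v, p) for p in corners) >= 0
        for u, v in ((a, b), (b, c), (c, a))
    )


def rasterize(tri: Triangle, width: int, height: int,
              test: Callable[[Triangle, int, int], bool]) -> List[Tuple[int, int]]:
    """Return the (x, y) pixels of a width x height grid that pass the test."""
    return [(x, y) for y in range(height) for x in range(width) if test(tri, x, y)]


# A thin sliver that slips between pixel centers on a 4x2 grid.
sliver: Triangle = ((0.6, 0.1), (2.4, 0.2), (0.6, 0.3))

print(rasterize(sliver, 4, 2, center_covered))          # []
print(rasterize(sliver, 4, 2, conservatively_covered))  # [(0, 0), (1, 0), (2, 0)]
```

The sliver never contains a pixel center, so the standard rule drops it entirely, which is exactly the kind of missed coverage that hurts voxelization accuracy.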
MFAA or Multi-Frame Sampled Anti-Aliasing Algorithm:
NVIDIA has been ahead in the anti-aliasing game for some time, releasing new algorithms with each passing generation. Their recent updates include MLAA, FXAA and TXAA, and now NVIDIA introduces the latest MFAA (Multi-Frame Sampled Anti-Aliasing) technology, an ultra efficient anti-aliasing software design that delivers 30% more performance at the same quality as 4xMSAA.
NVIDIA Dynamic Super Resolution - 4K Quality on a 1080P Display:
One of the new features Maxwell supports is DSR or Dynamic Super Resolution. You can call it a new version of down sampling, which has become a trend in PC gaming. The technology works on GeForce 900 series cards only and can be switched on through GeForce Experience (it is enabled by default). The main purpose of down sampling is to render at a higher resolution and then scale the result down to a lower resolution monitor. So regardless of your monitor size, it can display better image quality than what it is built to show as standard.
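The down sampling step itself can be sketched as follows. This minimal Python example averages each block of rendered pixels into one display pixel (a simple box filter). The article does not say which filter DSR actually uses, so the box filter, the function name and the tiny sample image are assumptions for illustration.

```python
from typing import List

Image = List[List[float]]  # rows of grayscale intensities in [0, 1]


def downsample(image: Image, factor: int) -> Image:
    """Average each factor x factor block of the rendered image into one
    output pixel. Width and height must be multiples of factor."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image dimensions must be multiples of factor")
    output: Image = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [image[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        output.append(row)
    return output


# A 4x4 "render" with a hard diagonal edge, scaled to a 2x2 "display".
# With factor 2, a 3840x2160 render would map to 1920x1080 the same way.
rendered = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

print(downsample(rendered, 2))  # [[0.25, 1.0], [0.0, 0.75]]
```

The fractional values along the edge are what smooth out jagged lines: each display pixel carries information from several rendered samples.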
NVIDIA Flex, Gameworks, FlameWorks, HairWorks, GodRays Technologies:
NVIDIA's Flex is the latest unified GPU PhysX system, which allows developers to use a combination of rigid body and fluid simulations. In past game development, it was hard to let the two simulations work alongside each other due to their complex nature, but NVIDIA's Flex, with the right tools, unifies this process and allows both rigid body and fluid simulations to be used together.
Next up is the new GI Works SDK, short for Global Illumination Works, which allows real-time global illumination in any scene. Currently, developers use pre-baked global illumination effects in their scenes and place several light sources by hand, which is a burden for developers and gives a non-dynamic presentation. This is solved with real-time global illumination, which is more realistic and offers a more dynamic experience to gamers.
Last up is the Flame Works SDK, which includes a film-quality volumetric effect solution to render flame and smoke. NVIDIA is adding these features along with various other effects in a lot of titles such as Batman: Arkham Origins, Witcher 3: The Wild Hunt, Assassin's Creed IV: Black Flag and Watch Dogs. Some of the new titles such as Project Cars and the multi-million dollar funded Star Citizen are also offering rich NVIDIA Turbulence, NVIDIA PhysX and PhysX Particles support plus HBAO+, TXAA, Cloth Simulation and many more.
NVIDIA also showcased several slides during their conference at GDC 2014 which confirm that their next generation FleX Unified PhysX and Turbulence particle effects are officially headed for PC and will be integrated in Unreal Engine 4 and CryEngine. The Turbulence particles will be added to Unreal Engine 3, Unreal Engine 4 and CryEngine via a patch, while FleX is headed to Unreal Engine 4. PC is the only supported platform for these new features, so titles developed exclusively for PC or multi-platform titles which are optimized for PC will adopt them.
NVIDIA GeForce GTX 980
The Flagship GeForce 900 Maxwell
The NVIDIA GeForce GTX 980 is the flagship GeForce 900 series offering and the fastest Maxwell card to launch in the market. From top to bottom, the GeForce GTX 980 is a well built card featuring better performance, low power consumption and several new gaming and architecture side enhancements. The NVIDIA GeForce GTX 980 includes 2048 CUDA cores, 128 TMUs and 64 ROPs. The core clock is maintained at 1126 MHz with a 1216 MHz boost, while the memory is clocked at 7 GHz effective, which results in 224 GB/s of bandwidth. The TDP of the card is set at 165W and power is fed through dual 6-Pin power connectors.
The GeForce GTX 980 makes use of an updated revision of the NVTTM cooler introduced on the GeForce GTX Titan Black, with an all black naming logo etched on the shroud near the I/O plate and an all black heatsink which can be spotted through the mirror cut out in the center of the shroud. The card obviously makes use of a vapor chamber which is cooled by a blower fan. We were unable to find the Dual Axial fan design which NVIDIA patented a few months back and which was rumored to be a part of the new graphics card series, but I expect the cooler as it is will do a great job considering it can dissipate up to 275W of heat while the GeForce GTX 980 will have a maximum thermal dissipation of just under 170W. So that's a ton of cooling being supplied to the core, and we can expect massive overclocking headroom for a card which already boosts to 1216 MHz.
Back to the cooler design, the NVTTM does include some minor changes around the display outputs, which are now enclosed entirely inside the shroud. One of the changes I like the most is the addition of the backplate, which is carried over from the GeForce GTX Titan Z. The card features two SLI gold fingers which allow 4-Way SLI multi GPU functionality. The GeForce GTX 980 is fed power through dual 6-Pin connectors and while there is space for an 8-Pin connector, NVIDIA will just feature two 6-Pin connectors on the reference design, leaving its AIB partners to do the rest in the form of custom designs. Display outputs include DVI, HDMI and three DisplayPorts, which is one reason for the unusually large size of the display connector area. The bracket is also updated with a new layout, and the cut outs for exhaust look similar to the ones featured on the GeForce GTX Titan Z.
The PCB has been modified to a more robust design. NVIDIA can be seen using eight Samsung K4G41325FC-HC28 128M x 32 memory modules, which equate to 4 GB of GDDR5 VRAM across a 256-bit bus. The voltage controller has been moved below the power connectors and the power delivery includes 5 phases compared to 6 on the GeForce GTX 780 Ti. At the same time, we can see a large array of VRMs beside the chokes, which should deliver plenty of overclocking performance even on the reference designs. The NVIDIA GeForce GTX 980 will retail at $549 US while non-reference models will retail at around $599 US.
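The 4 GB and 256-bit figures follow from the module organization. The short Python sketch below assumes each module sits on its own 32-bit channel (no modules sharing a channel), which is an assumption rather than something the article states; the function name is illustrative.

```python
from typing import Tuple


def vram_config(chips: int, mega_words: int, word_bits: int) -> Tuple[float, int]:
    """Return (capacity in GB, bus width in bits) for identical memory chips,
    assuming every chip has its own dedicated channel."""
    chip_megabits = mega_words * word_bits          # 128M x 32 = 4096 Mbit per chip
    capacity_gb = chips * chip_megabits / 8 / 1024  # Mbit -> MB -> GB
    bus_width_bits = chips * word_bits
    return capacity_gb, bus_width_bits


capacity, bus_width = vram_config(chips=8, mega_words=128, word_bits=32)
print(capacity, "GB")      # 4.0 GB
print(bus_width, "-bit")   # 256 -bit
```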
NVIDIA GeForce GTX 970
The $329 US and Sub-150W Maxwell
The NVIDIA GeForce GTX 970 is the most surprising part in the Maxwell lineup, coming in at a price of just $329 US. NVIDIA's GeForce GTX 970 features 13 SMM units placed in 4 GPCs (Graphics Processing Clusters). Since each SMM unit has 128 CUDA cores, 32 in each logic unit (32 x 4), the total number of CUDA cores equates to 1664 on the die. From the first generation Maxwell core architecture, we learned that a Maxwell SMM (Streaming Multiprocessor Maxwell) unit has 128 cores compared to 192 on the Kepler SMX units. The specifications equate to a total of 1664 CUDA cores, 104 TMUs and 64 ROPs.
Along with that, we have 4 GB of GDDR5 memory running across a 256-bit memory interface clocked at 1753 MHz (7.00 GHz effective), which pumps out 224.4 GB/s of bandwidth. The core clock is maintained at 1051 MHz with a 1178 MHz boost clock, something I was expecting if the cards were to take on the GK110 core based graphics cards. Lastly, we have the fill rate numbers, which amount to 33.6 GPixels/s pixel and 145.0 GTexels/s texture fill rates. The GeForce GTX 970 will be available in both reference and non-reference variants at launch, which will retail in a range of $329 to $349 US. Display outputs on the reference models will stick with DVI, HDMI and three DisplayPorts. The card uses HDMI 2.0 technology and will be powered by dual 6-Pin connectors. AIB partners may offer different display output configurations, but the cards will be fully compatible with G-Sync monitors.
NVIDIA Maxwell GeForce GTX 980 and GeForce GTX 970 Reviews:
- NVIDIA GeForce GTX 980 Review @ Anandtech
- NVIDIA GeForce GTX 980 Review @ Techpowerup
- NVIDIA GeForce GTX 980 Review @ Hardwarecanucks
- NVIDIA GeForce GTX 980 Review @ Guru3D
- NVIDIA GeForce GTX 980 Review @ HardOCP
- NVIDIA GeForce GTX 980 Review @ PCPer
- NVIDIA GeForce GTX 980 Review @ Bit-Tech
- NVIDIA GeForce GTX 980 Review @ Overclock3d
- NVIDIA GeForce GTX 980 Review @ TechReport
- NVIDIA GeForce GTX 980 Review @ Hexus
- NVIDIA GeForce GTX 980 Review @ MaximumPC
- NVIDIA GeForce GTX 980 Review @ Techspot
- NVIDIA GeForce GTX 980 Review @ Tweaktown
- NVIDIA GeForce GTX 980 and GeForce GTX 970 Review @ PCPOP
- NVIDIA GeForce GTX 970 Review @ Wccftech
NVIDIA GeForce GTX 970 and GTX 980 Specifications:
|Graphics Card||GeForce GTX 570||GeForce GTX 580||GeForce GTX 670||GeForce GTX 680||GeForce GTX 770||GeForce GTX 780||GeForce GTX 780 Ti||GeForce GTX 970||GeForce GTX 980|
|SM Units||15 x 32||16 x 32||7 x 192||8 x 192||8 x 192||12 x 192||15 x 192||13 x 128||16 x 128|
|Core Clock||732 MHz||772 MHz||915 MHz||1006 MHz||1046 MHz||863 MHz||875 MHz||1051 MHz||1126 MHz|
|Boost Clock||1464 MHz||1544 MHz (Shader Clock)||980 MHz||1058 MHz||1085 MHz||900 MHz||928 MHz||1178 MHz||1216 MHz|
|Memory||1.2 GB GDDR5||1.5 GB GDDR5||2 GB GDDR5||2 GB GDDR5||2 GB GDDR5||3 GB GDDR5||3 GB GDDR5||4 GB GDDR5||4 GB GDDR5|
|Memory Clock||3.80 GHz||4.0 GHz||6.0 GHz||6.0 GHz||7.0 GHz||6.0 GHz||7.0 GHz||7.0 GHz||7.0 GHz|
|Memory Bandwidth||152.00 GB/s||192.4 GB/s||192.0 GB/s||192.0 GB/s||224.5 GB/s||288.6 GB/s||336.0 GB/s||224.5 GB/s||224.5 GB/s|
|Texture Fill Rate GT/s||43.92||49.41||102.5||128.8||134||166||210||145.0||TBC|
|Power Connectors||6+6 Pin||8+6 Pin||6+6 Pin||6+6 Pin||8+6 Pin||8+6 Pin||8+6 Pin||6+6 Pin||6+6 Pin|
|DirectX 12 Support||Yes||Yes||Yes||Yes||Yes||Yes||Yes||Yes||Yes|
|Launch||December 7th 2010||November 09 2010||May 10th 2012||March 22nd 2012||May 30th 2013||May 23rd 2013||December 2013||18th September 2014||18th September 2014|
|Price||$349 US||$499 US||$349 US||$499 US||$349 US||$499 US||$699 US||$329 Reference