NVIDIA Pascal GPU Analysis – An In-Depth Look at NVIDIA’s Next-Gen Graphics Cards Powering GeForce, Tesla and Quadro
At CES 2016, NVIDIA’s CEO, Jen-Hsun Huang, presented the latest Drive PX 2 board, which will be powered by the next-generation Pascal GPU architecture. Pascal will power the next iteration of professional and consumer graphics cards, succeeding Maxwell and, as enthusiasts and PC builders anticipate, besting it in every possible way.
NVIDIA’s Pascal GPU Analysis – What To Expect From NVIDIA’s Next-Gen GPU Powerhouse
NVIDIA’s Pascal GPUs are not launching any time soon, but we know quite a lot about them from previous reports. NVIDIA provided a few more details at their conference, so let’s take a look at what Pascal is all about. In 2014, NVIDIA introduced Maxwell, their last architecture to use the 28nm process node. We have seen 28nm on the GPU market since 2012, when NVIDIA and AMD launched their first products based on the (then latest) process tech, codenamed Kepler and GCN (1.0) respectively.
The Race To FinFET – What It Means For The GPU Industry
Over the years, this process was refined and we got to see some beefy designs such as the GK110 and GM200 from NVIDIA and Hawaii and Fiji from AMD. With dies measuring up to 601mm2 (GM200) and packing an insane number of transistors (8.9 Billion on Fiji), the 28nm process proved to be the real deal for the graphics market, serving it for a good four years. But hardware and technology move at a fast pace, and GPU makers have long demanded a new node to build their next graphics chips.
With every new generation of graphics cards, we expect the successor to offer a big performance increase. When the industry shifted from 40nm to 28nm, we saw GPUs that were supposed to be aimed at mid-range offerings beating the big cores from the previous generation. The GTX 680, NVIDIA’s first 28nm graphics card, obliterated the flagship GF110 core, featuring better performance and better power efficiency. The performance improvement was around 25% on a process that had just seen the light of day.
More than a year later, NVIDIA showed off just what kind of performance they had in their hands with the 28nm Kepler GPU. When the GTX 780 Ti launched, it held a performance lead of more than 50% over the GTX 580. This was the moment when the flagship Kepler core finally got compared to the flagship Fermi core. NVIDIA was known to have given priority to HPC for their compute-oriented Kepler cores, which was the sole reason we got to see GK104 as a flagship offering in 2012 in the first place. By this time, however, GPU companies had fully learned and mastered the 28nm node.
When Maxwell and Fiji graphics cards came to the market, we saw a shift to gaming-only products rather than professional/HPC-focused parts. The main reason for this shift was that both NVIDIA and AMD knew they had hit a bottleneck with the 28nm process: they could either go for better performance in a single department (Gaming) or split their efforts across two departments (Gaming/Compute). The latter would have resulted in worse efficiency and outrageously huge dies, which they would have had to sell at a fraction of their real cost to keep them competitive against their own offerings. The result was GM200 and Fiji.
Both GPUs are great but they have something in common: they aren’t armed with the strong compute hardware of their older-generation predecessors (GK110/Hawaii). While they were efficient, their performance increases weren’t as big as the hardware updates would suggest. The Titan X was 30% faster than the GTX 780 Ti and the same could be said for the Fury X over the R9 290X. While we once saw the mid-range GTX 680 delivering a nice 25% lead over the GTX 580, the GTX 980 could only manage a 5-10% lead over the GTX 780 Ti. By that time, it was clear that the 28nm process had become a bottleneck and that GPU manufacturers needed a new node to experiment with and build next-generation graphics processors.
|GPU Architecture||NVIDIA Fermi||NVIDIA Kepler||NVIDIA Maxwell||NVIDIA Pascal|
|GPU Process||40nm||28nm||28nm||16nm (TSMC FinFET)|
|GPU Design||SM (Streaming Multiprocessor)||SMX (Streaming Multiprocessor)||SMM (Streaming Multiprocessor Maxwell)||SMP (Streaming Multiprocessor Pascal)|
|Maximum Transistors||3.00 Billion||7.08 Billion||8.00 Billion||15.3 Billion|
|Maximum Die Size||520mm2||561mm2||601mm2||610mm2|
|Stream Processors Per Compute Unit||32 SPs||192 SPs||128 SPs||64 SPs|
|Maximum CUDA Cores||512 CCs (16 CUs)||2880 CCs (15 CUs)||3072 CCs (24 CUs)||3840 CCs (60 CUs)|
|FP32 Compute||1.33 TFLOPs (Tesla)||5.10 TFLOPs (Tesla)||6.10 TFLOPs (Tesla)||~12 TFLOPs (Tesla)|
|FP64 Compute||0.66 TFLOPs (Tesla)||1.43 TFLOPs (Tesla)||0.20 TFLOPs (Tesla)||~6 TFLOPs (Tesla)|
|Maximum VRAM||1.5 GB GDDR5||6 GB GDDR5||12 GB GDDR5||16 / 32 GB HBM2|
|Maximum Bandwidth||192 GB/s||336 GB/s||336 GB/s||720 GB/s - 1 TB/s|
|Launch Year||2010 (GTX 580)||2014 (GTX Titan Black)||2015 (GTX Titan X)||2016|
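The core counts and compute figures in the table can be cross-checked with a little arithmetic. The sketch below assumes each CUDA core retires one fused multiply-add (2 FLOPs) per clock, the usual convention for peak figures. The clocks it derives are back-calculated from the table and are not official specifications.

```python
# Figures copied from the table above:
# (SPs per unit, number of units, FP32 TFLOPs, FP64 TFLOPs)
ARCHS = {
    "Fermi":   (32, 16, 1.33, 0.66),
    "Kepler":  (192, 15, 5.10, 1.43),
    "Maxwell": (128, 24, 6.10, 0.20),
    "Pascal":  (64, 60, 12.0, 6.0),
}

FLOPS_PER_CORE_PER_CLOCK = 2  # assumption: one fused multiply-add per clock


def total_cores(sps_per_unit, units):
    """CUDA cores = stream processors per unit x number of units."""
    return sps_per_unit * units


def implied_clock_ghz(fp32_tflops, cores):
    """Clock (GHz) that would yield the quoted peak FP32 rate."""
    return fp32_tflops * 1e12 / (FLOPS_PER_CORE_PER_CLOCK * cores) / 1e9


def summarize():
    """Return {architecture: (cores, implied clock in GHz, FP64/FP32 ratio)}."""
    rows = {}
    for name, (sps, units, fp32, fp64) in ARCHS.items():
        cores = total_cores(sps, units)
        rows[name] = (cores, implied_clock_ghz(fp32, cores), fp64 / fp32)
    return rows


if __name__ == "__main__":
    for name, (cores, clock, ratio) in summarize().items():
        print(f"{name:8s} {cores:5d} cores  ~{clock:.2f} GHz  FP64/FP32 = {ratio:.2f}")
```

The FP64/FP32 ratio is the interesting column: it sits near one half for Fermi and Pascal but collapses for Maxwell, which is the loss of compute hardware described earlier.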
We have entered 2016 and now look upon the FinFET process as the enabling technology that will help build fast and efficient GPUs. FinFET process nodes are under development at TSMC, Samsung and GlobalFoundries (GloFo). GPU makers can choose among these companies to build their new GPUs, and NVIDIA has sided with TSMC, using their 16FF+ process node to make the Pascal GPU a reality. The new node will deliver 65 percent higher speed, around 2 times the density, or 70 percent less power than TSMC’s 28HPM tech. Compared with 20SoC technology, 16FF+ provides 40% higher speed and 60% power saving. With FinFET, we may once again see the glory days of GPUs, with graphics cards trouncing their predecessors by a 50% performance lead and featuring power efficiency that’s better in all departments.
The race to FinFET has already begun, and we know from the outset that the process node will last us at least two GPU generations, as was the case with 28nm. It is believed that NVIDIA’s Volta GPU will also use a much more refined version of the FinFET process, but for now we have our eyes and ears locked on everything related to Pascal.
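The "around 2 times the density" claim can be sanity-checked against the maximum transistor counts and die sizes listed in the architecture table earlier in this article. The sketch below only divides those numbers. Treating the largest die of each generation as representative of its node is an assumption, and the dictionary and function names are illustrative.

```python
# Largest dies per node, using the transistor counts and areas from the table.
MAXWELL_28NM = {"transistors": 8.00e9, "area_mm2": 601}
PASCAL_16NM = {"transistors": 15.3e9, "area_mm2": 610}


def density_mtr_per_mm2(chip):
    """Transistor density in millions of transistors per square millimetre."""
    return chip["transistors"] / chip["area_mm2"] / 1e6


def density_gain(new, old):
    """How many times denser the new chip is than the old one."""
    return density_mtr_per_mm2(new) / density_mtr_per_mm2(old)


if __name__ == "__main__":
    print(f"28nm: {density_mtr_per_mm2(MAXWELL_28NM):.1f} MTr/mm2")
    print(f"16nm: {density_mtr_per_mm2(PASCAL_16NM):.1f} MTr/mm2")
    print(f"gain: {density_gain(PASCAL_16NM, MAXWELL_28NM):.2f}x")
```

With these inputs the gain comes out a little under 2x, which is in line with the quoted node scaling.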
NVIDIA GP100 – The Flagship GPU, Powering Titans, Teslas and Quadros
The heart of next-generation supercomputers and High-Performance Computing platforms is, without a doubt, the Pascal GP100 graphics chip. The NVIDIA GP100 will be the flagship GPU of the lineup and the one which will determine the performance and efficiency of the new architecture. The Pascal GP100 has long been in the rumor mill and we still don’t have conclusive details on this monolithic chip. Being the successor to the GM200, the GP100 Pascal GPU will be built on the 16nm TSMC FinFET process node and feature up to 17 Billion transistors inside the package.
The GPU is going to pack a lot of performance for gamers and FP64 users since this chip will be powering some serious compute-oriented machines that demand double precision compute. Being the flagship of the lineup, NVIDIA will make their GP100 GPU their first graphics chip to support HBM2 memory with up to 1 TB/s of bandwidth and 32 GB VRAM. We know Pascal has a peak double precision performance rated at over 4 TFLOPs while the single precision compute performance is rated at over 10 TFLOPs. This will be by far the biggest leap in total available compute performance we have seen on any graphics card.
As for when it arrives, there’s a strong possibility that consumers won’t be first to get the full GP100, or even a cut-down variant. The reason is high demand from the HPC market, which needs to replace the older Kepler-based cards that are still being used as FP64 options since the Maxwell chips drove FP64 support away. The chip will power Tesla cards first, followed by GeForce and Quadro solutions. It will appear in a range of products, including, without a doubt, a new Titan offering at $999 US, which has been the consistent price for Titan graphics cards. The dual-chip cards (Titan Z) are a totally different thing, though. The GP100 chips will be available in a range of new packages, such as regular add-in boards and the new Mezzanine cards which were shown back at GTC 2015. GP100 will also support NVLink, a new interconnect that NVIDIA is establishing with IBM and other partners such as Cray, HP, Dell, Tyan, QCT and Bull. The connection would offer 80 to 200 GB/s access speeds between the nodes integrated in HPC platforms.
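To put the 80 to 200 GB/s range in perspective, the sketch below estimates how long a buffer would take to cross the link at peak rate. The PCIe 3.0 x16 figure of roughly 15.75 GB/s per direction is included as an assumed point of comparison that is not taken from this article, and the function name is illustrative. Real transfers never reach peak rate.

```python
PCIE3_X16_GB_S = 15.75    # assumption: approximate PCIe 3.0 x16 rate, one direction
NVLINK_GB_S = (80, 200)   # range quoted in the text


def transfer_time_ms(size_gb, link_gb_per_s):
    """Ideal time in milliseconds to move size_gb over a link at peak rate."""
    return size_gb / link_gb_per_s * 1000


if __name__ == "__main__":
    size_gb = 16  # e.g. the full 16 GB of VRAM mentioned for GP100
    print(f"PCIe 3.0 x16: {transfer_time_ms(size_gb, PCIE3_X16_GB_S):.0f} ms")
    for rate in NVLINK_GB_S:
        print(f"NVLink @ {rate} GB/s: {transfer_time_ms(size_gb, rate):.0f} ms")
```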
NVIDIA GP100 Features:
- Based on Pascal GPU Architecture.
- Will support DirectX 12 feature level 12_1 and higher.
- Successor to the GM200 GPU found in the GTX Titan X and GTX 980 Ti.
- Built on the 16nm FinFET manufacturing process from TSMC.
- Allegedly has a total of 17 billion transistors, more than twice that of GM200.
- Taped out in June 2015.
- Will feature four 4-Hi HBM2 stacks, for a total of 16GB of VRAM for the consumer variant and 32GB for the professional variant.
- Features a 4096-bit memory bus interface.
- Features NVLink and support for Mixed Precision FP16 compute tasks at twice the rate of FP32 and full FP64 support.
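The 4096-bit bus and the bandwidth figures quoted in this article are linked by simple arithmetic: peak bandwidth is bus width times per-pin data rate. The sketch below assumes each HBM2 stack exposes a 1024-bit interface, so four stacks give 4096 bits. The per-pin rates of 1.4 and 2.0 Gbps are illustrative values chosen to bracket the 720 GB/s to 1 TB/s range, not confirmed GP100 specifications, and the per-stack capacities are inferred from the 16GB and 32GB totals.

```python
BITS_PER_BYTE = 8
STACK_BUS_WIDTH_BITS = 1024  # assumption: interface width of one HBM2 stack


def bus_width_bits(stacks):
    """Total memory bus width for a given number of HBM2 stacks."""
    return stacks * STACK_BUS_WIDTH_BITS


def bandwidth_gb_per_s(bus_bits, pin_rate_gbps):
    """Peak bandwidth in GB/s for bus_bits pins running at pin_rate_gbps each."""
    return bus_bits * pin_rate_gbps / BITS_PER_BYTE


def capacity_gb(stacks, gb_per_stack):
    """Total VRAM for a given number of stacks of equal capacity."""
    return stacks * gb_per_stack


if __name__ == "__main__":
    bus = bus_width_bits(4)
    for rate in (1.4, 2.0):  # illustrative per-pin data rates in Gbps
        print(f"{bus}-bit @ {rate} Gbps/pin: {bandwidth_gb_per_s(bus, rate):.1f} GB/s")
    print(f"4 stacks x 4 GB: {capacity_gb(4, 4)} GB, 4 stacks x 8 GB: {capacity_gb(4, 8)} GB")
```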
NVIDIA GP104 – The Gamer Focused GPU For Mainstream Graphics Cards
For NVIDIA to make sure that their Pascal architecture is a success in the gaming field, they need to get two chips right, the GP104 and the GP106. The G**04 and G**06 chips are primarily targeted at consumers as they offer decent value and can be mass produced in larger numbers owing to their smaller dies and the higher number of chips yielded from each 16nm wafer. The G**04 chips have been the main attraction in NVIDIA’s graphics card lineup as high-end offerings since the GeForce GTX 680. These chips, which once served the mid-range market as the GeForce GTX 460 and the GeForce GTX 560 Ti with prices of $199 US – $249 US, now serve the high-end gaming market with prices exceeding the $300 US mark.
There’s a reason why NVIDIA has increased prices on such chips: they are highly competitive. The GTX 460 was a masterpiece of a graphics card which crushed even the GTX 465 (a flopped card to begin with) and had better performance and efficiency than the HD 5850. The GTX 680 (GK104) was the first mid-range chip that not only beat its predecessor’s flagship GPU (GF110) by a 25% lead but also held its own in performance, with great efficiency, against the HD 7970. The chips from NVIDIA which once existed in the mid-range market were now capable enough to tackle AMD’s flagship cores. That did increase the pricing from sub-$300 to $499 US (the GTX 680’s launch price). When NVIDIA launched Maxwell, the competition once again had nothing that matched their cards until 10 months later. This resulted in NVIDIA bagging some good sales from their second-generation Maxwell cards.
NVIDIA has learned over the years that timing, pricing and features are the three essentials for their gaming-focused cards to succeed. The GP104 will be delivered in cards priced from $300 up to $549 US. We don’t know what NVIDIA plans to call their next generation of cards, but there’s good reason to expect the same kind of performance improvement we once saw from the GTX 680 over the GTX 580. Pascal brings with it not only a new process node but also a new architecture and a range of gaming-focused features. NVIDIA has a strong influence on the PC gaming market: their recent GameWorks initiatives can be found in almost every modern AAA title and they have great driver support for their graphics cards. NVIDIA can put on a great showcase of performance with their GP104 cards alone.
While Maxwell’s GTX 980 managed a 5-10% improvement over the GTX 780 Ti, I can very easily tell that Pascal GP104 will deliver a greater performance increase over GM200, along with hardware that’s better built to support DX12, the Vulkan API and VR/AR. Along with the added support for game technologies, Pascal GP104 chips will run with GDDR5X memory, the new and fastest memory standard based on the GDDR5 architecture, which delivers better bandwidth and faster clock speeds on VRAM chips. There’s a slight possibility that we may see special versions of the GP104 chips that come with HBM2 VRAM. As for how many FLOPs this chip will be able to get out of its belly, I should say between 6-7 TFLOPs sounds like a nice estimate, if not an accurate one, since the current GM204 chip has 4.6 TFLOPs of performance, up from 3 TFLOPs on the GTX 680, while the GTX 580 was around 2.0 TFLOPs in compute.
The GeForce GTX 980 was the first full high-end, discrete-class graphics chip to come to mobility (MXM). The GP104 will do the same, with TDPs expected to be close to the 150W range, and provide better performance on both mobility and desktop fronts. To sum it up, the GP104-based cards will be the most critical products in the graphics lineup as they are aimed at the market that brings in the most revenue for NVIDIA. Since GP104 is such a critical product for NVIDIA, they will be showcasing more details on cards based on the chip at GTC 2016, which is just four months away.
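The 6-7 TFLOPs estimate can be reproduced by extrapolating the generation-over-generation gains quoted in the paragraph above. This is a back-of-the-envelope sketch only. It assumes the next step scales like the previous two, which nothing guarantees, and it uses the TFLOPs figures as quoted in the text rather than official specifications.

```python
# Peak FP32 figures as quoted in the text (TFLOPs), oldest first.
QUOTED_TFLOPS = [("GTX 580", 2.0), ("GTX 680", 3.0), ("GTX 980", 4.6)]


def generational_gains(series):
    """Ratio of each entry's TFLOPs to the entry before it."""
    return [curr / prev for (_, prev), (_, curr) in zip(series, series[1:])]


def extrapolate(series):
    """Project the next part, assuming the smallest and largest past gains repeat."""
    gains = generational_gains(series)
    latest = series[-1][1]
    return latest * min(gains), latest * max(gains)


if __name__ == "__main__":
    low, high = extrapolate(QUOTED_TFLOPS)
    print(f"gains so far: {[round(g, 2) for g in generational_gains(QUOTED_TFLOPS)]}")
    print(f"projected GP104: {low:.1f} to {high:.1f} TFLOPs")
```

Both past steps were gains of roughly 1.5x, so the projection lands at the top end of the estimate given in the text.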
NVIDIA GP106 – The Budget-Minded GPU For Sweet-Spot Graphics Cards
The NVIDIA GP106 is another important chip that NVIDIA needs to get right for gamers. The GP106 will be seen in action on several sweet-spot graphics cards retailing at sub-$250 prices, a segment that has so far seen no competition from NVIDIA’s Maxwell generation of graphics cards. The GP106 chips will feature TDPs below 120W, as the current GTX 960, based on the GM206 core, already has a TDP of 120W. NVIDIA might want to equip the card with a wider memory bus and more VRAM since current GM206 cards have been starved of bandwidth by their 128-bit memory buses, even with Maxwell’s color compression technologies on board.
It is probable that the Drive PX 2 board we saw from NVIDIA at CES 2016 was powered by either the GP104 or the GP106, since these chips in an MXM package could offer TDPs of just 100W. Performance of over 3 TFLOPs would be a good increase over the 2.30 TFLOPs of GM206. Since GM206 had half the specs of the GM204 core, it is highly likely that the GP106 will likewise have half the core specifications of the bigger GP104 GPU.
The GP106 will be a main competitor to the highly efficient Polaris GPU that AMD demonstrated at CES 2016. The Polaris chip, pitted against the GTX 950, was an entry-level offering which delivered the same performance as the GM206-based card but with significantly lower power requirements. The GP106 will need to gear up in both the performance and efficiency departments. The GTX 950 is already a 90W solution, so its successor might do away with power connector requirements and run only on PCI-e power.
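Whether a card needs an auxiliary power connector follows from the PCI Express power limits: a x16 slot supplies up to 75W, a 6-pin connector adds 75W and an 8-pin connector adds 150W. The helper below is a hypothetical illustration of that budget. The function name and the greedy selection logic are assumptions, and real board designs leave extra headroom.

```python
SLOT_POWER_W = 75    # PCIe x16 slot limit
SIX_PIN_W = 75       # extra power from one 6-pin connector
EIGHT_PIN_W = 150    # extra power from one 8-pin connector


def connectors_needed(board_power_w):
    """Greedily pick auxiliary connectors to cover power beyond the slot limit."""
    remaining = board_power_w - SLOT_POWER_W
    connectors = []
    while remaining > 0:
        if remaining <= SIX_PIN_W:
            connectors.append("6-pin")
            remaining -= SIX_PIN_W
        else:
            connectors.append("8-pin")
            remaining -= EIGHT_PIN_W
    return connectors


if __name__ == "__main__":
    for name, watts in [("sub-60W part", 60), ("GTX 950", 90), ("GTX 960", 120)]:
        print(f"{name} ({watts}W): {connectors_needed(watts) or 'slot power only'}")
```

A 90W card overshoots the slot budget by 15W and so needs a connector, while anything at or below 75W can run from the slot alone.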
NVIDIA GP107 – The Entry Level GPU Aimed At Power Efficient, Low-TDP Graphics Cards
The entry-level solutions will be powered by the GP107 and GP108 chips. Back in early 2014, NVIDIA introduced their first-generation Maxwell architecture, which was a significant leap in efficiency. Two years later, AMD is trying to tackle NVIDIA with the same approach that the green team has mastered since Kepler back in 2012, and they have already demonstrated their new Polaris architecture, which does back up what they have been telling audiences so far. The GM107 was already a sub-60W chip which didn’t use any power connector and ran on PCI-e power. Its successor might be the first sub-50W chip aimed at efficient computing. There’s a big market for these cards as they retail for sub-$150 US prices and offer performance that can drive games at 1080p resolution (moderate settings) with ease. The GTX 750 Ti is still seen as a better option than the GTX 950 in the APAC region, so a card that betters the GTX 750 Ti will make all those using GM107 want to upgrade their PCs.
NVIDIA holds a dominant position in the discrete graphics market, accounting for more than 80% of all discrete graphics shipped around the globe. In the mobility sector, NVIDIA has the fastest solutions, which so far remain unmatched by the competition. In terms of gaming technologies, NVIDIA was the first to introduce lag-free and tear-free gaming through their G-Sync technology and the first to announce a range of new graphical features under the GameWorks program. Even with all the progress NVIDIA has made, they still consider AMD to be a strong competitor offering some great graphics cards that deliver better performance at decent value. The road ahead, however, is a fierce battle between the two long-time rivals as they launch their next-generation 14 / 16nm FinFET based solutions with a historic leap in efficiency and performance.
NVIDIA Pascal and AMD Polaris – The FinFET GPUs:
|GPU Family||AMD Vega||AMD Navi||NVIDIA Pascal||NVIDIA Volta|
|Flagship GPU||Vega 10||Navi 10||NVIDIA GP100||NVIDIA GV100|
|GPU Process||14nm FinFET||7nm FinFET||TSMC 16nm FinFET||TSMC 12nm FinFET|
|GPU Transistors||15-18 Billion||TBC||15.3 Billion||21.1 Billion|
|GPU Cores (Max)||4096 SPs||TBC||3840 CUDA Cores||5376 CUDA Cores|
|Peak FP32 Compute||13.0 TFLOPs||TBC||12.0 TFLOPs||>15.0 TFLOPs (Full Die)|
|Peak FP16 Compute||25.0 TFLOPs||TBC||24.0 TFLOPs||120 Tensor TFLOPs|
|VRAM||16 GB HBM2||TBC||16 GB HBM2||16 GB HBM2|
|Memory (Consumer Cards)||HBM2||HBM3||GDDR5X||GDDR6|
|Memory (Dual-Chip Professional/ HPC)||HBM2||HBM3||HBM2||HBM2|
|HBM2 Bandwidth||484 GB/s (Frontier Edition)||>1 TB/s?||732 GB/s (Peak)||900 GB/s|
|Graphics Architecture||Next Compute Unit (Vega)||Next Compute Unit (Navi)||5th Gen Pascal CUDA||6th Gen Volta CUDA|
|Successor of (GPU)||Radeon RX 500 Series||Radeon RX 600 Series||GM200 (Maxwell)||GP100 (Pascal)|