Several reports have surfaced alleging that Nvidia’s Pascal GPUs could be in deep trouble after Pascal was notably absent from the Drive PX2 demonstration. Nvidia announced that this 250W automotive compute box is powered by two Tegra chips and two Pascal GPUs, yet the Drive PX2 board that Nvidia CEO Jen-Hsun Huang held on stage carried two Maxwell-based GTX 980 MXM modules instead. This was evident from the dates printed on the chips, the size of the chips and the configuration of the modules.
Nvidia’s Pascal Is MIA, Could Be In Trouble Reports Allege
Many journalists found this quite an eyebrow-raising affair, and some found it reminiscent of the company’s Fermi troubles in 2010. Fermi was built on TSMC’s 40nm process, and because of the various challenges Nvidia faced during that important node transition, a similar incident took place.
These issues with Pascal and the Drive PX 2 echo the Fermi “wood screw” event of 2009. Back then, Jen-Hsun held up a Fermi board that was nothing but a mock-up, proclaimed the chip was in full production, and said it would launch before the end of the year. In reality, Nvidia was having major problems with GF100, and the GPU only launched in late March 2010.
The use of mockups and prototypes in stagecraft is nothing new to the industry or to Nvidia, but their use in place of the real thing has always been limited to situations where the real thing isn’t ready, in this case Pascal. This isn’t the first time we’ve seen a mockup used in a Pascal announcement, either. Last year Jen-Hsun Huang held up a Pascal board that did not actually carry a Pascal GPU, but rather a placeholder chip made to represent Pascal. That prototype was made to showcase what a mezzanine form-factor board would look like, so a functioning Pascal GPU wasn’t pertinent to that goal.
The same argument can be made for the Drive PX2 demonstration, and perhaps this is merely a marketing decision on Nvidia’s part to withhold any actual Pascal silicon until GTC. But the question remains: why haven’t actual Pascal GPUs been used in any demo to date? More importantly, why did Nvidia try to pass off old Maxwell silicon as Pascal?
This looks particularly troubling for Nvidia when, on the other side of the coin, AMD has been demoing functioning Polaris chips to the public and the press, as Charlie Demerjian notes. Does Nvidia have working Pascal silicon? Is the roadmap still on track? We will only know in due time. One thing’s for sure: if Pascal silicon doesn’t show up at GTC this April, people may have genuine reason to worry.
Pascal Still Set For A 2016 Launch – Featuring HBM2, 16nm, Mixed Precision And NVLink
There are four hallmark technologies in the Pascal generation of GPUs: HBM2, mixed precision compute, NVLink and TSMC’s smaller, more power-efficient 16nm FinFET manufacturing process. Each is very important in its own right, so we’re going to break down each of the four separately.
#1 HBM : Stacked memory will debut on the green side with Pascal; more precisely, HBM2, the second generation of the JEDEC High Bandwidth Memory standard co-developed by AMD and SK Hynix. The new memory will push bandwidth to around 1 Terabyte/s, roughly 3X the bandwidth of the Titan X. It will also allow for a huge increase in memory capacity, 2.7X the memory capacity of Maxwell to be precise. Applied to the Titan X’s 12GB, that works out to about 32GB, which suggests the new Pascal flagship could feature 32GB of video memory, a mind-bogglingly huge number.
We’ve already seen AMD take advantage of HBM with its Fiji XT GPU, which features 512GB/s of memory bandwidth, more than twice the GTX 980’s 224GB/s. AMD has also stated that it plans to use the second generation of this memory technology in its Arctic Islands family of GPUs, so we’re likely to see both red and green rocking second-generation stacked HBM in 2016.
HBM achieves this improvement in bandwidth and capacity by stacking DRAM dies on top of each other, linking them with through-silicon vias (TSVs), and connecting each stack to the GPU over a very wide interface. Each HBM stack is connected to the GPU with a 1024-bit memory bus. HBM actually runs at low frequencies compared to GDDR5, but thanks to its far wider interface a single stack can deliver up to 9 times the bandwidth of a standard GDDR5 chip.
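To put these bandwidth figures in perspective: peak memory bandwidth is simply bus width times per-pin data rate, divided by eight bits per byte. The Python sketch below applies that formula. The per-pin rates (7 Gbps GDDR5 on the GTX 980 and Titan X, 1 Gbps HBM1 on Fiji, 2 Gbps for HBM2) are commonly quoted figures used here as assumptions, and the configuration names are illustrative labels only.

```python
# Peak bandwidth (GB/s) = bus width (bits) x per-pin data rate (Gbps) / 8 bits per byte.
# Per-pin data rates are commonly quoted figures, treated as assumptions.

HBM_STACK_WIDTH_BITS = 1024  # each HBM stack exposes a 1024-bit interface


def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Return the theoretical peak bandwidth of a memory interface in GB/s."""
    return bus_width_bits * data_rate_gbps / 8


# label -> (total bus width in bits, per-pin data rate in Gbps)
configs = {
    "gtx980_gddr5": (256, 7.0),
    "titanx_gddr5": (384, 7.0),
    "fiji_hbm1_4stacks": (4 * HBM_STACK_WIDTH_BITS, 1.0),
    "hbm2_4stacks": (4 * HBM_STACK_WIDTH_BITS, 2.0),  # assumed 2 Gbps/pin HBM2
}
bandwidth = {name: peak_bandwidth_gbs(*cfg) for name, cfg in configs.items()}

for name, gbs in bandwidth.items():
    print(f"{name:20s} {gbs:7.1f} GB/s")

# Ratios discussed in the text.
print("Fiji vs GTX 980:", round(bandwidth["fiji_hbm1_4stacks"] / bandwidth["gtx980_gddr5"], 2))
print("HBM2 vs Titan X:", round(bandwidth["hbm2_4stacks"] / bandwidth["titanx_gddr5"], 2))

# Per-device comparison: one 32-bit GDDR5 chip vs one HBM2 stack.
gddr5_chip = peak_bandwidth_gbs(32, 7.0)
hbm2_stack = peak_bandwidth_gbs(HBM_STACK_WIDTH_BITS, 2.0)
print("HBM2 stack vs GDDR5 chip:", round(hbm2_stack / gddr5_chip, 1))

# Capacity: 2.7x the Titan X's 12GB lands at roughly 32GB.
print("2.7 x 12GB =", round(2.7 * 12, 1), "GB")
```

The point of the per-device line is that the "up to 9 times" figure compares one HBM2 stack with one GDDR5 chip; whole-card gains depend on how many stacks or chips a board carries.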
We’ve already covered this revolutionary new memory technology exclusively and in-depth last year. HBM will quickly replace GDDR5 as the standard memory technology for high performance graphics solutions. It’s fair to say that HBM is the future.
#2 Mixed Precision / Half Precision / FP16 Compute
One of the more significant features revealed for Pascal is FP16 compute support, also known as half precision compute and often discussed under the banner of mixed precision compute. In this mode, the result of any computation is significantly less accurate than with standard FP32, which all major graphics programming interfaces for games require and have required for more than a decade. This includes DirectX 12, 11, 10 and DirectX 9’s Shader Model 3.0, which debuted back in 2004. This makes half precision unsuitable as a replacement for FP32 in modern gaming applications.
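The accuracy gap is easy to see by storing a few values in each format and reading them back. This minimal Python sketch uses the standard library’s struct module, where format "e" is IEEE 754 half precision (FP16) and "f" is single precision (FP32); the helper names round_trip and fits_fp16 are hypothetical, chosen for illustration.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Pack value as 'e' (FP16) or 'f' (FP32), then unpack it to see what was stored."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def fits_fp16(value: float) -> bool:
    """Return True if value is within FP16's finite range (largest finite value is 65504)."""
    try:
        struct.pack("e", value)
        return True
    except OverflowError:
        return False


for x in (0.1, 1.0 / 3.0, 2049.0):
    h = round_trip(x, "e")  # FP16 result
    s = round_trip(x, "f")  # FP32 result
    print(f"{x!r:>20}  FP16 {h!r:<22} err {abs(h - x):.2e}   FP32 {s!r:<22} err {abs(s - x):.2e}")

print("65504.0 fits in FP16:", fits_fp16(65504.0))
print("70000.0 fits in FP16:", fits_fp16(70000.0))
```

FP16 keeps only 11 significant bits, so its rounding error on 0.1 is roughly four orders of magnitude larger than FP32’s, integers above 2048 can no longer all be represented exactly, and anything beyond 65504 overflows.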
However, because of its very attractive power efficiency advantages over FP32 and FP64, FP16 can be used in scenarios where a high degree of computational precision isn’t necessary, which makes it especially useful on power-limited mobile devices. Nvidia’s Maxwell GPU architecture, as featured in the GTX 900 series, is limited to FP32 operations, which in turn means that FP16 and FP32 operations are processed at the same rate. Adding mixed precision capability to Pascal means the architecture will be able to process FP16 operations twice as fast as FP32 operations, and as mentioned above, this can be of great benefit in power-limited, light compute scenarios.
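The throughput difference can be sketched as simple peak-rate arithmetic. This assumes one fused multiply-add (counted as two FLOPs) per core per clock and a hypothetical 3072-core GPU at 1.0 GHz; neither figure is a Pascal specification. Double-rate FP16 is modeled as a plain 2x multiplier (in practice such a speedup typically comes from packing two FP16 values into each 32-bit lane).

```python
def peak_tflops(cores: int, clock_ghz: float, rate_multiplier: float = 1.0) -> float:
    """Theoretical peak TFLOPS, assuming one FMA (2 FLOPs) per core per clock.

    rate_multiplier scales the FP32 rate: 1.0 = same as FP32, 2.0 = double rate.
    """
    return cores * clock_ghz * 2 * rate_multiplier / 1000.0


CORES = 3072     # hypothetical core count, for illustration only
CLOCK_GHZ = 1.0  # hypothetical clock in GHz

fp32 = peak_tflops(CORES, CLOCK_GHZ)
fp16_same_rate = peak_tflops(CORES, CLOCK_GHZ, rate_multiplier=1.0)    # GTX 900-style: FP16 at FP32 rate
fp16_double_rate = peak_tflops(CORES, CLOCK_GHZ, rate_multiplier=2.0)  # Pascal-style: FP16 at 2x FP32 rate

print(f"FP32 peak:           {fp32:.2f} TFLOPS")
print(f"FP16 at FP32 rate:   {fp16_same_rate:.2f} TFLOPS")
print(f"FP16 at double rate: {fp16_double_rate:.2f} TFLOPS")
```

On the same hardware budget, double-rate FP16 doubles peak throughput, which is also why it is attractive where power rather than precision is the limiting factor.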
#3 NVLink : Pascal will also be the first Nvidia GPU to feature the company’s new NVLink technology, which Nvidia states is 5 to 12 times faster than PCIe 3.0.
NVLink is an energy-efficient, high-bandwidth communications channel that uses up to three times less energy to move data on the node at speeds 5-12 times conventional PCIe Gen3 x16. First available in the NVIDIA Pascal GPU architecture, NVLink enables fast communication between the CPU and the GPU, or between multiple GPUs.

Figure 3: NVLink is a key building block in the compute node of Summit and Sierra supercomputers.
[Figure: Volta GPU featuring NVLink and stacked memory. NVLink GPU high-speed interconnect: 80-200 GB/s. 3D stacked memory: 4x higher bandwidth (~1 TB/s), 3x larger capacity, 4x more energy efficient per bit.]
NVLink is a key technology in Summit’s and Sierra’s server node architecture, enabling IBM POWER CPUs and NVIDIA GPUs to access each other’s memory fast and seamlessly. From a programmer’s perspective, NVLink erases the visible distinctions of data separately attached to the CPU and the GPU by “merging” the memory systems of the CPU and the GPU with a high-speed interconnect. Because both CPU and GPU have their own memory controllers, the underlying memory systems can be optimized differently (the GPU’s for bandwidth, the CPU’s for latency) while still presenting as a unified memory system to both processors. NVLink offers two distinct benefits for HPC customers. First, it delivers improved application performance, simply by virtue of greatly increased bandwidth between elements of the node. Second, NVLink with Unified Memory technology allows developers to write code much more seamlessly and still achieve high performance. via NVIDIA News
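Nvidia’s 5-12x claim lines up with the 80-200 GB/s range cited for NVLink when measured against a PCIe 3.0 x16 link. The sketch below derives PCIe 3.0 x16 bandwidth from its 8 GT/s per-lane rate and 128b/130b encoding (about 15.75 GB/s per direction) and compares it with both ends of the NVLink range. Treating the NVLink numbers and the per-direction PCIe number as directly comparable is a simplifying assumption, and the 16 GB payload is hypothetical.

```python
PCIE3_GT_PER_LANE = 8.0     # PCIe 3.0 signalling rate, GT/s per lane
PCIE3_ENCODING = 128 / 130  # 128b/130b line-encoding efficiency
LANES = 16

# GT/s x lanes x encoding efficiency / 8 bits per byte -> GB/s per direction (~15.75)
pcie3_x16_gbs = PCIE3_GT_PER_LANE * LANES * PCIE3_ENCODING / 8

# Range cited for NVLink; assumed comparable to PCIe's per-direction figure.
NVLINK_RANGE_GBS = (80.0, 200.0)


def transfer_seconds(gigabytes: float, link_gbs: float) -> float:
    """Idealized time to move a payload over a link running at its peak rate."""
    return gigabytes / link_gbs


PAYLOAD_GB = 16.0  # hypothetical working set

print(f"PCIe 3.0 x16: {pcie3_x16_gbs:.2f} GB/s, "
      f"{transfer_seconds(PAYLOAD_GB, pcie3_x16_gbs):.2f} s for {PAYLOAD_GB:.0f} GB")
for link in NVLINK_RANGE_GBS:
    print(f"NVLink {link:.0f} GB/s: {link / pcie3_x16_gbs:.1f}x PCIe 3.0 x16, "
          f"{transfer_seconds(PAYLOAD_GB, link):.2f} s")
```

The two ratios come out at roughly 5x and 12.7x, which is where the "5-12 times" figure appears to come from; real transfers would also pay protocol and software overheads not modeled here.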
#4 16nm manufacturing process : Pascal will be the first Nvidia GPU built on TSMC’s 16nm FinFET manufacturing process. The new process promises to be considerably more power efficient and much denser than 28nm, which would enable Nvidia to build far more complex and powerful GPUs while also improving performance per watt.
TSMC’s 16FF+ (FinFET Plus) technology can provide above 65 percent higher speed, around 2 times the density, or 70 percent less power than its 28HPM technology. Comparing with 20SoC technology, 16FF+ provides extra 40% higher speed and 60% power saving. By leveraging the experience of 20SoC technology, TSMC 16FF+ shares the same metal backend process in order to quickly improve yield and demonstrate process maturity for time-to-market value.
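As a rough, idealized illustration of TSMC’s figures, the sketch below applies each 16FF+ claim to a GM200-class 28nm die (about 8 billion transistors on roughly 601 mm²), plus a hypothetical 250 W power budget and 1.0 GHz clock. TSMC presents speed, density and power as alternatives rather than cumulative gains, and real chips rarely scale this cleanly, so treat the results as upper bounds, not Pascal predictions.

```python
# TSMC 16FF+ vs 28HPM figures, as quoted above. They are alternatives, not cumulative.
SPEED_GAIN = 1.65    # "above 65 percent higher speed"
DENSITY_GAIN = 2.0   # "around 2 times the density"
POWER_FACTOR = 0.30  # "70 percent less power"

# Reference 28nm die (GM200-class), used only as a starting point.
BASE_TRANSISTORS_BN = 8.0
BASE_AREA_MM2 = 601.0
BASE_POWER_W = 250.0  # hypothetical power budget
BASE_CLOCK_GHZ = 1.0  # hypothetical clock

# Option A: shrink the same design -> roughly half the die area.
same_design_area = BASE_AREA_MM2 / DENSITY_GAIN
# Option B: keep the die size -> room for roughly twice the transistors.
same_area_transistors = BASE_TRANSISTORS_BN * DENSITY_GAIN
# Option C: same design at the same speed -> 70% less power.
same_speed_power_w = BASE_POWER_W * POWER_FACTOR
# Option D: same design at the same power -> about 65% higher clock.
same_power_clock_ghz = BASE_CLOCK_GHZ * SPEED_GAIN

print(f"A: same design, ~{same_design_area:.1f} mm^2")
print(f"B: same area, ~{same_area_transistors:.0f} billion transistors")
print(f"C: same speed, ~{same_speed_power_w:.0f} W")
print(f"D: same power, ~{same_power_clock_ghz:.2f} GHz")
```

In practice a new flagship mixes these options, spending part of the density gain on more transistors and part of the power gain on higher clocks.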