NVIDIA Announces Pascal GPU Powered Drive PX 2 – 16nm FinFET Based, Liquid Cooled AI Supercomputer With 8 TFLOPs Performance

Jan 6, 2016

NVIDIA has officially announced the Drive PX 2, its latest AI supercomputer for automobiles, powered by the company’s 16nm FinFET based Pascal GPU. The Drive PX 2 offers a first glimpse of the power packed by Pascal, NVIDIA’s next iteration of its CUDA compute architecture, which is set to power fast and efficient GPUs in 2016.


NVIDIA Surprises With Pascal GPU Powered Drive PX 2 – 16nm FinFET GPU With 8 TFLOPs of Performance

NVIDIA’s next generation graphics architecture is finally coming to market and, to our surprise, we aren’t seeing it first on graphics cards or professional products. Instead, the first product to showcase Pascal is the Drive PX 2, the latest AI supercomputer that ushers in a new era of self-driving cars. The board is said to incorporate two Tegra SOCs and two discrete GPUs based on the Pascal architecture, although the demo unit NVIDIA showcased at the event used Maxwell MXM modules instead of Pascal ones. You might wonder why Pascal is being showcased with automobiles first. The reason is quite simple: automotive has become a major revenue driver for NVIDIA, as seen in the financial results the company posted for Q3 2015. It also makes sense to keep server and graphics card announcements for the GTC event.

NVIDIA Pascal GPU Drive PX 2 AI Supercomputer

“Our record revenue highlights NVIDIA’s position at the center of forces that are reshaping our industry,” said Jen-Hsun Huang, co-founder and chief executive officer, NVIDIA. “Virtual reality, deep learning, cloud computing and autonomous driving are developing with incredible speed, and we are playing an important role in all of them. We continue to make great headway in our strategy of creating specialized visual computing platforms targeted at important growth markets. The opportunities ahead of us have never been more promising.” via NVIDIA

Just months after NVIDIA published its financials, we are seeing how important deep learning and deep neural networks for automobiles have become for the company. The end result is Pascal, NVIDIA’s brand new GPU architecture, announced on the Drive PX 2 AI supercomputer. The Drive PX 2 is the successor to last year’s Drive PX and, instead of being powered entirely by Tegra SOCs, it relies on two next-gen Tegra SOCs and two discrete Pascal GPUs. It features 12 CPU cores (probably ARM64 based) and four chips that pack Pascal GPUs, adding up to 8 TFLOPs of performance.


The Drive PX 2 module comes with a TDP of 250W, which is due to its four individual chips: two ARM based Tegra SOCs that feature Pascal graphics, along with two discrete GPUs on the back. The whole module is liquid cooled (also a first for automobile supercomputers). Lastly, we know for sure that the GPU is 16nm FinFET based and comes in a single board package with multiple modules and chips.

There is good reason to believe that the Drive PX 2 doesn’t have the full Pascal GPU and instead uses a cut down model with disabled cores, as the bigger version is aimed at the HPC market. We won’t see that version on consumer products for a while, as 16nm FF+ is still an immature node and yields need to be very good before full Pascal GP100s can ship in volume to the consumer, server and professional markets. 250W is the baseline NVIDIA chooses for its products these days, and it is enough to pack four chips with Pascal GPUs inside the Drive PX 2 module. The whole package features four chips: two discrete Pascal GPUs that have GDDR5 memory and two Tegra SOCs that come with the ARM cores and significantly cut down, GPGPU focused Pascal GPUs.

According to NVIDIA, the Pascal GPUs featured on the Drive PX 2 combine to give 8 TFLOPs of compute performance. It’s clear we are talking about FP32 operations here. While this is a good increase compared to the 6.1 TFLOPs of the Titan X, there is still room for improvement, and we are looking at around 10 TFLOPs of performance when the flagship GPUs hit graphics boards and the full fat version is packed inside consumer and HPC (High-Performance Computing) platforms. As for the CPU cores, we are looking at 8 A57 cores and 4 custom Denver ARM cores that NVIDIA has been building for a while. The Drive PX 2 will be available in fall of 2016.

NVIDIA Shows Off Pascal Chips in Both Tegra and Board Flavors

Update (June 5 2016, 2 PM ET): NVIDIA showed off both the Tegra and MXM flavors of its Drive PX 2 module. While the Drive PX 2 on display didn’t feature an actual Pascal GPU, NVIDIA did imply that the GPUs housed on the back of the board will be offered in the MXM form factor. Being packed in an MXM type solution means that this GPU will be housed in a range of desktop and mobility solutions, and suggests that NVIDIA will have both HBM2 and GDDR5X GPUs when Pascal hits the market. The GPU shown in the pictures above is presumably based on the Maxwell architecture and looks quite similar to the GeForce GTX 980M graphics chip, which is also offered in the MXM package.

The chip powering the Drive PX 2’s discrete side is said to be rated at around 100W. Knowing this, I am inclined to believe that the two chips won’t be a GM204 replacement but rather a GM206 replacement, given the power they draw. We have already seen the full GM206 in action at 120W on the GeForce GTX 960, so a 100W GP106 GPU core sounds reasonable in this case. The fully enabled GM206 on the GTX 960 delivers a total compute performance of 2.30 TFLOPs. If the GP106 raises peak performance to around 3.00 – 3.50 TFLOPs at 100W, the two discrete GPUs would account for 6 to 7 TFLOPs, and the remaining 1 to 2 TFLOPs could be filled in by the Tegra SOCs.
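To put the GM206 versus GP106 speculation in perspective, here is a rough performance-per-watt sketch in Python. It only uses the numbers quoted above (2.30 TFLOPs at 120W for the GTX 960, and a guessed 3.00 to 3.50 TFLOPs at 100W for a GP106). The GP106 figures are this article's estimates, not confirmed specifications, and the function and variable names are made up for illustration.

```python
# Performance-per-watt comparison between the GTX 960 (GM206) figures quoted
# in the article and the article's speculative 100W GP106 estimates.
# None of the GP106 numbers are confirmed by NVIDIA.

def gflops_per_watt(tflops, watts):
    """Convert a TFLOPs rating and a power figure into GFLOPs per watt."""
    return tflops * 1000.0 / watts

GM206_TFLOPS = 2.30                 # GTX 960, fully enabled GM206
GM206_WATTS = 120.0                 # GTX 960 power figure from the article
GP106_WATTS = 100.0                 # "said to be around 100W" (assumption)
GP106_TFLOPS_ESTIMATES = (3.00, 3.50)  # speculative range from the article

def efficiency_gain(new_tflops):
    """Ratio of estimated GP106 perf/W to GM206 perf/W."""
    old = gflops_per_watt(GM206_TFLOPS, GM206_WATTS)
    new = gflops_per_watt(new_tflops, GP106_WATTS)
    return new / old

if __name__ == "__main__":
    for estimate in GP106_TFLOPS_ESTIMATES:
        print(f"{estimate:.2f} TFLOPs at 100W -> "
              f"{efficiency_gain(estimate):.2f}x the GM206 perf/W")
```

Under these assumptions the guessed GP106 would land at roughly 1.6x to 1.8x the efficiency of the GM206, which is in the range one might expect from a move from 28nm to 16nm FinFET.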

Based on the Pascal/Denver combo architecture, each Tegra chip could end up delivering up to 1 TFLOPs of FP32 compute performance; for comparison, the Tegra X1 featured just 512 GFLOPs of FP32 and 1 TFLOPs of FP16 (mixed precision) compute. Summing it up, it is easy to reach the 8 TFLOPs number that NVIDIA has advertised for the Drive PX 2 board. There also remains the possibility that the discrete GPU will be a heavily cut-down GP104 SKU, scaled down to meet the Drive PX 2 requirements.
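The 8 TFLOPs budget can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes two discrete GPUs and two Tegra SOCs, uses the speculative per-chip figures discussed here, and the helper names are hypothetical.

```python
# Back-of-the-envelope FP32 budget for the Drive PX 2, using the article's
# speculative per-chip figures (estimates, not NVIDIA-confirmed specs).

ADVERTISED_TOTAL_TFLOPS = 8.0   # NVIDIA's stated figure for the whole board
DISCRETE_GPU_COUNT = 2          # two discrete Pascal GPUs
TEGRA_SOC_COUNT = 2             # two next-gen Tegra SOCs

def board_total(discrete_tflops, tegra_tflops):
    """Sum FP32 throughput across the discrete GPUs and the Tegra SOCs."""
    return (DISCRETE_GPU_COUNT * discrete_tflops
            + TEGRA_SOC_COUNT * tegra_tflops)

def tegra_share_needed(discrete_tflops):
    """Per-SOC FP32 throughput needed to reach the advertised total."""
    remainder = ADVERTISED_TOTAL_TFLOPS - DISCRETE_GPU_COUNT * discrete_tflops
    return remainder / TEGRA_SOC_COUNT

if __name__ == "__main__":
    for discrete in (3.0, 3.5):
        print(f"{discrete:.2f} TFLOPs per discrete GPU -> "
              f"{tegra_share_needed(discrete):.2f} TFLOPs needed per Tegra SOC")
```

With 3.0 TFLOPs per discrete GPU, each Tegra SOC would need to supply 1.0 TFLOPs. With 3.5 TFLOPs per discrete GPU, each would need 0.5 TFLOPs, about what the Tegra X1 already delivers in FP32.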

The Tegra SOCs feature the Denver/ARM config along with a Tegra focused Pascal GPU which is geared towards GPGPU computation.

NVIDIA has already said that the Drive PX 2 will be available in fall of 2016. Knowing this, NVIDIA still has time to work on the actual Pascal chips and can use placeholder GPUs (Maxwell) to showcase the kind of tech it plans to introduce later this year. While NVIDIA still hasn’t provided us a glimpse of an actual Pascal GPU (even the Tegra was hidden behind a heatspreader), GDC and GTC 2016 sound like two possible events where NVIDIA may introduce the GTX graphics lineup based on the Pascal architecture.

JHH now compares DRIVE PX 2, built on a 16nm process, to TITAN X, built on a 28nm process. DRIVE PX 2 is roughly six times more powerful. DRIVE PX 2 has 12 CPU cores, capable of 8 teraflops of processing power and 24 teraflops of deep processing operations. It’s equivalent to 150 MacBook Pros in the trunk of your car.

JHH holds DRIVE PX 2, not much bigger than a tablet. It has two next-generation Tegra processors, and two next-generation Pascal-based discrete GPUs.

NVIDIA Drive PX 2 – NVIDIA DIGITS Deep Neural Network Platform

Through Drive PX 2, NVIDIA is boosting the object detection ability of these AI supercomputers through a data set known as DriveNet. To further drive this ecosystem, NVIDIA will provide DIGITS, a deep neural network platform that offers 9 inception layers, 3 convolutional layers and 37 million neurons, and can process 40 billion operations while offering single and multi-class object detection. NVIDIA wants to enable an end-to-end deep learning platform for self-driving cars.

JHH recaps his main points from last year – noting that deep learning is what’s going to be needed to bring accuracy. But that’s going to take huge computational powers.

Several thousand engineering have gone into the NVIDIA DRIVE PX 2, the world’s first artificial-intelligence supercomputer for self-driving cars.

It’s got some chops. 12 CPUs. NVIDIA’s next-gen Pascal-based GPU. All producing 8 teraflops of power. That’s equivalent to 150 MacBook Pros. And it’s in a case the size of a school lunchbox. via NVIDIA Blogs

This will enable a car to learn about the world, convey it back to the cloud-based network, which then updates all cars. Every car company will own its own deep neural network. We want to create a platform for these to be deployed. So, to recap. Three strategies NVIDIA has:

  1. Ensure NVIDIA GPUs accelerate all frameworks for GPUs;
  2. Create platforms for deploying deep learning;
  3. Develop an end-to-end development system to train and deploy the network.

Both NVIDIA and AMD are now in the same league and on the path to offer high-performance graphics chips based on the latest FinFET nodes in 2016. AMD has already shown off its Polaris GPU architecture, while the green team is back with the tremendous amount of power inside its flagship GPU architecture, Pascal. Those who were expecting a GeForce announcement may be a bit disappointed with the automotive news, but they should also be ready, as NVIDIA might be gearing up for a full-blown Pascal GeForce introduction at either GDC 2016 or its GTC ’16 conference in April 2016.

NVIDIA and AMD FinFET GPUs Comparison:

Flagship GPU | Vega 10 | Navi 10 | NVIDIA GP100 | NVIDIA GV100
GPU Process | 14nm FinFET | 7nm FinFET | TSMC 16nm FinFET | TSMC 12nm FinFET
GPU Transistors | 15-18 Billion | TBC | 15.3 Billion | 21.1 Billion
GPU Cores (Max) | 4096 SPs | TBC | 3840 CUDA Cores | 5376 CUDA Cores
Peak FP32 Compute | 13.0 TFLOPs | TBC | 12.0 TFLOPs | >15.0 TFLOPs (Full Die)
Peak FP16 Compute | 25.0 TFLOPs | TBC | 24.0 TFLOPs | 120 Tensor TFLOPs
Memory (Consumer Cards) | HBM2 | HBM3 | GDDR5X | GDDR6
Memory (Dual-Chip Professional/HPC) | HBM2 | HBM3 | HBM2 | HBM2
HBM2 Bandwidth | 484 GB/s (Frontier Edition) | >1 TB/s? | 732 GB/s (Peak) | 900 GB/s
Graphics Architecture | Next Compute Unit (Vega) | Next Compute Unit (Navi) | 5th Gen Pascal CUDA | 6th Gen Volta CUDA
Successor of (GPU) | Radeon RX 500 Series | Radeon RX 600 Series | GM200 (Maxwell) | GP100 (Pascal)