Nvidia’s 20nm, Dual ‘Tegra X1’ Powered, Drive PX Module Revealed – Massive 2.3 TeraFlops of Compute

Usman Pirzada

Note: The TeraFlops number mentioned looks too high for an SoC, and for good reason: Nvidia is quoting FP16 (half precision) performance as opposed to FP32 (single precision) or FP64 (double precision).

One of the things that Nvidia talked about today at CES was Advanced Driver Assistance Systems, or ADAS, and while on that topic they introduced their latest product: the Drive PX dedicated module. This particular beauty is a soldered package for cars that carries not one but two Tegra X1 chips, for an impressive total of up to 2.3 TeraFlops of FP16 compute.

A slide from Nvidia's CES 2015 event showing the Drive PX board, inputs and outputs. @Nvidia Public

Two Tegra X1s working in parallel power the Drive PX Module - oh and 20nm is here

The Tegra X1 is the successor to the Tegra K1 and it is finally here. The underlying architecture is Maxwell this time, and it appears to me that this chip is probably the anticipated Tegra M1 under a slightly different nomenclature. Needless to say, power scaling is now much more efficient given the Maxwell architecture. While TDP doesn't really matter in an automotive environment, each chip should consume about the same as the Tegra K1, since the improved power efficiency is offset by the increased core count. Nvidia basically means to bring about a new and revolutionary ADAS platform.

Nvidia is once again a bit vague about TDP, claiming only that a single chip was running at 10W. So that's a total of 20W of TDP for the Drive PX module, not bad. Let's talk a bit about the architecture before I get into the details of the Drive PX module. Basically, it's an 8-core design arranged, according to Nvidia, in a 4 by 4 fashion. Nvidia doesn't really go into a lot of detail about this design, but from the concept art these appear to be 4 high-performance cores and 4 low-power cores (think 4+4 as opposed to the 4+1 of the Tegra K1). The GPU side of things is divided into 2 SMMs (Maxwell streaming multiprocessors) for a total of 256 CUDA cores.
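To put the headline figure in context, here is a rough back-of-envelope sketch in Python. It takes the numbers quoted above (256 CUDA cores per chip, two chips, 10W per chip, 2.3 TeraFlops FP16) and works out what GPU clock and performance per watt they would imply. The counting rules are assumptions on my part, not something Nvidia stated here: each core is assumed to retire one fused multiply-add per cycle (counted as 2 FLOPs), and FP16 is assumed to run at twice the FP32 rate. The function names are made up for illustration.

```python
# Back-of-envelope estimate only; the counting rules below are assumptions.

CUDA_CORES_PER_CHIP = 256   # from the article: 2 SMMs, 256 CUDA cores
CHIPS = 2                   # from the article: two Tegra X1s on Drive PX
WATTS_PER_CHIP = 10         # from the article: Nvidia's single-chip figure
QUOTED_FP16_TFLOPS = 2.3    # from the article: whole-module FP16 figure

FLOPS_PER_FMA = 2           # assumption: one fused multiply-add = 2 FLOPs
FP16_RATE = 2               # assumption: FP16 runs at 2x the FP32 rate


def flops_per_cycle(fp_rate: int = FP16_RATE) -> int:
    """FLOPs the whole module could retire per GPU clock cycle."""
    return CHIPS * CUDA_CORES_PER_CHIP * FLOPS_PER_FMA * fp_rate


def peak_tflops(clock_ghz: float, fp_rate: int = FP16_RATE) -> float:
    """Peak module throughput in TFLOPS at a given GPU clock (GHz)."""
    return flops_per_cycle(fp_rate) * clock_ghz / 1000.0


def implied_clock_ghz(tflops: float = QUOTED_FP16_TFLOPS) -> float:
    """GPU clock (GHz) the quoted figure would imply under these rules."""
    return tflops * 1000.0 / flops_per_cycle()


def gflops_per_watt(tflops: float = QUOTED_FP16_TFLOPS) -> float:
    """Quoted throughput divided by the module's 2 x 10W power figure."""
    return tflops * 1000.0 / (CHIPS * WATTS_PER_CHIP)


if __name__ == "__main__":
    clock = implied_clock_ghz()
    print(f"Implied GPU clock: {clock:.3f} GHz")
    print(f"FP32 peak at that clock: {peak_tflops(clock, fp_rate=1):.2f} TFLOPS")
    print(f"FP16 efficiency: {gflops_per_watt():.0f} GFLOPS/W")
```

Under these assumptions the 2.3 TeraFlops figure drops to about half in FP32 terms, which is the point of the note at the top of this article.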

Now about the actual topic of coverage: the Drive PX. Basically, Drive PX will be able to use 12 cameras to map the environment and gain an awareness of its surroundings and the situation it is currently in. It will achieve that by delving into the mysterious world of Deep Neural Nets. I have already covered DNNs in great detail in an editorial, so I won't go into how they work, just the relevant data pertaining to the Tegra X1 and Drive PX. The Tegra K1 is capable of recognizing 30 images per second, so the Drive PX can recognize up to 60 images per second (cue 60fps jokes). Here is the basic pipeline that is involved in the process:

Nvidia PX pipeline

Now the pipeline itself is rather impressive. The chip can handle 12 different 1080p streams simultaneously without choking. The pipeline shows your basic Image Signal Processor, the CPU, the GPU and the VPE, which I believe would be the vertices processing unit (Nvidia doesn't specify). On the output side we have DVR-based storage, the Drive/Power Train and of course the Display. With this setup, Nvidia is able to employ a mini DNN right there in your car, and empower it to recognize everyday items.
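For a sense of scale, the camera numbers above can be turned into a quick estimate. The Python sketch below computes the raw pixel rate of 12 simultaneous 1080p streams and how the 60 images per second recognition figure would divide across the cameras. Nvidia did not give a camera frame rate here, so the 30 fps used in the example is purely an assumption, as are the function names. It also assumes, for simplicity, that one "recognized image" corresponds to one camera frame, which may not match how Nvidia measured it.

```python
# Rough scale estimate; the frame rate and the "one recognition = one frame"
# mapping are assumptions, not figures from Nvidia.

CAMERAS = 12                    # from the article
WIDTH, HEIGHT = 1920, 1080      # 1080p streams, from the article
RECOGNITIONS_PER_SECOND = 60    # from the article: Drive PX, up to


def pixels_per_second(fps: float) -> float:
    """Raw pixels per second arriving from all cameras at a given frame rate."""
    return CAMERAS * WIDTH * HEIGHT * fps


def recognitions_per_camera() -> float:
    """Recognitions per second per camera if the budget is shared evenly."""
    return RECOGNITIONS_PER_SECOND / CAMERAS


def fraction_of_frames_recognized(fps: float) -> float:
    """Share of incoming frames that could be recognized at a given frame rate."""
    return RECOGNITIONS_PER_SECOND / (CAMERAS * fps)


if __name__ == "__main__":
    assumed_fps = 30.0  # assumption: not stated in the presentation
    print(f"Pixel rate: {pixels_per_second(assumed_fps) / 1e6:.1f} Mpixel/s")
    print(f"Recognitions per camera: {recognitions_per_camera():.1f} per second")
    print(f"Frames recognized: {fraction_of_frames_recognized(assumed_fps):.1%}")
```

If the cameras do run at 30 fps, the recognizer would see only a fraction of the incoming frames, so ingesting 12 streams and recognizing objects in all of them are two different workloads.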


Finally, Nvidia talked a bit about how you will be able to use the same board for self-driving purposes, since the DNN would give it a much more intuitive and intelligent understanding of its environment. This means recognizing everything from street signs to pedestrians crossing the street, all based on DNNs rather than feature tracking.
