NVIDIA Accelerates AI Inferencing With Pascal Based Tesla P40 and Tesla P4 GPU Accelerators – Also Announces 10W Drive PX 2 Board

Hassan Mujtaba • Sep 13, 2016 at 07:36am EDT

NVIDIA has announced its latest Pascal-based Tesla P40 and Tesla P4 GPU accelerators. The new cards are designed to accelerate AI / neural network inferencing, with a claimed boost of up to 45x over CPUs and around 4x over past-generation GPUs. The accelerators are backed by software tools that NVIDIA says deliver a large increase in overall efficiency.

NVIDIA Tesla P40 and Tesla P4 Announced - Accelerating AI / Deep Neural Network Inferences

NVIDIA has created a platform for deep learning with its latest Tesla cards. The platform is segmented into training and inferencing GPUs. For AI training, NVIDIA offers the Tesla P100, which delivers the fastest compute performance available to date in both FP16 and FP64. Paired with the DIGITS training system and deep learning frameworks, it offers higher efficiency and performance. On the other side are the inferencing cards, a line powered by the Tesla P40 and Tesla P4 accelerators.

The Tesla P4 and P40 are specifically designed for inferencing, which uses trained deep neural networks to recognize speech, images or text in response to queries from users and devices. Based on the Pascal architecture, these GPUs feature specialized inference instructions based on 8-bit (INT8) operations, delivering 45x faster response than CPUs¹ and a 4x improvement over GPU solutions launched less than a year ago. via NVIDIA
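To make the "8-bit (INT8) operations" in the quote more concrete, here is a minimal Python sketch of a four-way INT8 dot product with accumulate, the kind of operation CUDA exposes on these Pascal parts as `__dp4a`. This is an illustration of the idea only, not NVIDIA's implementation: the function names `dp4a` and `int8_dot` and the sample vectors are hypothetical, and the sketch does not model 32-bit overflow.

```python
def dp4a(a_bytes, b_bytes, acc):
    """Emulate a 4-way INT8 dot product added to an integer accumulator."""
    assert len(a_bytes) == len(b_bytes) == 4
    for x in (*a_bytes, *b_bytes):
        assert -128 <= x <= 127, "operands must fit in signed 8 bits"
    # Four multiplies and four adds are expressed as a single step here.
    return acc + sum(x * y for x, y in zip(a_bytes, b_bytes))


def int8_dot(a, b):
    """Dot product of two INT8 vectors, consumed four elements at a time."""
    assert len(a) == len(b) and len(a) % 4 == 0
    acc = 0
    for i in range(0, len(a), 4):
        acc = dp4a(a[i:i + 4], b[i:i + 4], acc)
    return acc


# Hypothetical sample data, already quantized to the INT8 range.
activations = [12, -7, 100, 3, -128, 5, 0, 127]
weights = [1, 2, -3, 4, 5, -6, 7, 8]
result = int8_dot(activations, weights)
print(result)  # 56
```

Packing four small multiplies into one step is what lets an INT8 path reach roughly four times the FP32 rate on the same hardware.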

Replacing the Tesla M40 and Tesla M4, the Pascal-based accelerators come with DeepStream SDK and TensorRT support. The two inferencing cards are based on the GP102 and GP104 GPUs, both of which are also available in NVIDIA's consumer and workstation products in the form of GeForce and Quadro cards. Let's take a look at the specifications for these cards:

NVIDIA Tesla P40 "Pascal GP102" Specifications:

The Tesla P40 is the faster of the two, featuring a full-fledged GP102 GPU core. The card has 3840 CUDA cores and 24 GB of GDDR5 memory. Clock speeds are 1303 MHz base and 1531 MHz boost. The memory is clocked at 7.2 GHz effective, which delivers 346 GB/s of bandwidth across a 384-bit interface. The chip delivers 12 TFLOPs of FP32 and 47 DLTOPs of INT8 compute performance in a 250W TDP package. Like the Tesla M40 before it, the P40 comes in a passive form factor.
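The headline figures above can be roughly reproduced from the core count, boost clock and memory configuration. The sketch below is a back-of-the-envelope check, not NVIDIA's official methodology. It assumes one fused multiply-add (2 FLOPs) per CUDA core per clock for FP32, and four INT8 operations for every FP32 operation. The function names are made up for illustration.

```python
def memory_bandwidth_gbs(effective_clock_ghz, bus_width_bits):
    # Each data pin moves effective_clock_ghz gigabits per second;
    # dividing by 8 converts bits to bytes.
    return effective_clock_ghz * bus_width_bits / 8


def peak_fp32_tflops(cuda_cores, boost_mhz):
    # Assumption: 2 FLOPs (one fused multiply-add) per core per clock.
    return cuda_cores * 2 * boost_mhz * 1e6 / 1e12


def peak_int8_tops(cuda_cores, boost_mhz):
    # Assumption: INT8 runs at four times the FP32 rate.
    return 4 * peak_fp32_tflops(cuda_cores, boost_mhz)


# Tesla P40 figures quoted in the article.
p40_bw = memory_bandwidth_gbs(7.2, 384)
p40_fp32 = peak_fp32_tflops(3840, 1531)
p40_int8 = peak_int8_tops(3840, 1531)

print(f"{p40_bw:.1f} GB/s")       # 345.6 GB/s
print(f"{p40_fp32:.2f} TFLOPs")   # 11.76 TFLOPs
print(f"{p40_int8:.1f} TOPs")     # 47.0 TOPs
```

Under those assumptions the results round to the quoted 346 GB/s, 12 TFLOPs and 47 DLTOPs.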

NVIDIA Tesla P4 "Pascal GP104" Specifications:

The Tesla P4, on the other hand, features the GP104 core. It has the full 2560 CUDA cores but runs at much lower clock speeds of 810 MHz base and 1063 MHz boost. This has to do with the low-profile form factor the card is offered in, as it is designed for blade servers. The P4 also comes in a 50-75W package, which is much lower than the GTX 1080's 180W TDP. The GTX 1080 features the same core count but has higher clock speeds, reaching up to 2 GHz. The P4 is clocked at roughly half that rate, hence the higher power efficiency.

The rest of the specifications include 8 GB of GDDR5 memory. The memory is clocked at 6 GHz effective, which delivers 192 GB/s of bandwidth across a 256-bit bus. The compute performance for this card is rated at 5.5 TFLOPs (FP32) and 22 DLTOPs (INT8). No price has been announced for the Tesla P40 or Tesla P4, but they are expected to hit the market through OEM channels in Q4 (October-November) 2016.

NVIDIA Tesla P40 and Tesla P4 Specifications:

Product Name	Tesla M4	Tesla M40	Tesla P4	Tesla P40
GPU Architecture	Maxwell GM206	Maxwell GM200	Pascal GP104	Pascal GP102
GPU Process	28nm	28nm	16nm FinFET	16nm FinFET
CUDA Cores	1024	3072	2560	3840
Boost Clock	1072 MHz	1114 MHz	1063 MHz	1531 MHz
FP32 Compute	2.20 TFLOPs	7.00 TFLOPs	5.50 TFLOPs	12.0 TFLOPs
INT8 Compute	N/A	N/A	22 DLTOPs	47 DLTOPs
VRAM	4 GB GDDR5	24 GB GDDR5	8 GB GDDR5	24 GB GDDR5
Memory Clock	5.5 GHz	6.0 GHz	6.0 GHz	7.2 GHz
Memory Bus	128-bit	384-bit	256-bit	384-bit
Memory Bandwidth	88.0 GB/s	288.0 GB/s	192.0 GB/s	346 GB/s
TDP	~75W	250W	~75W	250W
Launch	2015	2015	2016	2016
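The table also makes it easy to compare efficiency across generations. The sketch below computes FP32 throughput per watt from the rated figures in the table. It treats the "~75W" entries as exactly 75 W, which is an assumption, and the dictionary layout and function names are purely illustrative.

```python
# Rated FP32 throughput and TDP taken from the table above.
# "~75W" is treated as 75 W for the purpose of this estimate.
specs = {
    "Tesla M4": {"fp32_tflops": 2.2, "tdp_w": 75},
    "Tesla M40": {"fp32_tflops": 7.0, "tdp_w": 250},
    "Tesla P4": {"fp32_tflops": 5.5, "tdp_w": 75},
    "Tesla P40": {"fp32_tflops": 12.0, "tdp_w": 250},
}


def gflops_per_watt(card):
    s = specs[card]
    return s["fp32_tflops"] * 1000 / s["tdp_w"]


def generational_gain(new, old):
    # Ratio of FP32 efficiency between a card and its predecessor.
    return gflops_per_watt(new) / gflops_per_watt(old)


for card in specs:
    print(f"{card}: {gflops_per_watt(card):.1f} GFLOPs/W")

print(f"P4 vs M4:   {generational_gain('Tesla P4', 'Tesla M4'):.2f}x")
print(f"P40 vs M40: {generational_gain('Tesla P40', 'Tesla M40'):.2f}x")
```

On these rated numbers the P4 comes out at about 2.5x the FP32 efficiency of the M4 and the P40 at about 1.7x that of the M40, before counting the INT8 path that the Maxwell cards lack.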

Software Tools for Faster Inferencing

Complementing the Tesla P4 and P40 are two software innovations to accelerate AI inferencing: NVIDIA TensorRT and the NVIDIA DeepStream SDK.

TensorRT is a library created for optimizing deep learning models for production deployment that delivers instant responsiveness for the most complex networks. It maximizes throughput and efficiency of deep learning applications by taking trained neural nets — defined with 32-bit or 16-bit operations — and optimizing them for reduced precision INT8 operations.
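As a rough illustration of what "optimizing for reduced precision INT8" involves, the sketch below applies plain symmetric linear quantization to a handful of values and compares an INT8 dot product against the full-precision result. This is a generic textbook technique, not TensorRT's actual calibration algorithm, and every name and number in it is hypothetical.

```python
def quantize(values):
    """Symmetric linear quantization of floats to the range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate float values."""
    return [x * scale for x in q]


def quantized_dot(qa, scale_a, qb, scale_b):
    # Accumulate in integers, then rescale once at the end.
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc * scale_a * scale_b


# Hypothetical layer weights and inputs in full precision.
weights = [0.42, -1.27, 0.05, 0.9]
inputs = [1.5, -0.25, 3.0, 0.75]

qw, sw = quantize(weights)
qx, sx = quantize(inputs)

exact = sum(w * x for w, x in zip(weights, inputs))
approx = quantized_dot(qw, sw, qx, sx)
print(f"FP32 result: {exact:.4f}")
print(f"INT8 result: {approx:.4f}")
```

The INT8 result lands close to the full-precision one while the inner loop works entirely on small integers, which is the trade that makes INT8 inferencing attractive.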

NVIDIA DeepStream SDK taps into the power of a Pascal server to simultaneously decode and analyze up to 93 HD video streams in real time compared with seven streams with dual CPUs. This addresses one of the grand challenges of AI: understanding video content at-scale for applications such as self-driving cars, interactive robots, filtering and ad placement. Integrating deep learning into video applications allows companies to offer smart, innovative video services that were previously impossible to deliver.

NVIDIA Offers 10W, Palm-Sized Energy-Efficient AI Computer for Self-Driving Cars

NVIDIA also announced a new Drive PX 2 board for self-driving cars. While the original design uses two Parker SoCs, the new model is a single-chip design. With a TDP of just 10W and a much smaller board footprint, the new board should also make the platform more affordable.

"Baidu and NVIDIA are leveraging our AI skills together to create a cloud-to-car system for self-driving," said Liu Jun, vice president of Baidu. "The new, small form-factor DRIVE PX 2 will be used in Baidu's HD map-based self-driving solution for car manufacturers." via NVIDIA

The new single-processor DRIVE PX 2 will be available to production partners in the fourth quarter of 2016. DriveWorks software and the DRIVE PX 2 configuration with two SoCs and two discrete GPUs are available today for developers working on autonomous vehicles.

[Image: NVIDIA Drive PX 2 single-chip board]

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
