NVIDIA Announces Tesla T4 Based on Turing GPU For Inferencing – 65 TFLOPs FP16, 130 TOPs INT8, 260 TOPs INT4 at Just 75W

Sep 12

NVIDIA has just announced their latest Turing-based Tesla T4 graphics card for inference acceleration. The card was unveiled by NVIDIA's CEO, Jensen Huang, at the GTC Japan 2018 keynote as the first Tesla graphics card to feature the brand new Turing GPU.

NVIDIA Tesla T4 With Turing GPU Announced at GTC Japan – Aiming At The Inferencing Market With Multi-TFLOPs of Performance at Just 75W, 2560 Cores

The Turing-based NVIDIA Tesla T4 is aimed at the inference acceleration market. It is designed to deliver an order-of-magnitude jump in deep learning performance over its predecessors, along with breakthrough performance for AI video applications. NVIDIA's own estimates put the card at twice the video processing speed of the previous generation, enabling users to decode up to 38 full-HD video streams, which simply wasn't possible before.


The NVIDIA Tesla T4 GPU is the world’s most advanced inference accelerator. Powered by NVIDIA Turing Tensor Cores, T4 brings revolutionary multi-precision inference performance to accelerate the diverse applications of modern AI. Packaged in an energy-efficient 75-watt, small PCIe form factor, T4 is optimized for scale-out servers and is purpose-built to deliver state-of-the-art inference in real time.

As the volume of online videos continues to grow exponentially, demand for solutions to efficiently search and gain insights from video continues to grow as well. Tesla T4 delivers breakthrough performance for AI video applications, with dedicated hardware transcoding engines that bring twice the decoding performance of prior-generation GPUs. T4 can decode up to 38 full-HD video streams, making it easy to integrate scalable deep learning into video pipelines to deliver innovative, smart video services.


The specifications of the Tesla T4 are very impressive given its single-slot PCIe form factor. The card packs the Turing TU104 GPU with 2560 CUDA cores and 320 Tensor Cores. It delivers 8.1 TFLOPs of FP32 performance, 65 TFLOPs of FP16 mixed-precision, 130 TOPs of INT8 and 260 TOPs of INT4 performance. All of this compute is achieved within a TDP of just 75W. That means no external power connector is required, as the card draws all of its power from the PCIe slot, and its small form factor lets it slot into 1U, 4U or practically any rack server for large-scale deployment.
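Those headline numbers hang together arithmetically: peak FP32 throughput follows from cores × clock × 2 ops per FMA, and on the Tensor Cores each step down in precision roughly doubles peak throughput. A quick back-of-the-envelope sketch, using only the figures quoted above:

```python
# Back-of-the-envelope peak-throughput check for the Tesla T4,
# using the figures quoted in the article.

CUDA_CORES = 2560
BOOST_CLOCK_GHZ = 1.582  # 1582 MHz, from the spec table below

# Each CUDA core retires one FMA (2 floating-point ops) per clock.
fp32_tflops = CUDA_CORES * BOOST_CLOCK_GHZ * 2 / 1000  # GFLOPs -> TFLOPs

# Tensor Core rates: each halving of precision doubles peak throughput.
fp16_tflops = 65               # quoted mixed-precision figure
int8_tops = fp16_tflops * 2    # 130 TOPs
int4_tops = int8_tops * 2      # 260 TOPs

print(f"FP32: {fp32_tflops:.1f} TFLOPs")  # ~8.1, matching the article
print(f"INT8: {int8_tops} TOPs, INT4: {int4_tops} TOPs")
```

The FP32 figure works out to roughly 8.1 TFLOPs, matching NVIDIA's quoted number, and the INT8/INT4 figures fall straight out of the precision-doubling pattern.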


Additionally, the card is equipped with 16 GB of GDDR6 memory delivering more than 320 GB/s of bandwidth, which is simply stunning. The NVIDIA TensorRT Hyperscale Platform includes a comprehensive set of hardware and software offerings optimized for powerful, highly efficient inference. Key elements include:

  • NVIDIA Tesla T4 GPU – Featuring 320 Turing Tensor Cores and 2,560 CUDA cores, this new GPU provides breakthrough performance with flexible, multi-precision capabilities, from FP32 to FP16 to INT8, as well as INT4. Packaged in an energy-efficient, 75-watt, small PCIe form factor that easily fits into most servers, it offers 65 TFLOPs of peak FP16 performance, 130 TOPs of INT8 and 260 TOPs of INT4.
  • NVIDIA TensorRT 5 – An inference optimizer and runtime engine, NVIDIA TensorRT 5 supports Turing Tensor Cores and expands the set of neural network optimizations for multi-precision workloads.
  • NVIDIA TensorRT inference server – This containerized microservice software enables applications to use AI models in data center production. Freely available from the NVIDIA GPU Cloud container registry, it maximizes data center throughput and GPU utilization, supports all popular AI models and frameworks, and integrates with Kubernetes and Docker.
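The multi-precision idea behind those INT8 and INT4 modes can be illustrated with a toy symmetric quantizer: weights are stored and computed as small integers, with a single floating-point scale factor recovering the original range. This is a generic sketch of the technique only, not TensorRT's actual calibration algorithm:

```python
# Toy symmetric INT8 quantization, illustrating the multi-precision idea:
# store values as 8-bit integers plus one FP scale factor per tensor.
# Generic sketch only; TensorRT's real calibration is more sophisticated.

def quantize_int8(values):
    """Map floats onto int8 range [-127, 127] with a symmetric scale."""
    scale = max(abs(v) for v in values) / 127.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.8, -1.27, 0.05, 0.4]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)

# The int8 approximation stays within one quantization step of the original.
assert all(abs(a - w) <= scale for a, w in zip(approx, weights))
```

Halving the bit width again (INT4) shrinks storage and doubles peak throughput at the cost of coarser quantization steps, which is exactly the trade-off the T4's multi-precision Tensor Cores expose.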

NVIDIA Tesla T4 GPU Specifications

Product Name       Tesla M4        Tesla P4        Tesla T4
GPU Architecture   Maxwell GM206   Pascal GP104    Turing TU104
GPU Process        28nm            16nm FinFET     12nm FinFET
CUDA Cores         1024            2560            2560
Boost Clock        1072 MHz        1063 MHz        1582 MHz
FP32 Compute       2.2 TFLOPs      5.5 TFLOPs      8.1 TFLOPs
FP16 Compute       N/A             11 TFLOPs       65 TFLOPs
INT8 Compute       N/A             22 TOPs         130 TOPs
INT4 Compute       N/A             N/A             260 TOPs
Memory Speed       5.5 Gbps        6.0 Gbps        10 Gbps
Memory Bus         128-bit         256-bit         256-bit
Memory Bandwidth   88.0 GB/s       192.0 GB/s      320+ GB/s
TDP                ~75W            ~75W            75W
Launch             2015            2016            2018
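The bandwidth rows follow directly from the other two memory rows: bandwidth in GB/s equals bus width in bits, divided by 8 to get bytes, multiplied by the effective data rate in Gbps. A quick check against the table:

```python
# Memory bandwidth = (bus width in bits / 8) * effective data rate (Gbps).
# Figures taken from the spec table above.

cards = {
    # name: (bus width in bits, effective data rate in Gbps)
    "Tesla M4": (128, 5.5),
    "Tesla P4": (256, 6.0),
    "Tesla T4": (256, 10.0),
}

for name, (bus_bits, gbps) in cards.items():
    bandwidth = bus_bits / 8 * gbps  # GB/s
    print(f"{name}: {bandwidth:.1f} GB/s")
# Works out to 88.0, 192.0 and 320.0 GB/s, matching the table
```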

There’s no word on pricing or availability yet, but we will keep you updated as we get more info on the new Tesla T4 graphics card.