NVIDIA Announces Tesla T4 Based on Turing GPU For Inferencing – 65 TFLOPs FP16, 130 TOPs INT8, 260 TOPs INT4 at Just 75W

NVIDIA has just announced its latest Turing-based Tesla T4 graphics card for inference acceleration. The card was announced by NVIDIA's CEO, Jensen Huang, at the GTC Japan 2018 keynote as the first Tesla graphics card to feature the brand-new Turing GPU.

NVIDIA Tesla T4 With Turing GPU Announced at GTC Japan - Aiming At The Inferencing Market With Multi-TFLOPs of Performance at Just 75W, 2560 Cores

The Turing-based NVIDIA Tesla T4 graphics card is aimed at the inference acceleration market. It is designed to accelerate deep learning performance by an order of magnitude over its predecessors and to deliver breakthrough performance for AI video applications. NVIDIA's own estimate puts the card at twice the video decoding performance of the prior generation, enabling users to decode up to 38 full-HD video streams, which just wasn't possible before.

The NVIDIA Tesla T4 GPU is the world’s most advanced inference accelerator. Powered by NVIDIA Turing Tensor Cores, T4 brings revolutionary multi-precision inference performance to accelerate the diverse applications of modern AI. Packaged in an energy-efficient 75-watt, small PCIe form factor, T4 is optimized for scale-out servers and is purpose-built to deliver state-of-the-art inference in real time.

As the volume of online videos continues to grow exponentially, demand for solutions to efficiently search and gain insights from video continues to grow as well. Tesla T4 delivers breakthrough performance for AI video applications, with dedicated hardware transcoding engines that bring twice the decoding performance of prior-generation GPUs. T4 can decode up to 38 full-HD video streams, making it easy to integrate scalable deep learning into video pipelines to deliver innovative, smart video services.


The specifications of the Tesla T4 are very impressive given its single-slot PCIe form factor. The graphics card packs the Turing TU104 GPU with 2560 CUDA cores and 320 Tensor Cores. It delivers 8.1 TFLOPs of FP32 performance, 65 TFLOPs of FP16 mixed-precision, 130 TOPs of INT8 and 260 TOPs of INT4 performance. All of this compute performance is achieved with a TDP of just 75W. That means the card needs no external power connector, since it pulls all of its juice from the PCIe slot, and its small form factor lets it fit inside 1U, 4U or other rack servers for large-scale deployment.
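As a rough sanity check on those figures, the FP32 number can be reproduced from the core count and the 1582 MHz clock listed in the specifications table. The sketch below assumes the usual convention that each CUDA core completes one fused multiply-add (counted as 2 FLOPs) per clock. The function name is only illustrative, and the per-watt values are simply the quoted peak figures divided by the 75W TDP, not measured efficiency.

```python
# Back-of-the-envelope check of the peak figures quoted in the article.
# Assumption: one fused multiply-add (2 FLOPs) per CUDA core per clock.

def peak_fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPs."""
    flops_per_core_per_clock = 2  # multiply + add
    return cuda_cores * flops_per_core_per_clock * clock_mhz * 1e6 / 1e12

# Tesla T4: 2560 CUDA cores at 1582 MHz (values from the article)
t4_fp32 = peak_fp32_tflops(2560, 1582)
print(f"Tesla T4 peak FP32: {t4_fp32:.1f} TFLOPs")

# Quoted Tensor Core figures: throughput doubles each time precision halves.
quoted_tera_ops = {"FP16": 65, "INT8": 130, "INT4": 260}
tdp_watts = 75
for precision, tera_ops in quoted_tera_ops.items():
    per_watt = tera_ops / tdp_watts
    print(f"{precision}: {tera_ops} tera-ops peak, {per_watt:.2f} per watt")
```

The result lands on the quoted 8.1 TFLOPs, and the FP16, INT8 and INT4 figures follow a clean 1x, 2x, 4x progression.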

Additionally, the graphics card comes with 16 GB of GDDR6 memory delivering more than 320 GB/s of bandwidth, which is just stunning. The NVIDIA TensorRT Hyperscale Platform includes a comprehensive set of hardware and software offerings optimized for powerful, highly efficient inference. Key elements include:

  • NVIDIA Tesla T4 GPU – Featuring 320 Turing Tensor Cores and 2,560 CUDA cores, this new GPU provides breakthrough performance with flexible, multi-precision capabilities, from FP32 to FP16 to INT8, as well as INT4. Packaged in an energy-efficient, 75-watt, small PCIe form factor that easily fits into most servers, it offers 65 TFLOPs of peak performance for FP16, 130 TOPs for INT8 and 260 TOPs for INT4.
  • NVIDIA TensorRT 5 – An inference optimizer and runtime engine, NVIDIA TensorRT 5 supports Turing Tensor Cores and expands the set of neural network optimizations for multi-precision workloads.
  • NVIDIA TensorRT inference server – This containerized microservice software enables applications to use AI models in data center production. Freely available from the NVIDIA GPU Cloud container registry, it maximizes data center throughput and GPU utilization, supports all popular AI models and frameworks, and integrates with Kubernetes and Docker.

NVIDIA Tesla T4 GPU Specifications

| Product Name | Tesla M4 | Tesla P4 | Tesla T4 |
|---|---|---|---|
| GPU Architecture | Maxwell GM206 | Pascal GP104 | Turing TU104 |
| GPU Process | 28nm | 16nm FinFET | 12nm FinFET |
| CUDA Cores | 1024 | 2560 | 2560 |
| Clock Speed | 1072 MHz | 1063 MHz | 1582 MHz |
| FP32 Compute | 2.20 TFLOPs | 5.50 TFLOPs | 8.1 TFLOPs |
| FP16 Compute | N/A | 11 TFLOPs | 65 TFLOPs |
| INT8 Compute | N/A | 22 DLTOPs | 130 DLTOPs |
| INT4 Compute | N/A | N/A | 260 DLTOPs |
| Memory Clock (effective) | 5.5 GHz | 6.0 GHz | 10 GHz |
| Memory Bus | 128-bit | 256-bit | 256-bit |
| Memory Bandwidth | 88.0 GB/s | 192.0 GB/s | 320 GB/s+ |
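
The memory bandwidth row follows directly from the bus width and the memory clock. The short sketch below assumes the "Memory Clock" values are effective per-pin data rates (so 10 GHz is treated as 10 Gbps per pin); the helper name is hypothetical and only serves the illustration.

```python
# Peak memory bandwidth = (bus width in bytes) x (effective data rate per pin).
# Assumption: the "Memory Clock" figures are effective per-pin rates in Gbps.

def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    return bus_width_bits / 8 * data_rate_gbps

# (bus width in bits, effective data rate in Gbps), taken from the table
cards = {
    "Tesla M4": (128, 5.5),
    "Tesla P4": (256, 6.0),
    "Tesla T4": (256, 10.0),
}

bandwidths = {
    name: memory_bandwidth_gbs(bus, rate) for name, (bus, rate) in cards.items()
}
for name, gbs in bandwidths.items():
    print(f"{name}: {gbs:.1f} GB/s")
```

Under that assumption the three cards come out at 88, 192 and 320 GB/s, matching the table.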

There's no word on pricing or availability yet, but we will keep you updated as we get more info on the new Tesla T4 graphics card.
