NVIDIA Launches A2 Tensor Core GPU, An Entry-Level Design Powered By Ampere GA107 GPU & 16 GB GDDR6 Memory


NVIDIA has further expanded its professional data center lineup of Ampere GPUs with the A2 Tensor Core GPU accelerator. The new accelerator is the most entry-level design we have seen from NVIDIA in this lineup, and its specifications are decent for its market position.

NVIDIA A2 Tensor Core GPU Is An Entry-Level Data Center Design Powered By Ampere GA107

The NVIDIA A2 Tensor Core GPU is designed specifically for inferencing and replaces the Turing-powered T4 Tensor Core GPU. In terms of specifications, the card features a variant of the Ampere GA107 GPU with 1280 CUDA cores and 40 Tensor cores. The GPU runs at a boost clock of 1.77 GHz and is built on the Samsung 8nm process node. Only the higher-end GA100 GPU SKUs are based on the TSMC 7nm process node.

The memory design comprises 16 GB of GDDR6 on a 128-bit bus interface, running at an effective 12.5 Gbps for a total bandwidth of 200 GB/s. The GPU is configured to operate at a TDP between 40 and 60 Watts. Owing to its entry-level design, the card comes in a passively cooled Half-Height, Half-Length form factor. Due to its low TDP, it doesn't require any external power connectors. The card also features a PCIe Gen 4.0 x8 interface instead of the standard x16 link.
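The 200 GB/s figure follows directly from the bus width and the effective data rate quoted above. A minimal sketch of that arithmetic, where the function name `peak_memory_bandwidth_gbs` is our own and purely illustrative:

```python
def peak_memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s.

    bus_width_bits: width of the memory interface in bits (128 for the A2).
    data_rate_gbps: effective per-pin data rate in Gbps (12.5 for the A2).
    """
    bytes_per_transfer = bus_width_bits / 8  # 128 bits -> 16 bytes
    return bytes_per_transfer * data_rate_gbps


a2_bandwidth = peak_memory_bandwidth_gbs(bus_width_bits=128, data_rate_gbps=12.5)
print(f"A2 peak memory bandwidth: {a2_bandwidth:.0f} GB/s")
```

This is a theoretical peak; sustained bandwidth in real workloads will be lower.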

The NVIDIA A2 Tensor Core GPU provides entry-level inference with low power, a small footprint, and high performance for NVIDIA AI at the edge. Featuring a low-profile PCIe Gen4 card and a low 40-60W configurable thermal design power (TDP) capability, the A2 brings versatile inference acceleration to any server for deployment at scale.


Performance-wise, the compute numbers are rated at 4.5 TFLOPs (FP32), 0.14 TFLOPs (FP64), 36 TOPs (INT8), 18 TFLOPs (FP16 Tensor), and 9 TFLOPs (TF32 Tensor). In IVA performance, the A2 offers up to a 30% improvement over the NVIDIA T4 while consuming much less power. The NVIDIA A2 Tensor Core GPU is available now, though no pricing details have been shared.
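The 4.5 TFLOPs FP32 rating can be roughly reproduced from the core count and clock speed given earlier. The sketch below assumes the usual convention that each CUDA core retires one fused multiply-add per clock, counted as two floating-point operations; that convention and the function name `peak_fp32_tflops` are our assumptions, not something stated in NVIDIA's announcement.

```python
def peak_fp32_tflops(cuda_cores: int, boost_clock_ghz: float, flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak FP32 throughput in TFLOPs.

    Assumes one fused multiply-add (2 FLOPs) per CUDA core per clock.
    """
    gflops = cuda_cores * flops_per_core_per_clock * boost_clock_ghz
    return gflops / 1000.0  # GFLOPs -> TFLOPs


a2_fp32 = peak_fp32_tflops(cuda_cores=1280, boost_clock_ghz=1.77)
print(f"A2 peak FP32: {a2_fp32:.2f} TFLOPs")
```

The result lands slightly above 4.5 TFLOPs, which is consistent with the rounded figure quoted for the card.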

NVIDIA Ampere Professional GPU Lineup

| GPU Name | A100 | A40 | A30 | A16 | A10 | A2 |
|---|---|---|---|---|---|---|
| Process Node | TSMC 7nm | Samsung 8nm | TSMC 7nm | Samsung 8nm | Samsung 8nm | Samsung 8nm |
| GPU SKU | GA100-884 | GA102-895 | GA100-890 | 4x GA107 | GA102-890 | GA107 |
| GPU Transistors | 54.2B | 28.3B | 54.2B | TBA | 28.3B | TBA |
| CUDA Cores | 6912 | 10752 | 3584 | 2560 x4 | 9216 | 1280 |
| Tensor Cores | 432 | 336 | 224 | 80 x4 | 288 | 40 |
| Boost Clock | 1.41 GHz | 1.74 GHz | 1.44 GHz | 1.69 GHz | 1.69 GHz | 1.77 GHz |
| FP32 Compute | 19.49 TFLOPs | 37.42 TFLOPs | 10.32 TFLOPs | 8.678 TFLOPs x4 | 31.24 TFLOPs | 4.5 TFLOPs |
| FP64 Compute | 9.74 TFLOPs | 1.16 TFLOPs | 5.16 TFLOPs | 0.27 TFLOPs x4 | 0.97 TFLOPs | 0.14 TFLOPs |
| FP16 Compute | 77.97 TFLOPs | 37.42 TFLOPs | 10.32 TFLOPs | 8.67 TFLOPs x4 | 31.24 TFLOPs | 4.5 TFLOPs |
| INT8 Tensor Compute | 624 TOPS | 598.6 TOPS | 330 TOPS | TBA | 500 TOPS | 36 TOPS |
| TF32 Tensor Compute | 156 TFLOPs | 149.6 TFLOPs | 82 TFLOPs | TBA | 125 TFLOPs | 9 TFLOPs |
| PCIe Interconnects | NVLink 3 (12 Links) | PCIe 4.0 x16 | PCIe 4.0 x16 + NVLink 3 (4 Links) | PCIe 4.0 x16 | PCIe 4.0 x16 | PCIe 4.0 x8 |
| Memory Capacity | 40 GB HBM2e | 48 GB GDDR6 | 24 GB HBM2e | 16 GB x4 GDDR6 | 24 GB GDDR6 | 16 GB GDDR6 |
| Memory Bus | 5120-bit | 384-bit | 3072-bit | 128-bit x4 | 384-bit | 128-bit |
| Memory Clock | 1215 MHz | 1812 MHz | 1215 MHz | 1812 MHz | 1563 MHz | 1563 MHz |
| Bandwidth | 1.55 TB/s | 695.8 GB/s | 933.1 GB/s | 231.9 GB/s x4 | 600.2 GB/s | 200 GB/s |
| Form Factor | SXM4 | PCIe Dual Slot, Full Length | PCIe Dual Slot, Full Length | PCIe Dual Slot, Full Length | PCIe Single Slot, FLHH | PCIe Single Slot, HHHL |