NVIDIA has further expanded its professional data center lineup of Ampere GPUs with the A2 Tensor Core GPU accelerator. The new accelerator is the most entry-level design we have seen from NVIDIA so far, and it offers respectable specifications for its market segment.
NVIDIA A2 Tensor Core GPU Is An Entry-Level Data Center Design Powered By Ampere GA107
The NVIDIA A2 Tensor Core GPU is designed specifically for inferencing and replaces the Turing-powered T4 Tensor Core GPU. In terms of specifications, the card features a variant of the Ampere GA107 GPU with 1280 CUDA cores and 40 Tensor cores, running at a clock frequency of 1.77 GHz. The GA107 chip is fabricated on Samsung's 8nm process node; only the higher-end GA100 GPU is built on TSMC's 7nm node.
The memory subsystem comprises 16 GB of GDDR6 running across a 128-bit wide bus interface at an effective 12.5 Gbps, for a total bandwidth of 200 GB/s. The GPU is configured to operate at a TDP between 40 and 60 Watts. True to its entry-level positioning, the card comes in a passively cooled, half-height, half-length (HHHL) form factor, and its low TDP means it requires no external power connectors to operate. The card also features a PCIe Gen 4.0 x8 interface instead of the standard x16 link.
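The quoted 200 GB/s figure follows directly from the bus width and the effective data rate. A quick sanity check of that arithmetic (a sketch, not vendor code):

```python
# Sanity-check the A2's quoted memory bandwidth:
# bandwidth = (bus width in bytes) x (effective per-pin data rate).

bus_width_bits = 128      # GDDR6 bus width
data_rate_gbps = 12.5     # effective data rate per pin, Gbps

bandwidth_gb_s = (bus_width_bits / 8) * data_rate_gbps
print(f"{bandwidth_gb_s:.0f} GB/s")  # -> 200 GB/s
```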
The NVIDIA A2 Tensor Core GPU provides entry-level inference with low power, a small footprint, and high performance for NVIDIA AI at the edge. Featuring a low-profile PCIe Gen4 card and a low 40-60W configurable thermal design power (TDP) capability, the A2 brings versatile inference acceleration to any server for deployment at scale.
Performance-wise, the compute numbers are rated at 4.5 TFLOPs (FP32), 0.14 TFLOPs (FP64), 36 TOPS (INT8 Tensor), 18 TFLOPs (FP16 Tensor), and 9 TFLOPs (TF32 Tensor). In intelligent video analytics (IVA) workloads, the A2 offers up to a 30% improvement over the NVIDIA T4 while consuming much less power. The NVIDIA A2 Tensor Core GPU is available now, though NVIDIA has not shared any pricing details for the card.
NVIDIA Ampere Professional GPU Lineup
| GPU Name | NVIDIA A100 | NVIDIA A40 | NVIDIA A30 | NVIDIA A16 | NVIDIA A10 | NVIDIA A2 |
| --- | --- | --- | --- | --- | --- | --- |
| Process Node | TSMC 7nm | Samsung 8nm | TSMC 7nm | Samsung 8nm | Samsung 8nm | Samsung 8nm |
| GPU SKU | GA100-884 | GA102-895 | GA100-890 | 4x GA107 | GA102-890 | GA107 |
| CUDA Cores | 6912 | 10752 | 3584 | 2560 x4 | 9216 | 1280 |
| Tensor Cores | 432 | 336 | 224 | 80 x4 | 288 | 40 |
| Boost Clock | 1.41 GHz | 1.74 GHz | 1.44 GHz | 1.69 GHz | 1.69 GHz | 1.77 GHz |
| FP32 Compute | 19.49 TFLOPs | 37.42 TFLOPs | 10.32 TFLOPs | 8.67 TFLOPs x4 | 31.24 TFLOPs | 4.5 TFLOPs |
| FP64 Compute | 9.74 TFLOPs | 1.16 TFLOPs | 5.16 TFLOPs | 0.27 TFLOPs x4 | 0.97 TFLOPs | 0.14 TFLOPs |
| FP16 Compute | 77.97 TFLOPs | 37.42 TFLOPs | 10.32 TFLOPs | 8.67 TFLOPs x4 | 31.24 TFLOPs | 4.5 TFLOPs |
| INT8 Tensor Compute | 624 TOPS | 598.6 TOPS | 330 TOPS | TBA | 500 TOPS | 36 TOPS |
| TF32 Tensor Compute | 156 TFLOPs | 149.6 TFLOPs | 82 TFLOPs | TBA | 125 TFLOPs | 9 TFLOPs |
| Interconnect | NVLink 3 | PCIe 4.0 x16 | PCIe 4.0 x16 + NVLink 3 (4 Links) | PCIe 4.0 x16 | PCIe 4.0 x16 | PCIe 4.0 x8 |
| Memory Capacity | 40 GB HBM2e | 48 GB GDDR6 | 24 GB HBM2e | 16 GB x4 GDDR6 | 24 GB GDDR6 | 16 GB GDDR6 |
| Memory Bus | 5120-bit | 384-bit | 3072-bit | 128-bit x4 | 384-bit | 128-bit |
| Memory Clock | 1215 MHz | 1812 MHz | 1215 MHz | 1812 MHz | 1563 MHz | 1563 MHz |
| Bandwidth | 1.55 TB/s | 695.8 GB/s | 933.1 GB/s | 231.9 GB/s x4 | 600.2 GB/s | 200 GB/s |
| Form Factor | SXM4 | PCIe Dual Slot, Full Length | PCIe Dual Slot, Full Length | PCIe Dual Slot, Full Length | PCIe Single Slot, Full-Height Full-Length | PCIe Single Slot, Half-Height Half-Length |