NVIDIA Launches Two Brand New Ampere Tensor Core GPUs: A10 24 GB GDDR6 & A30 24 GB HBM2 For Datacenter
In addition to all the CPU & GPU announcements, NVIDIA is also launching today its brand new Ampere-based A10 and A30 Tensor Core GPUs. The two GPUs are aimed at data centers & are mostly geared towards virtualization platforms.
NVIDIA Ampere A10 24 GB GDDR6 & A30 24 GB HBM2 Tensor Core GPUs Launched
What's interesting about these brand new Tensor Core GPUs is their specifications. The A10 uses the GA102 GPU while the A30 makes use of the GA100 GPU. While both are Ampere-based, their memory subsystems are very different: the A10 offers GDDR6 while the A30 goes with HBM2, the standard memory choice for data centers. So let's take a detailed look at the specifications.
NVIDIA A10 Ampere Tensor Core GPU
The NVIDIA A10 Tensor Core GPU is powered by the GA102-890 SKU. It features 72 SMs for a total of 9216 CUDA cores. The GPU operates at a base clock of 885 MHz, boosts up to 1695 MHz, and is PCIe Gen 4.0 compliant. In terms of memory, the card carries 24 GB of GDDR6 VRAM operating at 12.5 Gbps across a 384-bit wide bus interface, delivering a bandwidth of 600 GB/s.
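The headline figures above follow directly from the SM count, clock, and memory configuration. A quick sanity-check sketch, assuming the usual 128 FP32 CUDA cores per Ampere SM and 2 FLOPs per core per clock (FMA):

```python
# Back-of-the-envelope check of the A10 figures quoted above.
# Assumption: 128 FP32 CUDA cores per Ampere SM, 2 FLOPs/core/clock (FMA).

sms = 72
cuda_cores = sms * 128                                   # 9216 CUDA cores
boost_clock_ghz = 1.695
fp32_tflops = cuda_cores * 2 * boost_clock_ghz / 1000    # ~31.2 TFLOPS

mem_gbps_per_pin = 12.5                                  # GDDR6 data rate
bus_width_bits = 384
bandwidth_gbs = mem_gbps_per_pin * bus_width_bits / 8    # 600 GB/s

print(cuda_cores, round(fp32_tflops, 1), bandwidth_gbs)  # 9216 31.2 600.0
```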
As for the design of the card, it makes use of a champagne gold-colored shroud in a single-slot, full-length form factor. Since this is a passively cooled card, there is no fan on it; power is provided through a single 8-pin connector to meet its 150W TDP demand. In terms of performance, the NVIDIA A10 Tensor Core GPU offers up to 31.2 TF FP32, 62.5 TF TF32, 125 TF BFLOAT16, 250 TOPS INT8, and 500 TOPS INT4, with twice those rates with sparsity.
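The tensor throughput figures follow a simple pattern: each step down in precision doubles the peak rate, and 2:4 structured sparsity doubles each rate again. A small sketch of that relationship, with the dense rates taken from the figures above:

```python
# A10 peak tensor rates quoted above. Each halving of precision doubles
# throughput, and 2:4 structured sparsity doubles every rate once more.
dense = {
    "TF32": 62.5,       # teraFLOPS
    "BFLOAT16": 125.0,  # teraFLOPS
    "INT8": 250.0,      # TOPS
    "INT4": 500.0,      # TOPS
}
sparse = {fmt: rate * 2 for fmt, rate in dense.items()}
print(sparse)  # {'TF32': 125.0, 'BFLOAT16': 250.0, 'INT8': 500.0, 'INT4': 1000.0}
```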
NVIDIA A30 Ampere Tensor Core GPU
The NVIDIA A30 Tensor Core GPU on the other hand makes use of a GA100 SKU but the exact variant is not known. It seems to be a rather cut-down variant that features a base clock of 930 MHz and a boost clock of up to 1440 MHz. The GPU is equipped with 24 GB of HBM2 VRAM that operates at a speed of 1215 MHz across a 3072-bit wide bus interface. This means that we are looking at only three active HBM2 memory stacks. The stacks deliver up to 933 GB/s of memory bandwidth.
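The quoted bandwidth is consistent with the memory clock and bus width, assuming HBM2's double-data-rate signalling and 1024-bit-wide stacks:

```python
# A30 HBM2 bandwidth check: DDR signalling means 2 transfers per clock.
hbm2_clock_mhz = 1215
data_rate_gbps = hbm2_clock_mhz * 2 / 1000       # 2.43 Gbps per pin
bus_width_bits = 3072                            # three active 1024-bit stacks
bandwidth_gbs = data_rate_gbps * bus_width_bits / 8
print(round(bandwidth_gbs))                      # ~933 GB/s
```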
Unlike the A10, the NVIDIA A30 Tensor Core GPU features a dual-slot, full-length design. It too is powered by a single 8-pin connector but has a higher rated TDP of 165W. In terms of performance, the NVIDIA A30 Tensor Core GPU offers up to 5.2 TF FP64, 10.3 TF peak FP64 Tensor Core, 10.3 TF FP32, 82 TF TF32, 165 TF BFLOAT16, 330 TOPS INT8, and 661 TOPS INT4, with twice those rates with sparsity.
|NVIDIA Tensor Core Ampere GPUs|NVIDIA A10|NVIDIA A30|
|---|---|---|
|FP64|–|5.2 teraFLOPS|
|FP64 Tensor Core|–|10.3 teraFLOPS|
|FP32|31.2 teraFLOPS|10.3 teraFLOPS|
|TF32 Tensor Core|62.5 teraFLOPS \| 125 teraFLOPS*|82 teraFLOPS \| 165 teraFLOPS*|
|BFLOAT16 Tensor Core|125 teraFLOPS \| 250 teraFLOPS*|165 teraFLOPS \| 330 teraFLOPS*|
|FP16 Tensor Core|125 teraFLOPS \| 250 teraFLOPS*|165 teraFLOPS \| 330 teraFLOPS*|
|INT8 Tensor Core|250 TOPS \| 500 TOPS*|330 TOPS \| 661 TOPS*|
|INT4 Tensor Core|500 TOPS \| 1,000 TOPS*|661 TOPS \| 1,321 TOPS*|
|RT Cores|72 RT Cores|–|
|Media engines|1 video encoder<br>2 video decoders (+AV1 decode)|1 optical flow accelerator (OFA)<br>1 JPEG decoder (NVJPEG)<br>4 video decoders (NVDEC)|
|GPU memory|24GB GDDR6|24GB HBM2|
|GPU memory bandwidth|600GB/s|933GB/s|
|Interconnect|PCIe Gen4: 64GB/s|PCIe Gen4: 64GB/s<br>Third-gen NVLINK: 200GB/s**|
|Form factors|Single-slot, full-height, full-length (FHFL)|Dual-slot, full-height, full-length (FHFL)|
|Max thermal design power (TDP)|150W|165W|
|Multi-Instance GPU (MIG)|–|4 GPU instances @ 6GB each<br>2 GPU instances @ 12GB each<br>1 GPU instance @ 24GB|
|vGPU software support|NVIDIA Virtual PC, NVIDIA Virtual Applications, NVIDIA RTX Virtual Workstation, NVIDIA Virtual Compute Server|NVIDIA AI Enterprise for VMware<br>NVIDIA Virtual Compute Server|

*With sparsity. **NVLink Bridge for up to two GPUs.
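The A30's MIG partitions listed above map onto named GPU-instance profiles that an administrator carves out with `nvidia-smi`. A sketch of the workflow, assuming NVIDIA's standard MIG profile naming (verify the exact profiles available on your driver with `-lgip`):

```shell
# Enable MIG mode on GPU 0 (requires a GPU reset), then list and
# create GPU instances. Profile names assume NVIDIA's MIG naming scheme.
nvidia-smi -i 0 -mig 1
nvidia-smi mig -lgip                                  # list available GPU-instance profiles
nvidia-smi mig -cgi 1g.6gb,1g.6gb,1g.6gb,1g.6gb -C    # four 6GB instances
nvidia-smi mig -lgi                                   # confirm the instances
```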
Inspur’s All-New GPU Servers Supporting A30, A10, and A100
NF5468M6: ultra-flexible for AI workloads, supports 2x 3rd Gen Intel Xeon Scalable processors and 8x NVIDIA A100/A40/A30 GPUs, 16x NVIDIA A10 GPUs, or 20x NVIDIA T4 GPUs; supports up to 12x 3.5-inch hard drives for large local storage in a 4U chassis; flexibly adapts to the latest AI accelerators and smart NICs and has the unique function of switching topologies with one click for various AI applications including AI cloud, IVA (Intelligent Video Analysis), video processing, etc.
NF5468A5: versatile AI server featuring 2x AMD Rome/Milan CPUs and 8x NVIDIA A100/A40/A30 GPUs; N+N redundancy design enables 8x 350W AI accelerators in full-speed operations for superior reliability; the CPU-to-GPU non-blocking design allows interconnection without the PCIe switch communication, achieving faster computation efficiency.
NF5280M6: purpose-built for all scenarios, with 2x 3rd Gen Intel Xeon Scalable processors and 4x NVIDIA A100/A40/A30/A10 GPUs or 8x NVIDIA T4 Tensor Core GPUs in a 2U chassis, capable of long-term stable operation at 45°C. The NF5280M6 is equipped with the latest PFR/SGX technology and a trusted security module design, making it suitable for demanding AI applications.
Inspur also announced that its brand-new M6 AI servers fully support NVIDIA BlueField-2 DPUs. Moving forward, Inspur plans to integrate NVIDIA BlueField-2 DPUs into its next-generation AI servers, enabling faster and more efficient management of users and clusters as well as interconnected data access for scenarios like AI, big data analysis, cloud computing, and virtualization.
More than 20 NVIDIA-Certified Systems are available now from worldwide computer makers. NVIDIA-Certified Systems featuring NVIDIA A30 and NVIDIA A10 GPUs will be available later this year from manufacturers.
NVIDIA AI Enterprise is available as a perpetual license at $3,595 per CPU socket. Enterprise Business Standard Support for NVIDIA AI Enterprise is $899 annually per license. Customers can apply for early access to NVIDIA AI Enterprise as they plan their upgrades to VMware vSphere 7 Update 2.