AMD Radeon Instinct MI60, The First 7nm Vega 20 GPU Based 32GB HBM2 Graphics Card Detailed – 13.2 Billion Transistors on a 331mm2 Die, 7.4 TFLOPs Double Precision Compute, 1 TB/s Bandwidth

AMD has also announced its latest Radeon Instinct MI60 graphics accelerator, the world's first 7nm graphics card to be shown publicly. The accelerator is aimed at the HPC market and uses the new 7nm Vega 20 GPU to deliver a large increase in density along with unprecedented amounts of compute and bandwidth.

AMD Radeon Instinct MI60, World's First 7nm Graphics Accelerator Detailed - 64 Compute Units, 32 GB HBM2, 1 TB/s Bandwidth, and PCIe Gen 4.0 Support

There's a lot to talk about, so let's start with the specifications. The AMD Radeon Instinct MI60 uses the Vega 20 GPU, which is AMD's first 7nm GPU. The 14nm Vega design was ported over to 7nm and optimized primarily for the HPC sector. This in turn gave AMD a chance to fully utilize the Vega architecture, leveraging its compute capabilities and taking them a step further.

The Vega 20 GPU features a total of 13.23 billion transistors packed within a 331mm2 die. It's a really dense design, and AMD has also slightly optimized its GCN cores on Vega 20. With 7nm, AMD can run the chip at faster clock speeds, allowing for up to 7.4 TFLOPs of double precision compute, twice that in single precision at 14.8 TFLOPs, and twice that again in half precision, rated at 29.5 TFLOPs.

There are still 64 compute units, which make up 4096 stream processors, but as mentioned above they have been heavily optimized for the HPC market, delivering faster compute operations and adding DL/ML instruction sets. On the deep learning side, the Instinct MI60 now supports both INT8 and INT4, with a maximum theoretical throughput rated at 118 TOPs in INT4 and 59.0 TOPs in INT8.
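The quoted peak rates can be roughly reproduced from the stream processor count and the clock speed. The snippet below is a minimal sketch, not AMD's published method: it assumes a 1800 MHz peak engine clock, counts each fused multiply-add as two operations, and uses per-precision rate ratios inferred from the figures quoted above.

```python
# Back-of-the-envelope peak throughput for the Vega 20 GPU in the MI60.
STREAM_PROCESSORS = 64 * 64   # 64 compute units x 64 stream processors = 4096
CLOCK_GHZ = 1.8               # assumption: 1800 MHz peak engine clock
OPS_PER_FMA = 2               # assumption: one fused multiply-add = 2 operations

# Assumed throughput relative to FP32, inferred from the quoted figures.
RATE_VS_FP32 = {"FP64": 0.5, "FP32": 1, "FP16": 2, "INT8": 4, "INT4": 8}


def peak_tera_ops(precision: str) -> float:
    """Return the theoretical peak in tera-operations per second."""
    fp32 = STREAM_PROCESSORS * OPS_PER_FMA * CLOCK_GHZ / 1000.0
    return fp32 * RATE_VS_FP32[precision]


for name in RATE_VS_FP32:
    print(f"{name}: {peak_tera_ops(name):.2f}")
```

Under these assumptions every result lands within about 0.1 of the quoted numbers, which suggests the headline figures are simple multiples of the FP32 rate.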

In terms of memory, we are looking at 32 GB of HBM2 VRAM with an unprecedented bandwidth of 1 TB/s. AMD is using four stacks of HBM2 in an 8-Hi design, allowing for the biggest and densest VRAM capacity ever featured on a single-chip GPU. Beyond the specifications, the Radeon Instinct MI60 is fully compliant with AMD's ROCm software stack and makes use of a new machine learning engine that will extend AMD's efforts in the deep learning and artificial intelligence space.
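To put the memory figures in perspective, the short sketch below works backwards from the quoted capacity and bandwidth. It assumes the standard 1024-bit HBM2 interface per stack and treats "1 TB/s" as 1024 GB/s; both are assumptions made here for illustration rather than figures AMD has broken out.

```python
# Working backwards from the quoted HBM2 numbers.
STACKS = 4
DIES_PER_STACK = 8            # "8-Hi" means eight DRAM dies per stack
TOTAL_CAPACITY_GB = 32
BUS_BITS_PER_STACK = 1024     # assumption: standard HBM2 interface width per stack
BANDWIDTH_GB_S = 1024         # assumption: "1 TB/s" taken as 1024 GB/s

capacity_per_stack_gb = TOTAL_CAPACITY_GB / STACKS
capacity_per_die_gbit = capacity_per_stack_gb * 8 / DIES_PER_STACK
bus_width_bits = STACKS * BUS_BITS_PER_STACK
# Bandwidth (bytes/s) = bus width (bits) / 8 * per-pin data rate (bits/s)
pin_rate_gbit_s = BANDWIDTH_GB_S * 8 / bus_width_bits

print(f"Capacity per stack: {capacity_per_stack_gb:.0f} GB")
print(f"Capacity per die:   {capacity_per_die_gbit:.0f} Gbit")
print(f"Total bus width:    {bus_width_bits} bits")
print(f"Per-pin data rate:  {pin_rate_gbit_s:.1f} Gbit/s")
```

In other words, each stack would hold 8 GB across eight 8 Gbit dies, and the 4096-bit bus would need roughly 2 Gbit/s per pin to reach the quoted bandwidth.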

Key features of the AMD Radeon Instinct MI60 and MI50 accelerators include:

  • Optimized Deep Learning Operations: Provides flexible mixed-precision FP16, FP32, and INT4/INT8 capabilities to meet growing demand for dynamic and ever-changing workloads, from training complex neural networks to running inference against those trained networks.
  • World’s Fastest Double Precision PCIe Accelerator: The AMD Radeon Instinct MI60 is the world’s fastest double precision PCIe 4.0 capable accelerator, delivering up to 7.4 TFLOPS peak FP64 performance allowing scientists and researchers to more efficiently process HPC applications across a range of industries including life sciences, energy, finance, automotive, aerospace, academics, government, defense and more. The AMD Radeon Instinct MI50 delivers up to 6.7 TFLOPS FP64 peak performance while providing an efficient, cost-effective solution for a variety of deep learning workloads, as well as enabling high reuse in Virtual Desktop Infrastructure (VDI), Desktop-as-a-Service (DaaS) and cloud environments.
  • Up to 6X Faster Data Transfer: Two Infinity Fabric Links per GPU deliver up to 200 GB/s of peer-to-peer bandwidth – up to 6X faster than PCIe 3.0 alone – and enable the connection of up to 4 GPUs in a hive ring configuration (2 hives in 8 GPU servers).
  • Ultra-Fast HBM2 Memory: The AMD Radeon Instinct MI60 provides 32GB of HBM2 Error-correcting code (ECC) memory, and the Radeon Instinct MI50 provides 16GB of HBM2 ECC memory. Both GPUs provide full-chip ECC and Reliability, Availability and Serviceability (RAS) technologies, which are critical to delivering more accurate compute results for large-scale HPC deployments.
  • Secure Virtualized Workload Support: AMD MxGPU Technology, the industry’s only hardware-based GPU virtualization solution, which is based on the industry-standard SR-IOV (Single Root I/O Virtualization) technology, makes it difficult for hackers to attack at the hardware level, helping provide security for virtualized cloud deployments.
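The "up to 6X" data transfer claim in the list above can be sanity-checked against PCIe 3.0. The sketch below assumes the comparison is against a x16 PCIe 3.0 slot and that the 200 GB/s figure is the aggregate, bidirectional bandwidth of both Infinity Fabric links; neither detail is spelled out by AMD here. The `ring_neighbours` helper is a hypothetical illustration of how four GPUs with two links each could form a ring.

```python
# PCIe 3.0 signalling: 8 GT/s per lane with 128b/130b line encoding.
PCIE3_GT_PER_LANE = 8.0
PCIE3_ENCODING = 128 / 130
LANES = 16                       # assumption: comparison is against a x16 slot

# Assumption: 200 GB/s is the aggregate, bidirectional figure per GPU.
INFINITY_FABRIC_GB_S = 200.0


def pcie3_x16_gb_s(bidirectional: bool = True) -> float:
    """Usable PCIe 3.0 x16 bandwidth in GB/s."""
    per_direction = PCIE3_GT_PER_LANE * PCIE3_ENCODING * LANES / 8  # bits -> bytes
    return per_direction * (2 if bidirectional else 1)


def ring_neighbours(gpu: int, hive_size: int = 4) -> tuple:
    """Hypothetical ring layout: each GPU uses its two links for two neighbours."""
    return ((gpu - 1) % hive_size, (gpu + 1) % hive_size)


speedup = INFINITY_FABRIC_GB_S / pcie3_x16_gb_s()
print(f"PCIe 3.0 x16 (bidirectional): {pcie3_x16_gb_s():.1f} GB/s")
print(f"Infinity Fabric speedup: {speedup:.2f}x")
print("Neighbours of GPU 0 in a 4-GPU hive:", ring_neighbours(0))
```

Under these assumptions the ratio comes out a little above 6, which is consistent with the "up to 6X" wording.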

AMD has also shared a roadmap showing that a new Radeon Instinct product, currently referred to as "MI-Next", will launch next year, featuring higher performance, increased connectivity, and better software compatibility. As for the Radeon Instinct MI60, it is expected to ship this quarter, which would make it the first 7nm graphics card to hit the market, as there's no other 7nm GPU product from the competition on the near horizon.

There will also be the Radeon Instinct MI50 accelerator, a slightly toned-down variant of the MI60 with 3840 cores, 16 GB of HBM2 and slightly lower compute rates, aimed at the machine inferencing market at a lower price point. Both cards feature a TDP of 300W. In terms of power connectors, the MI60 is equipped with dual 8-pin connectors while the MI50 uses an 8+6 pin configuration.
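For a rough sense of what those connector layouts allow, the sketch below adds up the standard PCIe power limits (75W from the slot, 75W per 6-pin, 150W per 8-pin) and compares them with the 300W TDP. It assumes both cards draw slot power as well, which AMD has not stated.

```python
# Standard PCIe power delivery limits in watts.
SLOT_W = 75
CONNECTOR_W = {"6-pin": 75, "8-pin": 150}
TDP_W = 300

# Connector layouts as described for each card.
CARDS = {
    "MI60": ["8-pin", "8-pin"],
    "MI50": ["8-pin", "6-pin"],
}


def power_budget_w(connectors: list) -> int:
    """Maximum in-spec power: slot plus all auxiliary connectors (assumed)."""
    return SLOT_W + sum(CONNECTOR_W[c] for c in connectors)


for card, connectors in CARDS.items():
    budget = power_budget_w(connectors)
    print(f"{card}: budget {budget} W, headroom over TDP {budget - TDP_W} W")
```

By this count the MI60's dual 8-pin layout leaves some margin above the 300W TDP, while the MI50's 8+6 pin layout matches it exactly.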

AMD Radeon Instinct MI60/MI50 GPU Block Diagram and Performance Slides:


AMD Radeon Instinct Accelerators:

| Accelerator Name | AMD Radeon Instinct MI6 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI60 |
|---|---|---|---|---|---|
| GPU Architecture | Polaris 10 | Fiji XT | Vega 10 | Vega 20 | Vega 20 |
| GPU Process Node | 14nm FinFET | 28nm | 14nm FinFET | 7nm FinFET | 7nm FinFET |
| GPU Cores | 2304 | 4096 | 4096 | 3840 | 4096 |
| GPU Clock Speed | 1237 MHz | 1000 MHz | 1500 MHz | 1746 MHz | 1800 MHz |
| FP16 Compute | 5.7 TFLOPs | 8.2 TFLOPs | 24.6 TFLOPs | 26.8 TFLOPs | 29.6 TFLOPs |
| FP32 Compute | 5.7 TFLOPs | 8.2 TFLOPs | 12.3 TFLOPs | 13.4 TFLOPs | 14.8 TFLOPs |
| FP64 Compute | 384 GFLOPs | 512 GFLOPs | 768 GFLOPs | 6.7 TFLOPs | 7.4 TFLOPs |
| Memory Clock | 1750 MHz | 500 MHz | 472 MHz | 500 MHz | 500 MHz |
| Memory Bus | 256-bit bus | 4096-bit bus | 2048-bit bus | 4096-bit bus | 4096-bit bus |
| Memory Bandwidth | 224 GB/s | 512 GB/s | 484 GB/s | 1 TB/s | 1 TB/s |
| Form Factor | Single Slot, Full Length | Dual Slot, Half Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length |
| Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
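One way to read the table is to compare FP32 and FP64 rates per card, which shows how far Vega 20 departs from earlier Instinct parts in double precision. The sketch below simply divides the table's own figures; the column with 3840 cores is taken here to be the MI50.

```python
# FP32 and FP64 peak rates in TFLOPs, copied from the table above.
# Assumption: the 3840-core column is the Radeon Instinct MI50.
PEAK_TFLOPS = {
    "MI6":  {"fp32": 5.7,  "fp64": 0.384},
    "MI8":  {"fp32": 8.2,  "fp64": 0.512},
    "MI25": {"fp32": 12.3, "fp64": 0.768},
    "MI50": {"fp32": 13.4, "fp64": 6.7},
    "MI60": {"fp32": 14.8, "fp64": 7.4},
}

# How many FP32 operations each card can do per FP64 operation at peak.
fp32_per_fp64 = {
    card: rates["fp32"] / rates["fp64"] for card, rates in PEAK_TFLOPS.items()
}

for card, ratio in fp32_per_fp64.items():
    print(f"{card}: FP64 runs at about 1/{ratio:.0f} of the FP32 rate")
```

On these numbers the two Vega 20 cards run double precision at half their single precision rate, while the older cards sit at roughly one sixteenth.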