Microsoft Azure To Deploy AMD Instinct MI200 GPU Clusters For 'Large-Scale' AI Training, AMD Claims Up To 20% Performance Improvement Over NVIDIA A100 GPUs
Yesterday, Microsoft Azure disclosed plans to use AMD Instinct MI200 GPUs to scale up AI and machine learning training on its widely used cloud platform. AMD unveiled the MI200 GPU series at its Accelerated Datacenter event at the end of 2021. The MI200 accelerators are built on the CDNA 2 architecture and pack 58 billion transistors and 128 GB of high-bandwidth memory into a dual-die design.
Microsoft Azure will use AMD Instinct MI200 GPUs for large-scale AI training on its cloud platform
Forrest Norrod, senior vice president and general manager of data center and embedded solutions at AMD, claims the new chips are nearly five times faster than NVIDIA's flagship A100 GPU. That figure applies to FP64 (double-precision) calculations, which the company described as "highly precise." In FP16 workloads the gap largely closes in standard workloads, although AMD said the chips are still up to 20 percent faster than the current NVIDIA A100, a segment in which NVIDIA remains the data center GPU leader.
Azure will be the first public cloud to deploy clusters of AMD's flagship MI200 GPUs for large-scale AI training. We've already started testing these clusters using some of our own AI workloads with great performance.
— Kevin Scott, chief technical officer, Microsoft
It is not yet known when Azure instances using AMD Instinct MI200 GPUs will become generally available, or whether the series will be used for internal workloads.
Microsoft is also reportedly working with AMD to optimize the company's GPUs for machine learning workloads running on the open-source PyTorch framework.
We're also deepening our investments in the open-source PyTorch framework, working with the PyTorch core team and AMD both to optimize the performance and developer experience for customers running PyTorch on Azure, and to ensure that developers' PyTorch projects work great on AMD hardware.
Microsoft also recently partnered with Meta AI on PyTorch development to help improve the infrastructure behind the framework's workloads. Meta AI revealed that it plans to run next-generation machine learning workloads on a reserved Microsoft Azure cluster that would include 5,400 NVIDIA A100 GPUs.
Placements like this helped NVIDIA earn $3.75 billion in data center revenue last quarter, topping its gaming segment, which ended at $3.62 billion, a first for the company.
Intel's Ponte Vecchio GPUs are expected to launch later this year alongside the company's Sapphire Rapids Xeon Scalable processors, marking the first time Intel will compete with NVIDIA's H100 and AMD's Instinct MI200 GPUs in the cloud services market. Intel has also introduced next-generation AI accelerators for training and inference, which it claims deliver higher performance than NVIDIA's A100 GPUs.
News Source: The Register