Hardware Report

AMD-Powered Frontier Supercomputer Uses 3K of Its 37K MI250X GPUs To Achieves a Whopping 1 Trilllion Parameter LLM Run, Comparable To ChatGPT-4

Muhammad Zuhair • Jan 7, 2024 at 06:00am EST

USA Plans To Produce The Fastest Supercomputer Called Discovery, Surpassing The Frontier By 3-5 Times 1

The AMD-powered Frontier Supercomputer with Instinct MI250X GPUs has achieved a 1 Trillion Parameter LLM run, rivaling ChatGPT-4.

The Frontier Supercomputer Sets New Records In The Space of LLM Training, Courtesy of AMD's EPYC CPUs & Instinct GPUs

The Frontier supercomputer is the world's leading supercomputer and the only Exascale machine that is currently operating. This machine is powered by AMD's EPYC & Instinct hardware which not only offers the top HPC performance but is also the 2nd most efficient supercomputer on the planet. A submission report on Arxiv by individuals has revealed that the Frontier supercomputer has reached the ability to train one trillion parameters through "hyperparameter tuning", setting a new industry benchmark.

This is 'just' 3k of the 37k MI250X on Frontier. To steal from Feynman, "There is plenty of room at the top!" -- I'm expecting lots of interesting work scaling out further to tens of thousands of nodes.

— Nicholas Malaya (@nicholasmalaya) January 5, 2024

Before we go into the crux, let's take a quick recap on what the Frontier supercomputer holds. The supercomputer by ORNL has been designed from the ground up with AMD's 3rd Gen EPYC Trento CPUs and Instinct MI250X GPU accelerators. It is installed at the Oak Ridge National Laboratory (ORNL) in Tennessee, USA, where it is operated by the Department of Energy (DOE). It currently has achieved 1.194 Exaflop/s using 8,699,904 cores. The HPE Cray EX architecture combines 3rd Gen AMD EPYC CPUs optimized for HPC and AI, with AMD Instinct 250X accelerators and a Slingshot-11 interconnect. Frontier has been able to maintain the number one spot on the Top500.org list of supercomputers, showing its dominance.

The new records achieved by Frontier are a result of implementing effective strategies to train LLMs and use the onboard hardware most efficiently. The team has been able to achieve notable results through their extensive testing of 22 Billion, 175 Billion, and 1 Trillion parameters, and the figures obtained are a result of optimizing and fine-tuning the model training process. The results were achieved by employing up to 3,000 AMD's MI250X AI accelerators, which have shown their prowess despite being a relatively outdated piece of hardware.

What's more interesting is that the whole Frontier supercomputer houses 37,000 MI250X GPUs so one can imagine the kind of performance when using the entire GPU pool to power LLMs. AMD is also on the verge of implementing its MI300 GPU accelerators in brand-new supercomputers with a robust ROCm 6.0 ecosystem that further accelerates AI performance.

For 22 Billion, 175 Billion, and 1 Trillion parameters, we achieved GPU throughputs of 38.38%, 36.14%, and 31.96%, respectively. For the training of the 175 Billion parameter model and the 1 Trillion parameter model, we achieved 100% weak scaling efficiency on 1024 and 3072 MI250X GPUs, respectively. We also achieved strong scaling efficiencies of 89% and 87% for these two models.

- Arvix

The future holds plenty for the server and the data center segment, and it is important to note that Frontier currently employs hardware that isn't relatively new in the industry. With continuous advancements within the generative AI segment, it is evident that markets would need more computing power moving ahead, which is why the advancements within hardware designed for this segment are vital for next-gen progression.

News Source: Arvix

About the author: Muhammad Zuhair is a hardware and technology reporter for Wccftech, specializing in the semiconductor industry and the complex interplay between technology, manufacturing, and geopolitics. His coverage focuses on the corporate strategies and technological roadmaps of industry giants like TSMC, NVIDIA, Samsung, and Intel. Zuhair's expertise lies in deconstructing complex topics such as fabrication nodes (e.g., 2nm process), the economic impact of policies like the CHIPS Act, and the strategic development of AI infrastructure from NVIDIA, AMD and Intel.

Follow Wccftech on Google to get more of our news coverage in your feeds.

Read all comments on AMD-Powered Frontier Supercomputer Uses 3K of Its 37K MI250X GPUs To Achieves a Whopping 1 Trilllion Parameter LLM Run, Comparable To ChatGPT-4

AMD-Powered Frontier Supercomputer Uses 3K of Its 37K MI250X GPUs To Achieves a Whopping 1 Trilllion Parameter LLM Run, Comparable To ChatGPT-4

The Frontier Supercomputer Sets New Records In The Space of LLM Training, Courtesy of AMD's EPYC CPUs & Instinct GPUs

Trending Stories

NVIDIA GeForce RTX 50 SUPER GPUs Have Reportedly Arrived At AIBs, But Are On Hold Due To Undecided Memory Prices

GameStop May Have Leaked Zelda: Ocarina of Time Remake Pre-Orders for August 4, Hinting First Real Footage Isn’t Far

Square Enix’s Final Fantasy VII Rebirth Looks Like a Remaster on PC, as Shader Injector 2.0 Delivers Series’ Best Visuals

Bethesda Doubles Down on Fallout and Elder Scrolls, Confirming Fallout 5 Preproduction and Fallout 3/New Vegas Remasters

AMD Ryzen 7 7700X3D Is Now Available At Newegg At $279; Retailer Bundles Various Hardware With The New Zen 4 CPU

Popular Discussions

AMD Radeon Drivers Silently Add Multi Frame Generation “MFG 8x”, Ray Regeneration, and Neural Radiance Overrides, Hinting At A Bigger FSR Push

AMD Ryzen 7 7700X3D 4.5 GHz “3D V-Cache” CPU Review: The Budget X3D Champ For AM5

NVIDIA’s GeForce RTX 5070 Ti SUPER – Specs, Performance, And Price, Everything We Know So Far

AMD Ryzen 7 5800X3D Outsells Ryzen 7 7800X3D For The Same Price On Amazon Despite Being Weaker

AMD Ryzen 7 7800X3D CPU Drops To $299 A Day Ahead of 7700X3D’s Launch, Bringing 3D V-Cache Goodness To Mainstream Gamers

AMD-Powered Frontier Supercomputer Uses 3K of Its 37K MI250X GPUs To Achieves a Whopping 1 Trilllion Parameter LLM Run, Comparable To ChatGPT-4

The Frontier Supercomputer Sets New Records In The Space of LLM Training, Courtesy of AMD's EPYC CPUs & Instinct GPUs

Related Story AMD Lands Major U.S. Government AI Deal to Power Next-Gen Supercomputers, Featuring Instinct MI355X & the Newer MI430 AI Chips

Further Reading

Trending Stories

Popular Discussions