AMD ROCm 7 Announced: MI350 Support, New Algorithms, Models & Advanced Features For AI Added, Focus on Inference With 3.5x Uplift

Jun 12, 2025 at 01:47pm EDT

AMD has officially announced ROCm 7, the next version of its open software stack, which is aimed at accelerating AI workloads and improving developer productivity.

AMD Unveils ROCm 7: The Next Generation of Its Open Software Stack With a Focus on AI Inferencing

With the announcement of ROCm 7, AMD is moving on from its ROCm 6 software stack, which has received various updates over the last few years as AI computing took off. Below are the main features that AMD is highlighting with ROCm 7.


With ROCm 7, AMD says that it is focusing more on the growing inference capabilities within its software stack. The ROCm 7 stack will include enhanced support for frameworks such as vLLM v1, llm-d, and SGLang, along with serving optimizations such as Distributed Inference, Prefill, and Disaggregation. New kernels and algorithms coming to ROCm 7 include GEMM Autotuning, MoE, Attention, and Python-based kernel authoring.

AMD has already announced FP6 and FP4 support for its MI350 series, and ROCm 7 includes full support for these advanced datatypes, covering FP8, FP6, FP4, and mixed precision.
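To give a rough sense of why narrower datatypes matter for inference, the sketch below estimates the weight-only memory footprint of a nominal 70-billion-parameter model at each width. The parameter count, the FP16 baseline, and the helper name `weight_gigabytes` are assumptions made for illustration and do not come from AMD; the calculation also ignores activations, KV cache, and any scaling-factor overhead.

```python
# Bits used to store one weight at each precision.
# FP16 is included only as a familiar baseline (an assumption, not from the article).
BITS_PER_PARAM = {"FP16": 16, "FP8": 8, "FP6": 6, "FP4": 4}

# Nominal model size used for illustration (hypothetical round number).
NUM_PARAMS = 70e9


def weight_gigabytes(num_params: float, bits: int) -> float:
    """Return the weight-only footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


for name, bits in BITS_PER_PARAM.items():
    print(f"{name}: {weight_gigabytes(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights shrink from 140 GB at FP16 to 70 GB at FP8, 52.5 GB at FP6, and 35 GB at FP4, which is the basic reason lower-precision formats leave more memory for larger batches or longer contexts.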

In terms of performance, AMD says that inference has been the largest area of focus with ROCm 7, citing a roughly 3.5x performance uplift in AI workloads versus ROCm 6. Breaking that down, we can see up to a 3.2x increase in Llama 3.1 70B, a 3.4x increase in Qwen2-72B, and up to 3.8x in DeepSeek R1.
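As a quick sanity check on how the headline figure relates to the per-model numbers, the sketch below averages the three uplifts quoted above. Treating the headline figure as a simple arithmetic mean is an assumption here, not something stated in AMD's announcement as reported.

```python
from statistics import mean

# Per-model inference uplifts (ROCm 7 vs. ROCm 6) as quoted in the article.
uplifts = {
    "Llama 3.1 70B": 3.2,
    "Qwen2-72B": 3.4,
    "DeepSeek R1": 3.8,
}

# Simple arithmetic mean across the three models (assumed aggregation method).
average_uplift = mean(uplifts.values())
print(f"Average uplift: {average_uplift:.2f}x")
```

The mean works out to about 3.47x, which rounds to the 3.5x figure in the headline.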

For DeepSeek R1, AMD also compares its ROCm 7 stack running on an Instinct MI355X GPU against the NVIDIA Blackwell B200 platform running CUDA. According to AMD, ROCm 7 delivers 30% higher FP8 throughput in DeepSeek R1 than NVIDIA's CUDA-based platform.

As for training performance, ROCm 7 also delivers a significant gain over ROCm 6, with a 3x uplift across Llama 2 70B, Llama 3.1 8B, and Qwen 1.5 7B.

The new ROCm software stack will also be extended to Enterprise AI with complete end-to-end solutions, secure data integration, and ease of deployment. The software stack will work in concert across GPUs, CPUs, and DPUs, and will support various workloads with a key focus on GenAI.

Finally, AMD is opening up ROCm support on Ryzen-based laptops and workstations later this year, along with in-box Linux support and full Windows support in the second half of the year.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
