AMD’s vLLM-ATOM Plugin Supercharges DeepSeek-R1, Kimi-K2, and gpt-oss-120B AI LLM Inference on Instinct MI350 and MI400 Accelerators

Hassan Mujtaba • May 11, 2026 at 02:40pm EDT

AMD has introduced a new plugin called vLLM-ATOM, which supercharges AI LLMs while supporting its Instinct MI350 and MI400 GPUs.

AMD Offers Big Boost To AI LLMs With Its vLLM-ATOM Plugin That Works Seamlessly With vLLM & Accelerates AI Inference Performance

The vLLM-ATOM is a purpose-built plugin that aims to improve inference performance across various AI LLMs. It is designed around AMD's high-performance Instinct GPU accelerators, such as the MI350 and MI400 series, running both as a standalone inference server or through seamless integration as a plugin backend. This allows users to take full advantage of AMD's native model and kernel optimizations without any modifications to the vLLM's core database.

The main highlights of vLLM-ATOM include:

Zero learning curve: Full compatibility with existing vLLM commands, APIs, and end-to-end workflows. ATOM runs transparently in the background, requiring no new tools or complex configurations—while delivering enhanced kernel performance while preserving a consistent user experience.
Instant access to AMD innovation: Leverage cutting-edge AMD hardware features (e.g., FP4 on the MI355X GPU, rack-scale inference on the MI400 GPU) and top-tier kernel optimizations (e.g., AITER fused attention, custom AllReduce) out of the box, without waiting for upstream integration into the main vLLM codebase. This drastically shortens the time-to-value for the new AMD GPUs.
Agile innovation sandbox: A fast validation layer for new technical ideas, hardware enablement, and kernel library testing (e.g., AITER). The plugin aligns flexibly with the AMD product roadmap, including new GPU releases, FP8/FP4 precision support, and next-gen attention mechanisms—unconstrained by vLLM’s upstream release cycles.
vLLM as a production-grade foundation for ROCm: As the community-standard serving framework, vLLM provides the enterprise-grade stability, broad model coverage, and production-critical features needed to deploy ROCm-based infrastructure at scale.
Mature optimizations upstreamed for all: ATOM serves as a temporary proving ground for new optimizations; once stabilized, kernels, optimization strategies, and new features are upstreamed to vLLM’s native ROCm backend, benefiting the entire ROCm software user community and strengthening the open-source LLM ecosystem.

The vLLM-ATOM architecture is broken down into three layers:

Layer	Responsibility
vLLM	Request scheduling, KV cache management, continuous batching, OpenAI-compatible API
ATOM Plugin	Platform registration, optimized model implementation, attention backends routing, kernel-level optimization tuning
AITER	Low-level GPU kernels — fused MoE, flash attention, quantized GEMM, RoPE fusion

In terms of model support, the vLLM-ATOM plugin supports both AI LLMs and VLMs through a unified serving pipeline. Following is the full list:

Architecture	Type	Representative Models	ATOM Model Class
Qwen3MoeForCausalLM	MoE	Qwen/Qwen3-235B-A22B-Instruct-2507-FP8	`atom.models.qwen3_moe`
DeepseekV3ForCausalLM	MoE (MLA)	deepseek-ai/DeepSeek-R1-0528 (FP8), amd/DeepSeek-R1-0528-MXFP4, amd/Kimi-K2-Thinking-MXFP4	`atom.models.deepseek_v2`
GptOssForCausalLM	MoE	openai/gpt-oss-120b	`atom.models.gpt_oss`
Glm4MoeForCausalLM	MoE (MLA)	zai-org/GLM-4.7-FP8	`atom.models.glm4_moe`
Qwen3NextForCausalLM	Hybrid MoE	Qwen/Qwen3-Next-80B-A3B-Instruct-FP8	`atom.models.qwen3_next`
Qwen3_5ForConditionalGeneration	Dense (Text/VLM)	Qwen/Qwen3.5-35B-A3B-FP8	`atom.models.qwen3_5`
Qwen3_5MoeForConditionalGeneration	MoE (Text/VLM)	Qwen/Qwen3.5-397B-A17B-FP8	`atom.models.qwen3_5`
KimiK25ForConditionalGeneration	MoE (Text/VLM)	amd/Kimi-K2.5-MXFP4	`atom.models.kimi_k25`

AMD's Note: vLLM-ATOM proves that hardware-specific optimization and framework compatibility are not mutually exclusive. By leveraging vLLM’s out-of-the-box plugin mechanism, ATOM delivers AMD-native kernel optimizations—including fused attention, quantized GEMM, and optimized MoE routing—while preserving the full feature set of vLLM that production LLM deployments rely on.

Beyond immediate performance gains, the plugin’s architecture serves as a critical proving ground for AMD’s hardware and software innovations: optimizations validated in ATOM’s plugin mode are gradually upstreamed to vLLM’s native ROCm backend, benefiting the entire ROCm and open-source LLM community. For end users, this means immediate access to the latest AMD hardware capabilities without waiting for slow upstream integration cycles—creating a virtuous cycle of co-evolution between AMD hardware innovation and the vLLM serving ecosystem.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.

Follow Wccftech on Google to get more of our news coverage in your feeds.

Read all comments on AMD’s vLLM-ATOM Plugin Supercharges DeepSeek-R1, Kimi-K2, and gpt-oss-120B AI LLM Inference on Instinct MI350 and MI400 Accelerators

AMD’s vLLM-ATOM Plugin Supercharges DeepSeek-R1, Kimi-K2, and gpt-oss-120B AI LLM Inference on Instinct MI350 and MI400 Accelerators

AMD Offers Big Boost To AI LLMs With Its vLLM-ATOM Plugin That Works Seamlessly With vLLM & Accelerates AI Inference Performance

Trending Stories

Xbox Studio Leaders Reportedly Detest Game Pass, Arguing it Destroyed the Value of Their $40+ Games Now Available for Pennies

CXMT Supply Chain To Witness Major Process Transition To Seize DDR6 Opportunity Before Commercialization, Threatening Samsung’s And SK hynix’s Global Hold

Over 80% Of Samsung Foundry Workers Are Planning To Leave Amid A Yawning Pay Gap With The Memory Division

SpaceX Awards Foxconn A Huge $52 Billion Order For 13,000 Racks Of NVIDIA GB300 AI Servers, Where Each Rack Costs $4 Million And The Total Order Spans Nearly 1 Million GPUs

An Anti-Apple Consumer Who Laughed At MacBook Prices And Lack Of Customizations Has “Hit Rock Bottom,” Saying The Windows Laptop Market Has Been A “Nightmare”

Popular Discussions

AMD Medusa Point 10-Core “Zen 6” CPU Beats Strix Point 10-Core “Zen 5” By Nearly 35% While Operating at 5.4 GHz

AMD Radeon Drivers Silently Add Multi Frame Generation “MFG 8x”, Ray Regeneration, and Neural Radiance Overrides, Hinting At A Bigger FSR Push

AMD Ryzen 7 7700X3D 4.5 GHz “3D V-Cache” CPU Review: The Budget X3D Champ For AM5

NVIDIA GeForce RTX 50 SUPER GPUs Have Reportedly Arrived At AIBs, But Are On Hold Due To Undecided Memory Prices

AMD Ryzen 7 5800X3D Outsells Ryzen 7 7800X3D For The Same Price On Amazon Despite Being Weaker

AMD’s vLLM-ATOM Plugin Supercharges DeepSeek-R1, Kimi-K2, and gpt-oss-120B AI LLM Inference on Instinct MI350 and MI400 Accelerators

AMD Offers Big Boost To AI LLMs With Its vLLM-ATOM Plugin That Works Seamlessly With vLLM & Accelerates AI Inference Performance

Related Story Snapdragon 8 Elite Gen 6 Pro Could Be A Worthy Choice For Gaming Handhelds As Qualcomm’s Flagship SoC Produces Convincing Results Over Ryzen AI Z2 Extreme

Further Reading

Trending Stories

Popular Discussions