AMD’s vLLM-ATOM Plugin Supercharges DeepSeek-R1, Kimi-K2, and gpt-oss-120B AI LLM Inference on Instinct MI350 and MI400 Accelerators

•

May 11, 2026 at 02:40pm EDT

AMD's vLLM-ATOM Plugin Supercharges DeepSeek-R1, Kimi-K2, and gpt-oss-120B AI LLM Inference on Instinct MI350 and MI400 Accelerators

AMD has introduced a new plugin called vLLM-ATOM, which supercharges AI LLMs while supporting its Instinct MI350 and MI400 GPUs.

AMD Offers Big Boost To AI LLMs With Its vLLM-ATOM Plugin That Works Seamlessly With vLLM & Accelerates AI Inference Performance

The vLLM-ATOM is a purpose-built plugin that aims to improve inference performance across various AI LLMs. It is designed around AMD's high-performance Instinct GPU accelerators, such as the MI350 and MI400 series, running both as a standalone inference server or through seamless integration as a plugin backend. This allows users to take full advantage of AMD's native model and kernel optimizations without any modifications to the vLLM's core database.

The main highlights of vLLM-ATOM include:

Zero learning curve: Full compatibility with existing vLLM commands, APIs, and end-to-end workflows. ATOM runs transparently in the background, requiring no new tools or complex configurations—while delivering enhanced kernel performance while preserving a consistent user experience.
Instant access to AMD innovation: Leverage cutting-edge AMD hardware features (e.g., FP4 on the MI355X GPU, rack-scale inference on the MI400 GPU) and top-tier kernel optimizations (e.g., AITER fused attention, custom AllReduce) out of the box, without waiting for upstream integration into the main vLLM codebase. This drastically shortens the time-to-value for the new AMD GPUs.
Agile innovation sandbox: A fast validation layer for new technical ideas, hardware enablement, and kernel library testing (e.g., AITER). The plugin aligns flexibly with the AMD product roadmap, including new GPU releases, FP8/FP4 precision support, and next-gen attention mechanisms—unconstrained by vLLM’s upstream release cycles.
vLLM as a production-grade foundation for ROCm: As the community-standard serving framework, vLLM provides the enterprise-grade stability, broad model coverage, and production-critical features needed to deploy ROCm-based infrastructure at scale.
Mature optimizations upstreamed for all: ATOM serves as a temporary proving ground for new optimizations; once stabilized, kernels, optimization strategies, and new features are upstreamed to vLLM’s native ROCm backend, benefiting the entire ROCm software user community and strengthening the open-source LLM ecosystem.

The vLLM-ATOM architecture is broken down into three layers:

Layer	Responsibility
vLLM	Request scheduling, KV cache management, continuous batching, OpenAI-compatible API
ATOM Plugin	Platform registration, optimized model implementation, attention backends routing, kernel-level optimization tuning
AITER	Low-level GPU kernels — fused MoE, flash attention, quantized GEMM, RoPE fusion

In terms of model support, the vLLM-ATOM plugin supports both AI LLMs and VLMs through a unified serving pipeline. Following is the full list:

Architecture	Type	Representative Models	ATOM Model Class
Qwen3MoeForCausalLM	MoE	Qwen/Qwen3-235B-A22B-Instruct-2507-FP8	`atom.models.qwen3_moe`
DeepseekV3ForCausalLM	MoE (MLA)	deepseek-ai/DeepSeek-R1-0528 (FP8), amd/DeepSeek-R1-0528-MXFP4, amd/Kimi-K2-Thinking-MXFP4	`atom.models.deepseek_v2`
GptOssForCausalLM	MoE	openai/gpt-oss-120b	`atom.models.gpt_oss`
Glm4MoeForCausalLM	MoE (MLA)	zai-org/GLM-4.7-FP8	`atom.models.glm4_moe`
Qwen3NextForCausalLM	Hybrid MoE	Qwen/Qwen3-Next-80B-A3B-Instruct-FP8	`atom.models.qwen3_next`
Qwen3_5ForConditionalGeneration	Dense (Text/VLM)	Qwen/Qwen3.5-35B-A3B-FP8	`atom.models.qwen3_5`
Qwen3_5MoeForConditionalGeneration	MoE (Text/VLM)	Qwen/Qwen3.5-397B-A17B-FP8	`atom.models.qwen3_5`
KimiK25ForConditionalGeneration	MoE (Text/VLM)	amd/Kimi-K2.5-MXFP4	`atom.models.kimi_k25`

AMD's Note: vLLM-ATOM proves that hardware-specific optimization and framework compatibility are not mutually exclusive. By leveraging vLLM’s out-of-the-box plugin mechanism, ATOM delivers AMD-native kernel optimizations—including fused attention, quantized GEMM, and optimized MoE routing—while preserving the full feature set of vLLM that production LLM deployments rely on.

Beyond immediate performance gains, the plugin’s architecture serves as a critical proving ground for AMD’s hardware and software innovations: optimizations validated in ATOM’s plugin mode are gradually upstreamed to vLLM’s native ROCm backend, benefiting the entire ROCm and open-source LLM community. For end users, this means immediate access to the latest AMD hardware capabilities without waiting for slow upstream integration cycles—creating a virtuous cycle of co-evolution between AMD hardware innovation and the vLLM serving ecosystem.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.

Follow Wccftech on Google to get more of our news coverage in your feeds.

AMD’s vLLM-ATOM Plugin Supercharges DeepSeek-R1, Kimi-K2, and gpt-oss-120B AI LLM Inference on Instinct MI350 and MI400 Accelerators

AMD Offers Big Boost To AI LLMs With Its vLLM-ATOM Plugin That Works Seamlessly With vLLM & Accelerates AI Inference Performance

Related Story AMD Fires Back At NVIDIA’s Groq Bet, Fuses The Cerebras Wafer-Scale Engine With Helios For 5x Higher Tokens Per Second Per Watt

Further Reading

AMD Says It Now Controls Nearly Half Of The Data Center CPU Market, And Its Total Compute TAM Will Reach $2 Trillion By 2030

AMD EPYC Venice CPUs Stomp NVIDIA's Vera With 20% Faster Single-Core & 2.2x Higher Throughput With Up to 256 "Zen 6" Cores, 203 Billion Transistors & Over 5 GHz+ Clocks

NVIDIA Reportedly Increased GDDR6 And GDDR7 Kit Prices For Its RTX GPUs

Framework Previews Its AMD Ryzen AI MAX+ PRO 495 PC Desktop With 192 GB Unified Memory That Effortlessly Runs DeepSeek V4-Flash at Q8