AMD’s vLLM-ATOM Plugin Supercharges DeepSeek-R1, Kimi-K2, and gpt-oss-120B AI LLM Inference on Instinct MI350 and MI400 Accelerators

May 11, 2026 at 02:40pm EDT
AMD's vLLM-ATOM Plugin Supercharges DeepSeek-R1, Kimi-K2, and gpt-oss-120B AI LLM Inference on Instinct MI350 and MI400 Accelerators

AMD has introduced a new plugin called vLLM-ATOM, which supercharges AI LLMs while supporting its Instinct MI350 and MI400 GPUs.

AMD Offers Big Boost To AI LLMs With Its vLLM-ATOM Plugin That Works Seamlessly With vLLM & Accelerates AI Inference Performance

The vLLM-ATOM is a purpose-built plugin that aims to improve inference performance across various AI LLMs. It is designed around AMD's high-performance Instinct GPU accelerators, such as the MI350 and MI400 series, running both as a standalone inference server or through seamless integration as a plugin backend. This allows users to take full advantage of AMD's native model and kernel optimizations without any modifications to the vLLM's core database.

Related Story AMD Says It Had To Rebuild The Ryzen 5 5800X3D To Bring It Back For AM4’s 10th Anniversary

The main highlights of vLLM-ATOM include:

The vLLM-ATOM architecture is broken down into three layers:

LayerResponsibility
vLLMRequest scheduling, KV cache management, continuous batching, OpenAI-compatible API
ATOM PluginPlatform registration, optimized model implementation, attention backends routing, kernel-level optimization tuning
AITERLow-level GPU kernels — fused MoE, flash attention, quantized GEMM, RoPE fusion

In terms of model support, the vLLM-ATOM plugin supports both AI LLMs and VLMs through a unified serving pipeline. Following is the full list:

ArchitectureTypeRepresentative ModelsATOM Model Class
Qwen3MoeForCausalLMMoEQwen/Qwen3-235B-A22B-Instruct-2507-FP8atom.models.qwen3_moe
DeepseekV3ForCausalLMMoE (MLA)deepseek-ai/DeepSeek-R1-0528 (FP8), amd/DeepSeek-R1-0528-MXFP4, amd/Kimi-K2-Thinking-MXFP4atom.models.deepseek_v2
GptOssForCausalLMMoEopenai/gpt-oss-120batom.models.gpt_oss
Glm4MoeForCausalLMMoE (MLA)zai-org/GLM-4.7-FP8atom.models.glm4_moe
Qwen3NextForCausalLMHybrid MoEQwen/Qwen3-Next-80B-A3B-Instruct-FP8atom.models.qwen3_next
Qwen3_5ForConditionalGenerationDense (Text/VLM)Qwen/Qwen3.5-35B-A3B-FP8atom.models.qwen3_5
Qwen3_5MoeForConditionalGenerationMoE (Text/VLM)Qwen/Qwen3.5-397B-A17B-FP8atom.models.qwen3_5
KimiK25ForConditionalGenerationMoE (Text/VLM)amd/Kimi-K2.5-MXFP4atom.models.kimi_k25

AMD's Note: vLLM-ATOM proves that hardware-specific optimization and framework compatibility are not mutually exclusive. By leveraging vLLM’s out-of-the-box plugin mechanism, ATOM delivers AMD-native kernel optimizations—including fused attention, quantized GEMM, and optimized MoE routing—while preserving the full feature set of vLLM that production LLM deployments rely on.

Beyond immediate performance gains, the plugin’s architecture serves as a critical proving ground for AMD’s hardware and software innovations: optimizations validated in ATOM’s plugin mode are gradually upstreamed to vLLM’s native ROCm backend, benefiting the entire ROCm and open-source LLM community. For end users, this means immediate access to the latest AMD hardware capabilities without waiting for slow upstream integration cycles—creating a virtuous cycle of co-evolution between AMD hardware innovation and the vLLM serving ecosystem.

  1. ATOM Documentation
  2. vLLM-ATOM Guide
  3. RFC: Enable ATOM as vLLM out-of-tree Platform
  4. ATOM Repository
  5. AITER - AMD Inference Tensor Engine for ROCm
  6. vLLM-ATOM Recipes
  7. Docker Hub - ATOM + vLLM Images

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.

Follow Wccftech on Google to get more of our news coverage in your feeds.