Intel AutoRound Enables Faster & More Efficient Quantized LLM Models On Intel GPUs & CUDA-Based Devices, Crescent Island With FP8, MXFP8 & MXFP4 Confirmed

Dec 9, 2025 at 07:00am EST

Intel's AutoRound delivers faster and more efficient LLM serving across Intel CPUs and GPUs as well as CUDA-based devices, while Crescent Island is confirmed to support FP8, MXFP8 & MXFP4.

Intel AutoRound Algorithm Boosts LLM Delivery On Intel CPUs, GPUs, CUDA Platforms, Crescent Island Gets MXFP8 and MXFP4 Support

Press Release: We’re excited to announce that AutoRound, a state‑of‑the‑art post‑training quantization (PTQ) algorithm developed by Intel, is now integrated into LLM Compressor. This collaboration delivers:

Broader quantization schemes and model coverage are coming next—try it now and help shape what we build.

What Is AutoRound?

AutoRound is an advanced post-training quantization (PTQ) algorithm designed for Large Language Models (LLMs) and Vision-Language Models (VLMs). It introduces three trainable parameters per quantized tensor: v (rounding offset/adjustment), α, and β (learned clipping range controls). By processing decoder layers sequentially and applying signed gradient descent, AutoRound jointly optimizes rounding and clipping to minimize block‑wise output reconstruction error.
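To make the idea concrete, here is a minimal sketch of signed gradient descent on the rounding offset v for a single weight vector. It is an illustration under stated assumptions, not Intel's implementation: the function names (quantize, recon_loss, autoround_sketch) are hypothetical, the clipping controls α and β are held fixed at 1.0 rather than learned, a straight-through estimator is assumed for the gradient of the rounding step, and the step size of 1/steps and the [-0.5, 0.5] bound on v are assumptions of this sketch.

```python
import random


def quantize(w, v, bits=4, alpha=1.0, beta=1.0):
    """Asymmetric fake-quantization of weight vector w with rounding offsets v.

    Assumes w holds at least two distinct values so the scale is non-zero.
    alpha and beta shrink the clipping range; this sketch keeps them at 1.0.
    """
    qmax = 2 ** bits - 1
    hi, lo = max(w) * alpha, min(w) * beta
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)
    w_q, inside = [], []
    for wi, vi in zip(w, v):
        q = round(wi / scale + vi) + zero_point
        q_clipped = min(max(q, 0), qmax)
        w_q.append((q_clipped - zero_point) * scale)
        inside.append(q == q_clipped)  # gradient flows only where no clipping occurred
    return w_q, inside, scale


def recon_loss(X, w_ref, w_q):
    """Squared output error between the quantized and reference weights on inputs X."""
    residuals = [sum(x * (a - b) for x, a, b in zip(row, w_q, w_ref)) for row in X]
    return sum(r * r for r in residuals), residuals


def sign(x):
    return (x > 0) - (x < 0)


def autoround_sketch(X, w, bits=4, steps=200, lr=None):
    """Tune v with sign-only updates; return (rtn_loss, best_loss, best_v)."""
    lr = lr or 1.0 / steps  # assumed step size
    v = [0.0] * len(w)      # v = 0 is plain round-to-nearest (RTN)
    w_q, inside, scale = quantize(w, v, bits)
    rtn_loss, _ = recon_loss(X, w, w_q)
    best_loss, best_v = rtn_loss, list(v)
    for _ in range(steps):
        w_q, inside, scale = quantize(w, v, bits)
        loss, residuals = recon_loss(X, w, w_q)
        if loss < best_loss:
            best_loss, best_v = loss, list(v)
        for i in range(len(w)):
            if not inside[i]:
                continue
            # Straight-through estimate: d(w_q[i]) / d(v[i]) = scale
            grad = 2.0 * scale * sum(r * row[i] for r, row in zip(residuals, X))
            # Signed gradient descent: only the sign of the gradient is used
            v[i] = min(max(v[i] - lr * sign(grad), -0.5), 0.5)
    return rtn_loss, best_loss, best_v


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0, 1) for _ in range(16)]
    calib = [[random.gauss(0, 1) for _ in range(16)] for _ in range(32)]
    rtn, tuned, _ = autoround_sketch(calib, weights)
    print(f"RTN loss: {rtn:.4f}  tuned loss: {tuned:.4f}")
```

Because the sketch keeps the best offsets seen so far, the tuned reconstruction loss can never be worse than plain round-to-nearest on the calibration inputs. The full algorithm described above would apply the same kind of update to α and β and work block by block through the decoder layers.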

Core strengths:

AutoRound enables quantized models in a range of low‑bit formats that are designed to accelerate inference on Intel Xeon processors, Intel Gaudi AI accelerators, Intel Data Center GPUs, and Intel Arc B‑Series Graphics, as well as other GPUs (e.g., CUDA-based devices).

Looking forward, Intel is adding native support for FP8, MXFP8, and MXFP4 formats to its next-generation Intel Data Center GPU codenamed Crescent Island. Models quantized with AutoRound will naturally scale to take advantage of these data types across the Intel AI hardware portfolio. This creates a consistent path from algorithmic innovation to real-world deployment.
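For readers unfamiliar with the MX formats, the following is a rough sketch of how an MXFP4 block could be encoded, following the general scheme of the OCP Microscaling formats: 32 elements share one power-of-two scale, and each element is stored as a 4-bit E2M1 float. The function names are hypothetical, the tie-breaking rule is a simplification, and nothing here reflects how Crescent Island or AutoRound implements the format.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (sign handled separately)
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32   # elements sharing one scale in the MX formats
E2M1_EMAX = 2     # exponent of the largest E2M1 power of two (4.0 = 2**2)


def mxfp4_quantize_block(block):
    """Return (shared_exp, elements) for one block of up to 32 floats."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Shared power-of-two exponent, limited to the E8M0 range
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    shared_exp = max(-127, min(127, shared_exp))
    scale = 2.0 ** shared_exp
    elements = []
    for x in block:
        # Scale, saturate at the largest E2M1 magnitude, then snap to the grid.
        # Ties go to the smaller magnitude here, a simplification of this sketch.
        mag = min(abs(x) / scale, FP4_E2M1_GRID[-1])
        nearest = min(FP4_E2M1_GRID, key=lambda g: abs(g - mag))
        elements.append(math.copysign(nearest, x))
    return shared_exp, elements


def mxfp4_dequantize_block(shared_exp, elements):
    """Rebuild approximate floats from the shared exponent and E2M1 elements."""
    scale = 2.0 ** shared_exp
    return [e * scale for e in elements]


if __name__ == "__main__":
    block = [6.0, 0.4, 1.2, 2.6, -3.4] + [0.0] * (BLOCK_SIZE - 5)
    exp, elems = mxfp4_quantize_block(block)
    print(exp, mxfp4_dequantize_block(exp, elems)[:5])
```

The shared scale costs 8 bits per 32 elements, so the storage overhead on top of the 4-bit elements is a quarter of a bit per value. MXFP8 follows the same block layout with 8-bit elements.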

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.