Intel AutoRound Enables Faster & More Efficient Quantized LLM Models On Intel GPUs & CUDA-Based Devices, Crescent Island With FP8, MXFP8 & MXFP4 Confirmed

Dec 9, 2025 at 07:00am EST

Intel's AutoRound delivers faster and more efficient LLM serving across Intel CPUs and GPUs, while Crescent Island is ready with MXFP8 & MXFP4 support.

Intel AutoRound Algorithm Boosts LLM Delivery On Intel CPUs, GPUs, CUDA Platforms, Crescent Island Gets MXFP8 and MXFP4 Support

Press Release: We’re excited to announce that AutoRound, a state‑of‑the‑art post‑training quantization (PTQ) algorithm developed by Intel, is now integrated into LLM Compressor. This collaboration delivers:


Broader quantization schemes and model coverage are coming next—try it now and help shape what we build.

What Is AutoRound?

AutoRound is an advanced post-training quantization (PTQ) algorithm designed for Large Language Models (LLMs) and Vision-Language Models (VLMs). It introduces three trainable parameters per quantized tensor: v (rounding offset/adjustment), α, and β (learned clipping range controls). By processing decoder layers sequentially and applying signed gradient descent, AutoRound jointly optimizes rounding and clipping to minimize block‑wise output reconstruction error.
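To make the rounding part of that description concrete, here is a minimal pure-Python sketch, not Intel's implementation. It tunes only the rounding offset v for a single weight vector and holds the clipping controls α and β fixed at 1 for brevity. It also assumes a straight-through estimator for the gradient through round(). All function names (fake_quantize, block_loss, tune_rounding) are hypothetical and are not part of AutoRound or LLM Compressor.

```python
import random

def fake_quantize(w, v, bits=4):
    """Quantize then dequantize w, adding rounding offset v[i] before round()."""
    qmax = 2 ** bits - 1
    lo, hi = min(w), max(w)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    w_hat = []
    for wi, vi in zip(w, v):
        q = round(wi / scale + vi) + zero_point
        q = max(0, min(qmax, q))  # clip to the integer grid
        w_hat.append(scale * (q - zero_point))
    return w_hat, scale

def block_loss(w, w_hat, xs):
    """Mean squared difference between original and quantized outputs."""
    total = 0.0
    for x in xs:
        err = sum((a - b) * xi for a, b, xi in zip(w_hat, w, x))
        total += err * err
    return total / len(xs)

def sign(g):
    return (g > 0) - (g < 0)

def tune_rounding(w, xs, steps=200, lr=0.005, bits=4):
    """Signed gradient descent on v; returns the best offsets and their loss."""
    v = [0.0] * len(w)  # v = 0 is plain round-to-nearest
    best_v, best_loss = v[:], float("inf")
    for _ in range(steps + 1):
        w_hat, scale = fake_quantize(w, v, bits)
        loss = block_loss(w, w_hat, xs)
        if loss < best_loss:
            best_v, best_loss = v[:], loss
        # Straight-through estimate: d(w_hat[i]) / d(v[i]) is taken to be scale.
        grads = [0.0] * len(w)
        for x in xs:
            err = sum((a - b) * xi for a, b, xi in zip(w_hat, w, x))
            for i, xi in enumerate(x):
                grads[i] += 2.0 * err * xi * scale
        # Only the sign of each gradient is used; v stays within [-0.5, 0.5].
        v = [max(-0.5, min(0.5, vi - lr * sign(g))) for vi, g in zip(v, grads)]
    return best_v, best_loss

# Toy calibration data (made up for illustration).
random.seed(0)
w = [random.uniform(-1, 1) for _ in range(16)]
xs = [[random.gauss(0, 1) for _ in range(16)] for _ in range(32)]
rtn_hat, _ = fake_quantize(w, [0.0] * len(w))
rtn_loss = block_loss(w, rtn_hat, xs)
best_v, best_loss = tune_rounding(w, xs)
print(f"round-to-nearest loss {rtn_loss:.5f}, tuned loss {best_loss:.5f}")
```

Because the first evaluated candidate is v = 0, the tuned result can never be worse than plain round-to-nearest on the calibration data. A fuller version would also learn α and β, which rescale the clipping range, and would run this optimization block by block through the decoder.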

Core strengths:

AutoRound enables quantized models in a range of low‑bit formats that are designed to accelerate inference on Intel Xeon processors, Intel Gaudi AI accelerators, Intel Data Center GPUs, and Intel Arc B‑Series Graphics, as well as other GPUs (e.g., CUDA-based devices).

Looking forward, Intel is adding native support for FP8, MXFP8, and MXFP4 formats to its next-generation Intel Data Center GPU codenamed Crescent Island. Models quantized with AutoRound will naturally scale to take advantage of these data types across the Intel AI hardware portfolio. This creates a consistent path from algorithmic innovation to real-world deployment.
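The press release does not describe the MX formats themselves. As background, the sketch below shows how MXFP4 works as we understand it from the OCP Microscaling specification: a block of up to 32 values shares one power-of-two scale, and each value is stored as a 4-bit E2M1 float. The function name mxfp4_quantize_block is hypothetical, and the tie-breaking is simplified, so this illustrates the idea rather than Crescent Island's hardware behavior.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (the sign is a separate bit).
FP4_E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2      # exponent of the largest value: 6 = 1.5 * 2**2
BLOCK_SIZE = 32    # values sharing one scale in the MX formats

def mxfp4_quantize_block(block):
    """Return (shared_exponent, dequantized_values) for one block."""
    assert len(block) <= BLOCK_SIZE
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Shared power-of-two scale, chosen so the block maximum lands near the FP4 maximum.
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    shared_exp = max(-127, min(127, shared_exp))  # 8-bit exponent-only scale
    scale = 2.0 ** shared_exp
    out = []
    for x in block:
        mag = min(abs(x) / scale, FP4_E2M1_VALUES[-1])  # saturate at 6
        # Nearest grid point; ties go to the smaller magnitude in this sketch.
        q = min(FP4_E2M1_VALUES, key=lambda c: abs(c - mag))
        out.append(math.copysign(q, x) * scale)
    return shared_exp, out

exp_a, vals_a = mxfp4_quantize_block([6.0, -3.0, 0.5, 0.0])
exp_b, vals_b = mxfp4_quantize_block([12.0, 1.0])
print(exp_a, vals_a)
print(exp_b, vals_b)
```

The second call shows the shared scale at work: doubling the block maximum raises the shared exponent by one, so the same eight FP4 magnitudes now cover a range twice as wide.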

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
