Intel AutoRound Enables Faster & More Efficient Quantized LLM Models On Intel GPUs & CUDA-Based Devices, Crescent Island With FP8, MXFP8 & MXFP4 Confirmed


Dec 9, 2025 at 07:00am EST

Intel's AutoRound enables faster, more efficient LLM serving across Intel CPUs and GPUs, while Crescent Island is ready with MXFP8 & MXFP4 support.

Intel AutoRound Algorithm Boosts LLM Delivery On Intel CPUs, GPUs, CUDA Platforms, Crescent Island Gets MXFP8 and MXFP4 Support

Press Release: We’re excited to announce that AutoRound, a state‑of‑the‑art post‑training quantization (PTQ) algorithm developed by Intel, is now integrated into LLM Compressor. This collaboration delivers:

Higher accuracy for low bit-width quantization
Lightweight tuning (hundreds of steps, not thousands)
Zero additional inference overhead
Seamless compatibility with compressed-tensors and direct serving in vLLM
Streamlined workflow: quantize and serve models with just a few lines of code

Broader quantization schemes and model coverage are coming next, so try it now and help shape what we build.

What Is AutoRound?

AutoRound is an advanced post-training quantization (PTQ) algorithm designed for Large Language Models (LLMs) and Vision-Language Models (VLMs). It introduces three trainable parameters per quantized tensor: v (rounding offset/adjustment), α, and β (learned clipping range controls). By processing decoder layers sequentially and applying signed gradient descent, AutoRound jointly optimizes rounding and clipping to minimize block‑wise output reconstruction error.
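To show where v, α, and β enter the picture, here is a minimal pure-Python sketch. It is not Intel's implementation. It assumes an asymmetric min-max 4-bit quantizer in which α and β multiply the tensor's max and min before the scale and zero point are derived, and v is added just before rounding. It also shrinks a decoder block down to a single linear output y = w · x. Only v is tuned here, using signed gradient descent: each step moves a parameter by a fixed amount against the sign of its gradient, with a straight-through estimator standing in for the rounding. AutoRound also learns α and β jointly, but this sketch holds them at 1.0 for brevity. The function names quant_params, fake_quant, output_errors, recon_loss and tune_rounding are hypothetical.

```python
import math
import random


def quant_params(w, alpha, beta, bits):
    """Scale and zero point of an asymmetric min-max quantizer.

    Assumption: alpha and beta multiply max(w) and min(w), so values below 1.0
    shrink (clip) the quantization range before the scale is derived.
    """
    qmax = 2 ** bits - 1
    w_max, w_min = max(w) * alpha, min(w) * beta
    scale = max(w_max - w_min, 1e-8) / qmax
    zero_point = math.floor(-w_min / scale + 0.5)
    return scale, zero_point, qmax


def fake_quant(w, v, alpha=1.0, beta=1.0, bits=4):
    """Quantize then dequantize w; v nudges each value before rounding.

    Also returns a per-element mask that is False where the integer was clipped,
    which the straight-through estimator uses to zero the gradient.
    """
    scale, zp, qmax = quant_params(w, alpha, beta, bits)
    w_hat, unclipped = [], []
    for wi, vi in zip(w, v):
        q = math.floor(wi / scale + vi + 0.5) + zp  # round-half-up after adding v
        unclipped.append(0 <= q <= qmax)
        q = min(max(q, 0), qmax)
        w_hat.append(scale * (q - zp))
    return w_hat, unclipped, scale


def output_errors(w, w_hat, xs):
    """Per-sample output error (w_hat . x) - (w . x) for a single linear output."""
    return [sum((a - b) * xi for a, b, xi in zip(w_hat, w, x)) for x in xs]


def recon_loss(w, w_hat, xs):
    """Mean squared output reconstruction error over the calibration inputs."""
    errs = output_errors(w, w_hat, xs)
    return sum(e * e for e in errs) / len(errs)


def tune_rounding(w, xs, steps=200, lr=0.005, alpha=1.0, beta=1.0, bits=4):
    """Signed gradient descent on v only; alpha and beta stay fixed in this sketch."""
    v = [0.0] * len(w)
    best_v, best_loss = list(v), float("inf")
    for _ in range(steps):
        w_hat, unclipped, scale = fake_quant(w, v, alpha, beta, bits)
        errs = output_errors(w, w_hat, xs)
        loss = sum(e * e for e in errs) / len(errs)
        if loss < best_loss:  # keep the best offsets seen so far
            best_v, best_loss = list(v), loss
        for i in range(len(v)):
            # dL/dw_hat_i, then STE: d w_hat_i / d v_i = scale unless clipped
            g = sum(2.0 * e * x[i] for e, x in zip(errs, xs)) / len(xs)
            g = g * scale if unclipped[i] else 0.0
            sign = (g > 0) - (g < 0)
            v[i] = min(max(v[i] - lr * sign, -0.5), 0.5)  # keep v in [-0.5, 0.5]
    return best_v, best_loss


random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(16)]
xs = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(32)]
rtn_loss = recon_loss(w, fake_quant(w, [0.0] * len(w))[0], xs)
v, tuned_loss = tune_rounding(w, xs)
print(f"round-to-nearest loss {rtn_loss:.5f}, tuned loss {tuned_loss:.5f}")
```

The loop's first step evaluates v = 0, which is plain round-to-nearest. Because the sketch returns the best v it has seen rather than the last one, its result on the calibration data is never worse than round-to-nearest. The step size of 0.005 over 200 steps is an illustrative choice for this toy problem.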

Core strengths:

Superior accuracy, especially at very low bit‑widths
Support for multiple data types: W4A16, MXFP8, MXFP4, FP8, NVFP4, with more on the way
Mixed‑bit, layer‑wise precision search for flexible accuracy–efficiency trade‑offs
Applicability across both LLMs and VLMs
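MXFP4 is one of the formats listed above. To make it concrete, here is a minimal pure-Python sketch based on our reading of the OCP Microscaling (MX) v1.0 specification. Each block of 32 values shares one power-of-two (E8M0) scale, and each element is stored as a 4-bit E2M1 float whose magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4 and 6. The function names are hypothetical, and this is not code from AutoRound or LLM Compressor. The shared-exponent rule (floor(log2(max |x|)) minus 2) and the saturate-at-6 behavior are assumptions to check against the spec.

```python
import math

# Non-negative magnitudes representable in FP4 E2M1; even index = even mantissa bit.
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2       # exponent of the largest E2M1 normal (6.0 = 1.5 * 2**2)
MX_BLOCK_SIZE = 32  # elements that share one scale


def to_fp4(x):
    """Round x to the nearest E2M1 value, ties to even mantissa, saturating at +/-6.0."""
    mag = min(abs(x), FP4_E2M1_GRID[-1])
    best = 0
    for i, g in enumerate(FP4_E2M1_GRID):
        d, d_best = abs(mag - g), abs(mag - FP4_E2M1_GRID[best])
        if d < d_best or (d == d_best and i % 2 == 0):
            best = i
    return math.copysign(FP4_E2M1_GRID[best], x)


def mxfp4_quantize(block):
    """Return (shared_exponent, fp4_elements) for one block of at most 32 floats."""
    assert 0 < len(block) <= MX_BLOCK_SIZE
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)  # all-zero block: any scale works, pick 2**0
    # floor(log2(amax)) computed exactly via frexp: amax = m * 2**e with 0.5 <= m < 1
    _, e = math.frexp(amax)
    shared_exp = (e - 1) - E2M1_EMAX
    shared_exp = max(-127, min(127, shared_exp))  # assumed E8M0 exponent range
    scale = 2.0 ** shared_exp
    return shared_exp, [to_fp4(x / scale) for x in block]


def mxfp4_dequantize(shared_exp, elems):
    """Multiply each FP4 element back by the shared power-of-two scale."""
    scale = 2.0 ** shared_exp
    return [q * scale for q in elems]


block = [0.1 * (i - 16) for i in range(32)]  # values from -1.6 to 1.5
shared_exp, elems = mxfp4_quantize(block)
restored = mxfp4_dequantize(shared_exp, elems)
print(shared_exp, elems[:4], restored[:4])
```

With this scale rule, the largest magnitude in a block lands in [4, 8) after scaling, so anything above 6 is clipped to 6. In the example, -1.6 becomes -6.0 × 2^-2 = -1.5. MXFP8 follows the same block-plus-shared-scale recipe with 8-bit E4M3 or E5M2 elements.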

AutoRound produces quantized models in a range of low‑bit formats designed to accelerate inference on Intel Xeon processors, Intel Gaudi AI accelerators, Intel Data Center GPUs, and Intel Arc B‑Series Graphics, as well as other GPUs (e.g., CUDA-based devices).

Looking forward, Intel is adding native support for FP8, MXFP8, and MXFP4 formats to its next-generation Intel Data Center GPU codenamed Crescent Island. Models quantized with AutoRound will naturally scale to take advantage of these data types across the Intel AI hardware portfolio. This creates a consistent path from algorithmic innovation to real-world deployment.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
