Intel's AutoRound delivers faster, more efficient LLM serving across Intel CPUs and GPUs, while Crescent Island is getting MXFP8 and MXFP4 support.
Intel AutoRound Algorithm Boosts LLM Delivery On Intel CPUs, GPUs, and CUDA Platforms; Crescent Island Gets MXFP8 and MXFP4 Support
Press Release: We’re excited to announce that AutoRound, a state‑of‑the‑art post‑training quantization (PTQ) algorithm developed by Intel, is now integrated into LLM Compressor. This collaboration delivers:
- Higher accuracy for low bit-width quantization
- Lightweight tuning (hundreds of steps, not thousands)
- Zero additional inference overhead
- Seamless compatibility with compressed-tensors and direct serving in vLLM
- Streamlined workflow: quantize and serve models with just a few lines of code
Broader quantization schemes and model coverage are coming next—try it now and help shape what we build.
What Is AutoRound?
AutoRound is an advanced post-training quantization (PTQ) algorithm designed for Large Language Models (LLMs) and Vision-Language Models (VLMs). It introduces three trainable parameters per quantized tensor: v, a rounding offset, and α and β, which control the learned clipping range. AutoRound processes decoder layers sequentially and uses signed gradient descent to optimize rounding and clipping jointly, minimizing the block‑wise output reconstruction error.
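To make the idea concrete, here is a minimal pure-Python sketch of tuning v, α, and β with signed gradient descent. It is not Intel's implementation: it works on a single weight row rather than a decoder block, and the asymmetric 4-bit scheme, the bounds on v ([-0.5, 0.5]) and on α and β ([0.5, 1.0]), the fixed step size of 1/steps, and all function names are assumptions made for illustration. Gradients are derived by hand with a straight-through estimator, which treats rounding as the identity when differentiating.

```python
import math
import random

def fake_quant(w, v, alpha, beta, bits=4):
    """Quantize then dequantize one weight row (illustrative asymmetric scheme)."""
    qmax = 2 ** bits - 1
    lo, hi = beta * min(w), alpha * max(w)       # clipping range scaled by beta / alpha
    s = (hi - lo) / qmax                         # quantization step
    zp = round(-lo / s)                          # zero point
    raw = [round(wj / s + vj) + zp for wj, vj in zip(w, v)]
    q = [min(max(r, 0), qmax) for r in raw]      # clip to the integer grid
    inside = [0 <= r <= qmax for r in raw]       # True where clipping had no effect
    w_dq = [s * (qj - zp) for qj in q]
    return w_dq, q, inside, s, zp, lo, qmax

def recon_loss(w, w_dq, X):
    """Mean squared output error over the samples in X, plus dLoss/dw_dq."""
    m, n = len(X), len(w)
    err = [sum(x[j] * (w_dq[j] - w[j]) for j in range(n)) for x in X]
    loss = sum(e * e for e in err) / m
    grad = [2.0 / m * sum(err[i] * X[i][j] for i in range(m)) for j in range(n)]
    return loss, grad

def sign(x):
    return (x > 0) - (x < 0)

def clamp(x, a, b):
    return min(max(x, a), b)

def tune(w, X, bits=4, steps=200):
    """Return (best_loss, v, alpha, beta) found by signed gradient descent."""
    n = len(w)
    v, alpha, beta = [0.0] * n, 1.0, 1.0         # starting point is round-to-nearest
    lr = 1.0 / steps                             # assumed fixed step size
    best = None
    for _ in range(steps + 1):
        w_dq, q, inside, s, zp, lo, qmax = fake_quant(w, v, alpha, beta, bits)
        loss, g = recon_loss(w, w_dq, X)
        if best is None or loss < best[0]:       # keep the best parameters seen
            best = (loss, list(v), alpha, beta)
        # Straight-through estimator: round() is treated as identity.
        g_v = [g[j] * s if inside[j] else 0.0 for j in range(n)]
        d_s = [(q[j] - zp) - (w[j] if inside[j] else lo) / s for j in range(n)]
        d_lo = [0.0 if inside[j] else 1.0 for j in range(n)]
        g_alpha = sum(g[j] * d_s[j] / qmax for j in range(n)) * max(w)
        g_beta = sum(g[j] * (d_lo[j] - d_s[j] / qmax) for j in range(n)) * min(w)
        # Signed gradient descent: only the sign of each gradient is used.
        v = [clamp(v[j] - lr * sign(g_v[j]), -0.5, 0.5) for j in range(n)]
        alpha = clamp(alpha - lr * sign(g_alpha), 0.5, 1.0)
        beta = clamp(beta - lr * sign(g_beta), 0.5, 1.0)
    return best

# Toy data: one 16-element weight row and 32 random calibration inputs.
random.seed(0)
w = [math.sin(1.7 * j) for j in range(16)]
X = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(32)]

rtn_loss, _ = recon_loss(w, fake_quant(w, [0.0] * 16, 1.0, 1.0)[0], X)
tuned_loss, v, alpha, beta = tune(w, X)
print("round-to-nearest loss:", rtn_loss)
print("tuned loss:           ", tuned_loss)
```

Because the loop starts from plain round-to-nearest and keeps the best parameters it has seen, the tuned loss in this sketch can never be worse than the round-to-nearest baseline. The learned parameters only affect how weights are rounded and clipped at quantization time, which is consistent with the claim of zero additional inference overhead.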
Core strengths:
- Superior accuracy, especially at very low bit‑widths
- Support for multiple data types: W4A16, MXFP8, MXFP4, FP8, NVFP4, with more on the way
- Mixed‑bit, layer‑wise precision search for flexible accuracy–efficiency trade‑offs
- Applicability across both LLMs and VLMs
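Of the data types listed above, MXFP4 is perhaps the least familiar. The sketch below illustrates the general microscaling idea as defined in the OCP Microscaling (MX) format: a block of 32 values shares one power-of-two scale, and each element is stored as a 4-bit E2M1 float. It is a simplified illustration, not AutoRound's or any Intel library's code. The function names are hypothetical, ties in rounding are broken toward the smaller magnitude, and the limited exponent range of the shared scale is ignored.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (sign is stored separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 32       # number of elements that share one scale in MXFP4
E2M1_EMAX = 2    # exponent of the largest E2M1 value (6 = 1.5 * 2**2)

def mxfp4_quantize_block(block):
    """Return (shared_exponent, elements) for one block of at most 32 floats."""
    assert len(block) <= BLOCK
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Power-of-two scale chosen so the largest value lands near the top of the grid.
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    elems = []
    for x in block:
        # Nearest representable magnitude; values above 6 saturate to 6.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        elems.append(math.copysign(mag, x))
    return shared_exp, elems

def mxfp4_dequantize_block(shared_exp, elems):
    """Reconstruct approximate floats from the shared exponent and the elements."""
    return [e * 2.0 ** shared_exp for e in elems]

# Example: a block whose values happen to be exactly representable.
example = [6.0, 3.0, -1.5, 0.5] + [0.0] * 28
exp_bits, elems = mxfp4_quantize_block(example)
restored = mxfp4_dequantize_block(exp_bits, elems)
```

Sharing one scale across 32 elements keeps the per-element cost at 4 bits plus a small amortized overhead for the scale, which is why hardware with native MXFP4 support can run such models efficiently.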
AutoRound enables quantized models in a range of low‑bit formats that are designed to accelerate inference on Intel Xeon processors, Intel Gaudi AI accelerators, Intel Data Center GPUs, Intel Arc B‑Series Graphics, as well as other GPUs (e.g., CUDA-based devices).
Looking forward, Intel is adding native support for FP8, MXFP8, and MXFP4 formats to its next-generation Intel Data Center GPU codenamed Crescent Island. Models quantized with AutoRound will naturally scale to take advantage of these data types across the Intel AI hardware portfolio. This creates a consistent path from algorithmic innovation to real-world deployment.
