AMD and Intel arm x86 against the AI gap with ACE, baking matrix-multiply engines & low-precision formats straight into future CPUs

Hassan Mujtaba

ACE, the upcoming set of x86 extensions defined jointly by AMD and Intel, has received its latest specification release, which focuses on AI acceleration.

AMD & Intel Focus on AI Acceleration Through Next-Gen x86 Architectures That Are ACE Compliant

Last year, Intel and AMD partnered to strengthen the x86 ecosystem through their x86 Ecosystem Advisory Group (EAG) initiative. The plan was to offer a standardized set of features across architectures, keeping x86 accessible, scalable, and compatible with future requirements. Four key features were announced: FRED, AVX10, ChkTag, and ACE.


AMD and Intel have now published the latest ACE (AI Compute Extensions) specifications, which give us insight into what this new feature has to offer x86 chips.

AI Compute Extensions (ACE) for x86 architectures aim to deliver a significant increase in matrix-multiply performance while remaining scalable and energy efficient. Matrix multiplication is the core building block of neural networks and LLMs in AI workloads.
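To make that concrete, here is a minimal Python sketch of our own (not code from the ACE specification) showing why matrix multiplication dominates AI workloads: a fully connected layer is a matrix multiply plus a bias, and every step of the inner loop is a multiply-accumulate (MAC), the operation matrix engines are built to perform in bulk. The names `matmul` and `dense_layer` are illustrative.

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Naive C = A @ B for row-major nested lists (A is m x k, B is k x n)."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions must match")
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply-accumulate (MAC)
            c[i][j] = acc
    return c


def dense_layer(x, weights, bias):
    """A fully connected layer: a batch of inputs times a weight matrix, plus a bias."""
    y = matmul(x, weights)
    return [[v + bv for v, bv in zip(row, bias)] for row in y]


# A batch of 2 inputs with 3 features each, projected to 2 outputs.
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.5, -0.5]
print(dense_layer(x, w, b))  # [[4.5, 4.5], [10.5, 10.5]]
```

Production frameworks call tuned GEMM libraries instead of triple loops, but the arithmetic being accelerated is the same.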

Current SIMD (Single Instruction, Multiple Data) extensions such as AVX10 can perform matrix multiplication, but their scalability and compute density can be limited. Techniques such as accelerated matrix multiplication can deliver higher performance, but they are not an efficient approach. The EAG aims to solve this with ACE, which accelerates matrix multiplication while offering greater flexibility and scalability.
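Why tile-based hardware raises compute density can be shown with simple arithmetic. The sketch below assumes an outer-product style tile update, a common design in matrix engines; the article does not say that ACE's data-processing operations work exactly this way. Feeding an m-element vector and an n-element vector into an m x n tile performs m*n MACs while reading only m + n inputs, so data reuse grows with tile size. The function names are hypothetical.

```python
def outer_product_accumulate(c, a, b):
    """Update tile C in place with C += a (outer) b; return the number of MACs performed."""
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i][j] += ai * bj
    return len(a) * len(b)


def matmul_by_outer_products(a, b):
    """Compute C = A @ B as k rank-1 tile updates and report MACs per input element read."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    macs = inputs_read = 0
    for p in range(k):
        col_of_a = [a[i][p] for i in range(m)]  # m values, e.g. one vector register's worth
        row_of_b = b[p]                          # n values, e.g. another vector register's worth
        macs += outer_product_accumulate(c, col_of_a, row_of_b)
        inputs_read += m + n
    return c, macs / inputs_read


c, density = matmul_by_outer_products([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(c, density)  # [[19, 22], [43, 50]] 1.0

ones = [[1] * 16 for _ in range(16)]
print(matmul_by_outer_products(ones, ones)[1])  # 8.0 MACs per input element for a 16 x 16 tile
```

By comparison, a plain element-wise vector multiply-add (c[i] += a[i] * b[i]) does one MAC per two input elements, a ratio of 0.5 at any vector width.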

The ACE extensions define matrix multiplication primitives that augment AVX and scalar code with new capabilities, adding:

  • ACE register state, including tile and block scale registers
  • Data processing operations that consume AVX register input and operate on tile register state
  • Data move operations to move data between ACE register state and AVX registers
  • State and operations for system management
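The ingredients listed above can be pictured with a toy model. Everything in it is hypothetical: the class `ToyAceState`, its 4 x 4 tile shape, the two tiles and four scale registers, the method names, and the choice to store block scales as E8M0 exponents are illustrative assumptions, not the ACE register layout or instruction set. It only shows the data flow the spec describes: vector-sized inputs go in, tile state accumulates, and results are moved back out to vectors.

```python
from dataclasses import dataclass, field

ROWS, COLS = 4, 4  # hypothetical tile shape, kept tiny for readability


@dataclass
class ToyAceState:
    """Toy stand-in for 'ACE register state'. All names and sizes are hypothetical."""
    tiles: list = field(default_factory=lambda: [[[0.0] * COLS for _ in range(ROWS)] for _ in range(2)])
    # Hypothetical block scale registers holding E8M0 exponents; 127 encodes 2**0 = 1.0.
    scales: list = field(default_factory=lambda: [127] * 4)

    def set_scale(self, reg: int, e8m0: int) -> None:
        self.scales[reg] = e8m0

    def mma(self, tile: int, a_vec, a_scale_reg: int, b_vec, b_scale_reg: int) -> None:
        """'Data processing': read two vector inputs, accumulate their scaled outer product into a tile."""
        s = 2.0 ** (self.scales[a_scale_reg] - 127) * 2.0 ** (self.scales[b_scale_reg] - 127)
        for i in range(ROWS):
            for j in range(COLS):
                self.tiles[tile][i][j] += s * a_vec[i] * b_vec[j]

    def move_row_out(self, tile: int, row: int) -> list:
        """'Data move': copy one tile row back out into a vector-register-sized list."""
        return list(self.tiles[tile][row])

    def zero_tile(self, tile: int) -> None:
        self.tiles[tile] = [[0.0] * COLS for _ in range(ROWS)]


st = ToyAceState()
st.set_scale(0, 128)  # 2**(128 - 127) = 2.0, applied to the A input
st.mma(0, [1, 2, 3, 4], 0, [1, 1, 1, 1], 1)
print(st.move_row_out(0, 3))  # [8.0, 8.0, 8.0, 8.0]
```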

ACE provides tight integration between AVX vectors and ACE tile registers, combining high compute density tile processing operations with the comprehensive data processing features of AVX.

In addition to matrix acceleration, a number of dedicated format convert operations are provided under the AVX10 framework.
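As an example of what a format conversion does, the sketch below narrows FP32 to BF16 (SE8M7). The article does not describe ACE's conversion instructions or their rounding behavior; this uses round-to-nearest-even on the raw bit pattern, a common software approach, and the helper names are our own. Because BF16 keeps FP32's 8-bit exponent, the conversion only has to round away the low 16 mantissa bits.

```python
import math
import struct


def fp32_to_bf16_bits(x: float) -> int:
    """Round an FP32 value to BF16 bits with round-to-nearest-even.

    struct.pack('<f') first rounds the Python float to FP32 (and raises
    OverflowError if x is outside the FP32 range).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if math.isnan(x):
        return 0x7FC0 | ((bits >> 16) & 0x8000)  # quiet NaN, sign preserved
    # Add 0x7FFF plus the lowest kept bit so exact ties round to an even result.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_fp32(h: int) -> float:
    """BF16 is the top half of an FP32 word, so widening is a 16-bit shift."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


for v in (1.0, math.pi, 1.00390625):
    h = fp32_to_bf16_bits(v)
    print(f"0x{h:04X}", bf16_bits_to_fp32(h))
# 0x3F80 1.0
# 0x4049 3.140625
# 0x3F80 1.0   (1 + 2**-8 is an exact tie and rounds to the even neighbor)
```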

via x86 EAG

These latest specifications define x86 extensions for accelerating computation tasks, initially focusing on matrix multiplication kernels and reduced precision data formats important to ML workloads.

Data Formats

The extensions described in this document include support for several data formats. This may include native format support for operations such as matrix multiplication, scaling support for OCP MX-style operations, accumulation format, and format conversion support between different formats. Support for additional data formats may be introduced in the future.
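MX-style scaling means a block of low-precision elements shares one power-of-two scale stored in E8M0. The sketch below quantizes a block to FP4 elements (SE2M1) using the scale-selection rule from the OCP Microscaling (MX) v1.0 specification as we understand it: the shared exponent is floor(log2(max |v|)) minus the element format's largest exponent, and elements that still overflow saturate. It is not a description of ACE behavior, which the article does not detail. The MX spec uses 32-element blocks; four elements are used here only for brevity, and all function names are hypothetical.

```python
import math

# The eight non-negative FP4 (E2M1) magnitudes; the list index is the 3-bit code without the sign.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2  # exponent of the largest E2M1 normal: 6.0 = 1.5 * 2**2


def quantize_e2m1(x: float) -> int:
    """Round x to the nearest E2M1 code (ties to even), saturating at +/-6.0."""
    mag = min(abs(x), 6.0)
    best = min(range(8), key=lambda i: (abs(E2M1_MAGNITUDES[i] - mag), i & 1))
    return best | (0x8 if x < 0 else 0)


def mx_quantize_fp4(block: list[float]) -> tuple[int, list[int]]:
    """Return (E8M0 scale code, FP4 element codes) for one block of values."""
    amax = max(abs(v) for v in block)
    if amax == 0:
        return 127, [0] * len(block)  # all-zero block: scale 2**0 (an arbitrary choice here)
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    shared_exp = max(-127, min(127, shared_exp))  # keep it inside the E8M0 range
    scale = 2.0 ** shared_exp
    return shared_exp + 127, [quantize_e2m1(v / scale) for v in block]


def mx_dequantize_fp4(scale_code: int, codes: list[int]) -> list[float]:
    """Rebuild approximate values: element value times the shared power-of-two scale."""
    if scale_code == 0xFF:
        return [math.nan] * len(codes)  # E8M0 reserves 0xFF for NaN
    scale = 2.0 ** (scale_code - 127)
    return [scale * (-1.0 if c & 0x8 else 1.0) * E2M1_MAGNITUDES[c & 0x7] for c in codes]


scale_code, codes = mx_quantize_fp4([0.3, -1.2, 2.5, 10.0])
print(scale_code, codes)                     # 128 [0, 9, 2, 6]
print(mx_dequantize_fp4(scale_code, codes))  # [0.0, -1.0, 2.0, 8.0]
```

The shared scale costs only 8 bits per block, which is why MX formats can push individual elements down to 4 to 8 bits while still covering a wide dynamic range.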

| Format | Description | Notes |
|---|---|---|
| INT8 | 8-bit integer | |
| INT32 | 32-bit integer | |
| FP32 | SE8M23 | As defined by IEEE-754 |
| BF16 | SE8M7 | |
| FP16 | SE5M10 | |
| E8M0 | 8-bit unsigned exponent | Used for power-of-two block scale formats |
| FP8 | 8-bit floating point | Defined in OCP 8-bit Floating Point Specification (OFP8) [1]. Also refer to OCP Microscaling Formats (MX) Specification [2]. |
| MX FP8 | 8-bit floating point formats (SE5M2, SE4M3) | |
| MX FP6 | 6-bit floating point formats (SE3M2, SE2M3) | |
| MX FP4 | 4-bit floating point format (SE2M1) | |
| MX INT8 | 8-bit fixed-point fractional format | |
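In the table's notation, SE4M3 means one sign bit, a 4-bit exponent and a 3-bit mantissa. The two FP8 variants treat special values differently under the OCP OFP8 specification: E5M2 follows IEEE-754 conventions with infinities and NaNs, while E4M3 has no infinities and reserves only the S.1111.111 pattern for NaN, which extends its finite range to ±448. A minimal decoder sketch (the function name `decode_fp8` is our own):

```python
import math


def decode_fp8(byte: int, fmt: str) -> float:
    """Decode one OCP OFP8 byte; fmt is 'E4M3' or 'E5M2'."""
    exp_bits, man_bits, bias = {"E4M3": (4, 3, 7), "E5M2": (5, 2, 15)}[fmt]
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    if fmt == "E5M2" and exp == exp_max:
        # IEEE-style: all-ones exponent is infinity (mantissa 0) or NaN
        return sign * math.inf if man == 0 else math.nan
    if fmt == "E4M3" and exp == exp_max and man == (1 << man_bits) - 1:
        # E4M3 has no infinities; S.1111.111 is its only NaN encoding
        return math.nan
    if exp == 0:
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)  # subnormal
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


print(decode_fp8(0x7E, "E4M3"))  # 448.0, the largest finite E4M3 value
print(decode_fp8(0x7B, "E5M2"))  # 57344.0, the largest finite E5M2 value
print(decode_fp8(0x7C, "E5M2"))  # inf
```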

ACE is just one step on the path forward for x86. We have also covered APX (Advanced Performance Extensions), which will play a crucial role in next-gen chips built on x86 architectures. These advancements are expected to land in future CPU lineups.


About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.

