ACE, the upcoming set of x86 extensions defined jointly by AMD and Intel, has received its latest specification release, which focuses on AI acceleration.
AMD & Intel Focus on AI Acceleration Through Next-Gen x86 Architectures That Are ACE Compliant
Last year, Intel and AMD partnered to strengthen the x86 ecosystem through their "x86 Ecosystem Advisory Group" initiative. The plan was to offer a standardized set of features across architectures to make x86 accessible, scalable, and compatible with future requirements. Four key features were announced: FRED, AVX10, ChkTag, and ACE.
Now, AMD and Intel have published the latest ACE (AI Compute Extensions) specifications, which give us insight into what this new feature for x86 chips has to offer.
AI Compute Extensions (or ACE) for x86 architectures aim to offer a significant increase in matrix multiply performance, while offering scalability and energy efficiency. As we know, matrix multiplication is the core building block of neural networks and LLMs in AI workloads.
Current SIMD (Single Instruction, Multiple Data) extensions, such as AVX10, can do matrix multiplication, but their scalability and compute density can be limited. Techniques such as accelerated matrix multiplication can lead to higher performance, but not in an efficient way. The x86 Ecosystem Advisory Group (EAG) aims to solve this through ACE, which accelerates matrix multiplication while offering greater flexibility and scalability.
The ACE extensions define matrix multiplication primitives that augment AVX and scalar code with new capabilities, adding:
- ACE register state, including tile and block scale registers
- Data processing operations that consume AVX register input and operate on tile register state
- Data move operations to move data between ACE register state and AVX registers
- State and operations for system management
ACE provides tight integration between AVX vectors and ACE tile registers, combining high compute density tile processing operations with the comprehensive data processing features of AVX.
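To make the idea of vector inputs feeding tile state more concrete, here is a minimal conceptual sketch in Python. It is not ACE code: the tile size, the function names, and the choice of an outer-product accumulation step are all assumptions made for illustration, since the article does not describe the actual instruction semantics. It only shows how repeatedly combining two input vectors into a two-dimensional accumulator yields a matrix product, with INT8 inputs and wider integer accumulation.

```python
# Conceptual model only. Names and sizes are hypothetical and are not
# taken from the ACE specification.

def make_tile(rows, cols):
    """Return a zeroed accumulator tile (stands in for a tile register)."""
    return [[0] * cols for _ in range(rows)]

def tile_outer_product_accumulate(tile, a_vec, b_vec):
    """Add the outer product of two INT8 vectors into the tile, in place."""
    for v in (*a_vec, *b_vec):
        if not -128 <= v <= 127:
            raise ValueError("inputs are expected to fit in INT8")
    for i, a in enumerate(a_vec):
        for j, b in enumerate(b_vec):
            tile[i][j] += a * b  # wide accumulation (INT32 in hardware terms)
    return tile

def matmul_via_tile(A, B):
    """Compute A @ B by feeding one column of A and one row of B per step."""
    rows, inner, cols = len(A), len(B), len(B[0])
    tile = make_tile(rows, cols)
    for k in range(inner):
        a_col = [A[i][k] for i in range(rows)]  # plays the role of one vector input
        b_row = B[k]                            # plays the role of the other vector input
        tile_outer_product_accumulate(tile, a_col, b_row)
    return tile

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_via_tile(A, B))  # [[19, 22], [43, 50]]
```

The point of the sketch is the data flow: each step consumes two short vectors but updates every element of the tile, which is where the higher compute density of tile-based processing comes from.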
In addition to matrix acceleration, a number of dedicated format convert operations are provided under the AVX10 framework.
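The article does not list which conversions are provided, so the following is only a familiar example of what a format convert operation does: a sketch of narrowing FP32 to BF16 with round-to-nearest-even, written in plain Python. The function names are hypothetical, and the rounding and NaN handling shown here are a common software approach, not a statement about how the new instructions behave.

```python
import struct

def fp32_to_bf16_bits(x):
    """Convert a float to 16-bit BF16 bits (SE8M7), rounding to nearest even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        # NaN input: force a mantissa bit so truncation cannot turn it into infinity
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add 0x7FFF plus the lowest kept bit, so exact ties round to the even result
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_fp32(b):
    """Widen BF16 bits back to a float by padding the low 16 bits with zeros."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

bf16 = fp32_to_bf16_bits(3.14159265)
print(hex(bf16), bf16_bits_to_fp32(bf16))  # 0x4049 3.140625
```

Because BF16 keeps the same 8-bit exponent as FP32, the conversion is essentially a rounded truncation of the mantissa, which is one reason the format is popular for ML workloads.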
These latest specifications define x86 extensions for accelerating computation tasks, initially focusing on matrix multiplication kernels and reduced precision data formats important to ML workloads.
Data Formats
The extensions described in the specification include support for several data formats. This may include native format support for operations such as matrix multiplication, scaling support for OCP MX-style operations, accumulation formats, and conversion support between different formats. Support for additional data formats may be introduced in the future.
| Format | Description | Notes |
|---|---|---|
| INT8 | 8-bit integer | |
| INT32 | 32-bit integer | |
| FP32 | SE8M23 | As defined by IEEE-754 |
| BF16 | SE8M7 | |
| FP16 | SE5M10 | |
| E8M0 | 8-bit unsigned exponent | Used for power-of-two block scale formats |
| FP8 | 8-bit floating point | Defined in OCP 8-bit Floating Point Specification (OFP8) [1]. Also refer to OCP Microscaling Formats (MX) Specification [2]. |
| MX FP8 | 8-bit floating point formats (SE5M2, SE4M3) | |
| MX FP6 | 6-bit floating point formats (SE3M2, SE2M3) | |
| MX FP4 | 4-bit floating point format (SE2M1) | |
| MX INT8 | 8-bit fixed-point fractional format | |
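
To show how the E8M0 block scale and the narrow MX element formats in the table fit together, here is a small Python sketch that decodes a few MX FP4 (SE2M1) values sharing one E8M0 scale. It follows the conventions of the OCP Microscaling Formats specification as commonly described (E8M0 bias of 127, 0xFF reserved for NaN); the function names are hypothetical, and nothing here reflects how ACE hardware actually stores or processes these values.

```python
# Magnitudes for the low 3 bits of an SE2M1 code (2 exponent bits, 1 mantissa bit).
# Bit 3 is the sign bit.
FP4_E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E8M0_BIAS = 127   # assumed from the OCP MX convention
E8M0_NAN = 0xFF   # encoding reserved for NaN in E8M0

def decode_fp4_e2m1(code):
    """Decode one 4-bit SE2M1 code into a float."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * FP4_E2M1_MAGNITUDES[code & 0x7]

def decode_e8m0(scale_byte):
    """Decode an E8M0 byte into a power-of-two scale factor."""
    if scale_byte == E8M0_NAN:
        return float("nan")
    return 2.0 ** (scale_byte - E8M0_BIAS)

def decode_mx_fp4_block(scale_byte, codes):
    """Apply one shared scale to every element code in the block."""
    scale = decode_e8m0(scale_byte)
    return [scale * decode_fp4_e2m1(c) for c in codes]

# A scale byte of 129 means 2**(129 - 127) = 4.0
print(decode_mx_fp4_block(129, [0x1, 0x7, 0x9, 0xF]))  # [2.0, 24.0, -2.0, -24.0]
```

The shared scale is what lets a 4-bit element cover a useful dynamic range: the elements carry only coarse relative magnitude, while the E8M0 byte positions the whole block.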
ACE is just one step in the path forward for x86. We have also talked about APX (Advanced Performance Extensions), which will play a crucial role in the development of next-gen chips featuring x86 architectures. These advancements are expected to land in future lineups.
