Intel's Xe3 graphics architecture is official and will be coming to Panther Lake's iGPU, followed by an Xe3P variant in the future.
Intel Unveils Its 3rd Gen Xe Graphics Architecture, Xe3, For Panther Lake's iGPU: Over 50% Performance Uplift & Getting Xe3P Upgrade Later
Last year, Intel introduced its Xe2 architecture, which was integrated into two client products, the Lunar Lake "Core Ultra 200" CPUs as an iGPU and the Arc B-Series "Battlemage" discrete graphics cards. Xe2 went on to become a much more successful launch on both platforms thanks to the lessons Intel learned from the Xe1 architecture & the Arc Alchemist A-series family.
The company has also made a lot of strides in the software department, delivering great driver support for its graphics architecture that is not limited to gaming but also covers content creation, rendering, and AI workloads. The recently launched Arc Pro series has also seen support within the same driver branch as the Battlemage GPUs.
So, what we can look at from the past few months is that Intel has been delivering some solid updates on the graphics front. The architecture got better, and the software is doing a better job at optimizing it & utilizing it. But a new launch is upon us with the Panther Lake "Core Ultra 300" series, and so here comes a brand new generation of Xe architecture, codenamed Xe3.
Xe3 iGPUs Are Arc B-Series iGPUs, Next-Gen Xe3P Also Teased
With Intel Xe3, Intel is building upon its Xe2 architecture by scaling the graphics to larger configurations and offering a more throughput-optimized design. There's a lot to talk about, and while we are at it, we'll also point out that Xe3 iGPUs will be branded as the Arc B-Series.
The other Arc B-series family, the Battlemage dGPUs, is based on the Xe2 architecture, while the Panther Lake iGPUs are based on the Xe3 architecture. Intel says the shared branding was chosen because Xe2 and Xe3 are similar in some aspects, so the company decided to have a single unified product stack across integrated and discrete.
That said, Intel does have a new Arc family already in the plans, and that will be using an updated Xe3 GPU architecture called Xe3P, which is said to be another significant step forward. No further details were announced, but it looks like Intel is not moving directly to Xe4; instead, it will further optimize Xe3 for future products, whether those are integrated or discrete. Based on the silhouette, it looks like Xe3P could be implemented in a dGPU solution, but it can also be a higher-end iGPU configuration for Nova Lake CPUs, so we should stay tuned for that.
Also, the Xe3P GPU will not be included in the Arc B-series like the Battlemage dGPUs or the Panther Lake iGPUs, but instead, will be featured in the next Arc family, so Arc C-Series? And with that covered, let's get on with the Xe3 details.
Xe3 - Scaling Up iGPUs For More Performance & Power
Alright, so on to Xe3. The first thing that Intel did with the new architecture is scale up the render slices. Xe2 was configured with 4 Xe cores and 4 ray tracing units per render slice.
Xe3 takes it up to 6 Xe cores and 6 ray tracing units per render slice. That's a 50% increase in the number of cores and ray tracing units for each render slice.
This allows Intel to use different GPU tile configurations within its Panther Lake SoCs, which we have detailed in our deep dive here. There's a 4 Xe core GPU configuration for the 8C and 16C dies, and a 12 Xe core GPU configuration for the top 16C die. It will be an interesting comparison, as Arrow Lake and Lunar Lake both pack up to 8 Xe cores based on the respective Xe1 and Xe2 architectures. Panther Lake uses 4 Xe cores on the 8C and 16C SKUs, which is half the core count of the current lineup, but the graphics architecture improvements should keep it competitive.
Now, let's talk about the two configurations, the first of which is the 4 Xe core die. This comes in two flavors: the 8C is fabricated on the "Intel 3" process technology, whereas the 16C is fabricated on the "TSMC N3E" process technology. The breakdown is as follows:
- 4 Xe Cores (Xe3 Architecture)
- 1 Render Slice
- 32 XMX Engines
- 4 MB L2 Cache
- 1 Geo Pipeline
- 4 Samplers
- 4 Ray Tracing Units
- 2 Pixel Backends
The 12 Xe core iGPU is fabricated on the TSMC N3E process technology. The 12 Xe core configuration is as follows:
- 12 Xe Cores (Xe3 Architecture)
- 2 Render Slices
- 96 XMX Engines
- 16 MB L2 Cache
- 2 Geo Pipelines
- 12 Samplers
- 12 Ray Tracing Units
- 4 Pixel Backends
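To make the two breakdowns easier to compare, here is a minimal Python sketch that stores the listed figures and derives a few per-core ratios. The class and function names (`Xe3Config`, `per_core_ratios`) are purely illustrative and not anything Intel provides; the numbers are taken straight from the two lists above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Xe3Config:
    """Illustrative container for the figures listed in the article."""
    xe_cores: int
    render_slices: int
    xmx_engines: int
    l2_cache_mb: int
    ray_tracing_units: int


# Values copied from the two configuration lists above.
CONFIGS = {
    "4Xe": Xe3Config(xe_cores=4, render_slices=1, xmx_engines=32,
                     l2_cache_mb=4, ray_tracing_units=4),
    "12Xe": Xe3Config(xe_cores=12, render_slices=2, xmx_engines=96,
                      l2_cache_mb=16, ray_tracing_units=12),
}


def per_core_ratios(cfg: Xe3Config) -> dict:
    """Derive simple ratios so the two configurations can be compared."""
    return {
        "xe_cores_per_slice": cfg.xe_cores / cfg.render_slices,
        "xmx_per_xe_core": cfg.xmx_engines / cfg.xe_cores,
        "rt_units_per_xe_core": cfg.ray_tracing_units / cfg.xe_cores,
        "l2_mb_per_xe_core": cfg.l2_cache_mb / cfg.xe_cores,
    }


for name, cfg in CONFIGS.items():
    print(name, per_core_ratios(cfg))
```

Both configurations work out to 8 XMX engines and 1 ray tracing unit per Xe core, while only the 12Xe part fills its render slices with the full 6 Xe cores and gets more L2 cache per core.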
The 4Xe iGPU configuration has 4 MB of L2 cache, half the amount featured on Lunar Lake's Xe2 iGPU, which packs 8 MB. The top-end 12Xe iGPU configuration, on the other hand, gets twice the L2 cache of Lunar Lake at 16 MB. The doubled cache reduces traffic on the SoC fabric by up to 36% in gaming, or 25% on average.
Now, let's talk about the architectural changes implemented within the Xe3 architecture.
The 3rd Gen Xe core features eight 512-bit Vector Engines (XVE), eight 2048-bit XMX Engines, and 33% more shared L1/SLM cache.
The Xe Vector Engine now offers increased utilization on the Xe3 architecture with up to 25% more threads, variable register allocation, and FP8 dequantization support. It is composed of SIMD16 native ALUs, 3-Way Co-Issue, Extended math & FP64 blocks, and Xe matrix extensions.
The Xe3 XMX engines are responsible for AI acceleration. With up to 96 XMX engines, 12Xe iGPUs are able to deliver up to 120 TOPS. By that calculation, the 4Xe iGPUs can deliver up to 40 TOPS. The 8Xe iGPUs based on the Xe2 architecture delivered up to 67 TOPS. Using the same math, an Xe3 iGPU with 8 Xe cores would be able to deliver 80 TOPS of AI compute, a roughly 19% improvement.
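The scaling math here is simple enough to check in a few lines of Python. This is a rough sketch that assumes AI throughput scales linearly with Xe core count at the same clock, which is our assumption rather than something Intel has confirmed; the 8 Xe core Xe3 part is hypothetical, and `scaled_tops` is just an illustrative helper.

```python
# Figures quoted in the article.
PEAK_TOPS_12XE = 120   # 12 Xe core Xe3 iGPU
XE2_8XE_TOPS = 67      # 8 Xe core Xe2 iGPU


def scaled_tops(xe_cores: int, ref_tops: float = PEAK_TOPS_12XE,
                ref_cores: int = 12) -> float:
    """Linearly scale the 12Xe figure to another core count (assumption)."""
    return ref_tops * xe_cores / ref_cores


for cores in (4, 8, 12):
    print(f"{cores} Xe cores: {scaled_tops(cores):.0f} TOPS")

# Hypothetical 8 Xe core Xe3 iGPU versus the 8 Xe core Xe2 iGPU.
uplift = scaled_tops(8) / XE2_8XE_TOPS - 1
print(f"Xe3 vs Xe2 at 8 Xe cores: {uplift:.1%}")
```

Linear scaling puts the 4Xe part at 40 TOPS and a hypothetical 8 Xe core part at 80 TOPS, about 19% above the 67 TOPS of the Xe2-based 8Xe iGPU.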
Following are the per Xe-core ops/clock:
- XMX TF32: 1024 ops/clk
- XMX FP16: 2048 ops/clk
- XMX BF16: 2048 ops/clk
- XMX INT8: 4096 ops/clk
- XMX INT4: 8192 ops/clk
- XMX INT2: 8192 ops/clk
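For a sense of how these per-clock rates translate into a TOPS figure, here is a small Python sketch. Peak throughput is simply Xe cores × ops/clk × clock speed. Intel's material as quoted here does not give a GPU clock or say which data type the 120 TOPS figure refers to, so the INT8 pairing below is our assumption and the resulting clock is only an implied value, not a specification.

```python
# Per Xe-core XMX rates from the list above (ops per clock).
XMX_OPS_PER_CLK = {
    "TF32": 1024,
    "FP16": 2048,
    "BF16": 2048,
    "INT8": 4096,
    "INT4": 8192,
    "INT2": 8192,
}


def peak_tops(xe_cores: int, dtype: str, clock_ghz: float) -> float:
    """Peak tera-ops/s = cores * ops/clk * clock (Hz) / 1e12."""
    return xe_cores * XMX_OPS_PER_CLK[dtype] * clock_ghz * 1e9 / 1e12


def implied_clock_ghz(tops: float, xe_cores: int, dtype: str) -> float:
    """Clock (GHz) needed to reach `tops` at the given per-clock rate."""
    return tops * 1e12 / (xe_cores * XMX_OPS_PER_CLK[dtype]) / 1e9


# ASSUMPTION: the 120 TOPS figure for the 12Xe iGPU is an INT8 number.
clock = implied_clock_ghz(120, xe_cores=12, dtype="INT8")
print(f"Implied clock for 120 INT8 TOPS on 12 Xe cores: {clock:.2f} GHz")
print(f"FP16 peak at that clock: {peak_tops(12, 'FP16', clock):.0f} TOPS")
```

Under that assumption, the 12Xe configuration would need to run at roughly 2.44 GHz to hit 120 TOPS, and FP16 throughput at the same clock would be half of that.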
Intel is also using a new enhanced ray tracing unit, which features dynamic ray management for asynchronous ray tracing. The RT unit includes several traversal pipelines, two triangle intersection units, & a BVH cache. The improvements come from the way the rays are moved through the pipeline. This is achieved by slowing down the dispatch of new rays to prevent backups in the pipeline when they move through the thread sorting unit.
The other big improvement is the new URB manager, which allows partial updates instead of flushing out the whole thing. The URB is a structure through which results are passed inside the GPU. The new architecture also features up to 2x the anisotropic filtering rate and up to 2x the stencil test rate.
And finally, on the media side, Intel has AV1 Encode/Decode, VVC Decode, and support for eDP 1.5 tech. All this combined is what enables Xe3 for Panther Lake. Some new additions include AVC 10-bit support, and Sony XAVC-H, XAVC-HS, and XAVC-S support.
Intel Continues To Scale Up & Boost GPU Performance Further With Xe3
Intel is also sharing a few early performance metrics for Xe3 GPUs, essentially microbenchmarks, which evaluate the individual segments of the GPU microarchitecture and show how much each has gained versus the previous generation.
First up are the blend and backend performance metrics, which show little to no change because the resources dedicated to them have remained unchanged on Xe3. The FP16 metrics in GEMM see a 50% improvement, which is proportional to the scale of the GPU. Xe3 is 50% larger than Xe2, so that is where this improvement is coming from, as these micro benchmarks can fully utilize the capabilities of the architecture. Next are the microarchitectural enhancements, such as the Anisotropic rate, Mesh Render rate, Scattered Reads, and R/T intersection, which scale from 2x to 2.7x improvement.
Intel also shows some big improvements made in Xe3 in areas such as Depth Testing and Register Heavy applications, where uplifts can exceed 7x versus the prior generation.
Now, coming to the actual performance metrics for Xe3 on Panther Lake vs Xe2 on Lunar Lake and Xe+ on Arrow Lake-H: Xe3 offers more than 50% higher performance versus Lunar Lake at peak power and more than 40% higher performance per watt versus Arrow Lake-H.
Following is a comparison of a frame rendered on Xe3 vs Xe2:
Then there are software optimizations that are being added to the Windows Graphics Software Stack by Intel. The first of these is a set of compiler updates delivered through IGC, where Intel has now improved variable register allocation, which is a key update.
Then there's faster scheduling with direct preemption, which means that Intel can swap between contexts without flushing, and there's also support for DirectX Cooperative Vectors. Intel also showcased a demo as part of its "Neural Radiance Field," which utilizes Cooperative Vectors.
The Intel Xe3 iGPU looks like a solid upgrade over the existing Xe2 architecture. The Xe2 architecture is currently on par with the fastest RDNA 3.5 iGPUs, such as the Radeon 890M and 880M, for mainstream laptops. While it doesn't necessarily reach the same performance levels as the higher-end Strix Halo with its bigger RDNA 3.5 implementation, it looks like the recent Intel+NVIDIA custom SoC partnership will cover that segment.
