Intel officially unveiled Panther Lake, its next-gen "Core Ultra" CPU platform with Cougar Cove P-Cores & Darkmont E-Cores.
Intel 18A Arrives With Panther Lake "Core Ultra 300" CPUs: Featuring Next-Gen Cougar Cove P-Cores & Darkmont E-Cores
Intel's disaggregated client CPU architecture journey continues with Panther Lake, its Core Ultra Series 3 family. These chips feature next-gen CPU, GPU, and NPU architectures, which were disclosed in full detail at the Tech Tour 2025 event.

Panther Lake - The Building Blocks
Panther Lake CPUs are designed to be scalable, and Intel has applied many lessons from its previous launches, such as Lunar Lake and Arrow Lake. The company promised to deliver the leading x86 efficiency of Lunar Lake and the high performance of Arrow Lake in a single package, and that package is Panther Lake.

The key to scalability within Panther Lake is the Scalable Fabric Gen2, a coherent fabric that was first introduced with Lunar Lake CPUs. It is IP-agnostic and partition-agnostic, which means that Intel can mix and match various IPs, and partitions of those IPs, within its next-generation CPUs. The fabric can be used across various physical layers, and for Panther Lake specifically, it utilizes the Foveros advanced packaging technology.
In Panther Lake, Intel has multiple layers and tiles, which include:
- Compute Tile (Intel 18A)
- Graphics Tile (Intel 3 or TSMC N3E)
- Platform Controller Tile (TSMC N6)
- Base Tile (Intel 1227.1)
- Filler Tile (N/A)
- Foveros Package
- CPU Interposer Package
First, we have the three main tiles, the Compute, the Graphics, and the Platform Controller Tile, which are assembled into one package using Intel's Foveros-S 2.5D packaging on a passive die. This modular processor architecture has been going on since Meteor Lake, and this is by far the most advanced version of such a chip that we have seen. There's also a filler tile, which maintains the integrity of the entire silicon as explained below:

Anytime you see a tile on the CPU that fills a spot, what we are really saying is you need a uniform, cavity-free surface for the heatspreader to sit on top of. If you do not mechanically support that heatspreader from below, it can bend, it can be crushed, it can be damaged, so you always want to fill all available die space and leave no cavity so filler tile, that's what it's for.
Robert Hallock (Intel VP & General Manager Client AI and Technical Marketing)
Intel Panther Lake With Next-Gen CPU Cores On 18A Node
Intel Panther Lake CPUs continue the hybrid compute architecture trend, which kicked off with Alder Lake in the mainstream segment back in 2021. Over the generations, Raptor Lake, Meteor Lake, Lunar Lake, and Arrow Lake have improved the hybrid architecture.

The three different types of compute cores and their functions are described below:
- P-Core: Drives ST Performance & Throughput
- E-Core: Drives MT Performance & Parallelism
- LP-E Core: Drives Efficiency

Alder Lake was the first hybrid architecture design for mainstream platforms, but Meteor Lake was the first to start the 3-way hybrid journey with a dedicated low-power island. By Intel's own account, Meteor Lake's low-power island was insufficient: it had just 2 cores and a small cache, and it didn't deliver the needed efficiency or MT throughput. That continued with Arrow Lake, which utilized the same Crestmont architecture for its LP-E cores. Lunar Lake made a nice jump to 4 cores and a bigger cache with its low-power island based on the Skymont architecture, but it lacked MT throughput since it didn't feature any dedicated E-Cores on the compute tile. That changes with Panther Lake.

Panther Lake CPUs will include Cougar Cove P-Cores, Darkmont E-Cores, and Darkmont LP-E cores. So it's an evolution of the 3-way design and the best one that we've seen to date from Intel. The entire compute tile is based on the 18A process node, making Panther Lake the first to utilize Intel's latest in-house node.

Intel Panther Lake P-Core CPU: Cougar Cove
The next-generation P-Core architecture is codenamed Cougar Cove and builds upon Lion Cove, the P-Core architecture used on Arrow Lake and Lunar Lake CPUs. Cougar Cove was targeted and optimized for 18A, so Intel didn't change the width or depth of the core and instead optimized the existing design. Think of Cougar Cove as an evolution of Lion Cove with better efficiency.

While a lot of changes were made, there are 3 key areas that Intel focused on when designing the Cougar Cove P-Core:
- Memory Disambiguation (More Reliable Performance): When a program is executed, there are loads & stores. Sometimes they are connected, but usually they aren't. Intel has enhanced its ability to predict when a load and store are connected and use that information to schedule the load correctly. And when that's done right, you get increased IPC and increased Performance.
- TLB Enhancements (1.5x capacity for modern workloads): The 18A process node gives Intel the ability to grow some structures of the core, such as the cache, with the primary one being the TLB. This allows more complex workloads to run faster and more reliably.
- Branch Prediction (Improving performance and energy efficiency): With Lion Cove, Intel made some big changes to the branch prediction unit, which enabled them to have more capacity and predict quickly, so that they are able to predict the next branch even if it's far away. With Cougar Cove, the design has further evolved with changes in the underlying algorithms that are more accurate. The capacity has also been increased with a multi-level predictor, which makes it faster and also offers lower latency. Prediction accuracy and capacity are a sort of combo that leads to higher efficiency and performance.
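To make the branch prediction bullet more concrete, here is a textbook two-bit saturating-counter predictor in Python. It is not Intel's design (the article does not describe Cougar Cove's multi-level predictor at this level); the class name, table size, and indexing scheme are hypothetical and only illustrate why both prediction accuracy and table capacity matter.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating-counter branch predictor (illustrative only)."""

    def __init__(self, entries=1024):
        # Hypothetical table size; counters start at 1 ("weakly not taken").
        self.entries = entries
        self.table = [1] * entries

    def _index(self, pc):
        # Direct-mapped: branches whose addresses collide share one counter.
        return pc % self.entries

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        # Move the counter toward the real outcome, saturating at 0 and 3.
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def accuracy(predictor, trace):
    """Fraction of (pc, taken) outcomes in the trace predicted correctly."""
    hits = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            hits += 1
        predictor.update(pc, taken)
    return hits / len(trace)


# A loop branch that is taken 9 times and then falls through, repeated 100 times.
loop_trace = [(0x400, i % 10 != 9) for i in range(1000)]
loop_accuracy = accuracy(TwoBitPredictor(), loop_trace)
```

In this toy trace the predictor misses mainly on each loop exit. A larger table reduces collisions between unrelated branches, which is the capacity half of the accuracy-and-capacity combination described above.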

Some high-level info about the Cougar Cove P-Core architecture:
- 8-Wide Decode
- 4-Wide MSROM
- 12-Wide uOP Cache
- 8-Wide Allocate/Rename/Move Elimination/Zero IDIOM

Diving into the Cougar Cove P-Core architecture, the front-end on the new core features mostly the same design hierarchy as Lion Cove. The decode is retained at 8-wide, while the MSROM, uOP Cache, and Allocate stages remain the same too, at 4-wide, 12-wide, and 8-wide, respectively. The Out of Order Engine sees a split of the INT and VEC domains, each with its own independent renaming and schedulers. The engine comes with 8-wide allocation/rename units.
Intel Panther Lake E-Core CPU: Darkmont & LP-E Darkmont
The next-generation E-Core architecture is codenamed Darkmont, and like Lion Cove to Cougar Cove, it builds upon the previous E-Core architecture, Skymont. The Darkmont E-Core has the same 26 Dispatch ports, but offers higher vector throughput, more L2 bandwidth, and improvements to nanocode performance, which was first introduced in Crestmont.

There are also similar Branch Prediction updates to the E-Cores, like the ones mentioned above for Cougar Cove. So some of the main changes in the E-Core include:
- Branch Prediction (Capacity Increases & Accuracy Improvements): Algorithm tweaks for higher accuracy and new modes that can predict and shut down the front end. There's also Loop Stream detection that saves energy and offers reliable performance.
- Dynamic Prefetcher Controls (Responsiveness in workload variation): This provides a higher level of energy efficiency and a dynamic prefetch control that provides enhanced responsiveness.
- Nanocode Performance (More Instruction Coverage): Intel's E-Core is the only architecture that does nanocode. Microcode is something that x86 and other processors have done for a very long time, as the chip has to generate many UOPs when executing complex instructions. This is done through a microcode sequencer, a big ROM on the chip that handles these complex instructions. With Nanocode, Intel is taking some of these sequences and embedding them into hardware, into PLAs in the front end. That gives the core the ability to decode microcode UOPs, nanocode in this case, in each of its parallel front-end clusters. This saves latency, bandwidth, and area, leading to higher performance.
- Memory Disambiguation (More Reliable Performance): This is where both the P-Core and E-Core teams shared their findings to solve similar problems.

Starting with the details, Darkmont comes with an updated prediction block with 128 bytes, faster "Find the next" instructions, and 96 instruction bytes for Parallel Fetch. Darkmont also features a wider Decode, which includes 9-wide (3x3) or 50% more decode clusters than Crestmont E-Cores, a Nanocode that unlocks microcode parallelism per cluster, and a Uop queue capacity that's increased from 64 entries to 96 entries.
Queuing also gets more resources, with the out-of-order window now growing to 416 entries. Dispatch ports have been increased to 26, which include 8 Integer ALUs, 3 Jump Ports, and 3 loads per cycle.

While no IPC comparisons are made to Skymont, Darkmont does deliver a 17% uplift over Crestmont, the same uplift that Skymont delivered. The overall performance of Darkmont E-Cores is now higher than that of Raptor Cove at the same power.
Intel Panther Lake Cache & Memory Subsystem
Intel has made some big changes to the cache and memory subsystem for Panther Lake CPUs. The first change is that it brings 8 E-cores on the L3 cache ring, so the larger 18 MB L3 cache on Panther Lake chips is now accessible to both Cougar Cove P-Cores and Darkmont E-Cores.
The L2 cache for the LP-E cores is also doubled now to 4 MB, and there's an additional memory-side cache & controller inside the SoC tile.

On Lunar Lake, the Crestmont LP-E cores were on a separate tile from the compute tile, which means that they couldn't get the latency advantages of sitting on the compute tile's L3 cache ring. On Arrow Lake, there was no memory-side cache or cache controller within the SoC tile. There's also a dedicated power rail for the cache, which, instead of stopping at 3.5 GHz as on Lunar Lake CPUs, can now go beyond 3.5 GHz, allowing more workloads to run at a higher operating point.

The Memory-Side Cache is an 8 MB cache featured on the SoC tile, a recurring feature from the Lunar Lake SoCs. This 8 MB on-die cache reduces DRAM traffic and power, leading to better latency, system bandwidth, and also provides caching for IO engines such as Media and Display.
The following is what the cache configuration for the cores looks like on Panther Lake:
- Cougar Cove P-Core (Per Core): 3 MB L2 + 256 KB L1
- Cougar Cove P-Core Sub-Cache: 192 KB L1D + 48 KB L0D
- Darkmont E-Core (Per Cluster): 4 MB L2 + 96 KB L1
- Darkmont E-Core Sub-Cache: 64 KB L1I + 32 KB L0D

Panther Lake Scheduling, Thread Director & Power Management
Intel Panther Lake once again leverages Thread Director, the technology Intel introduced with Alder Lake to handle hybrid core architectures and schedule the right workload to the right core. These CPUs combine cores with different architectures, performance, IPC, and efficiency characteristics. The OS retains the ultimate decision on which core a workload is given to, but Thread Director provides guidance from the hardware side on which cores are the most performant and which are the most efficient.

So Thread Director has two main components, the core side and the SoC side. The Core side happens on both the P-Cores and the E-Cores by using a lot of internal telemetry to classify the set of instructions being executed into four different classes:
- Class 0: Scalar Type Instructions where IPC is similar between P and E Cores
- Class 1: Slightly better IPC with P-Cores
- Class 2: AI/CPU-based AI-specific instructions that can deliver higher IPC
- Class 3: Non-Scalable Workloads
The SoC side or the P-Core side is the hardware feedback interface table, or HFI. This provides an ordered list of which cores are the most performant and which are the most efficient ones. The operating system reads this table, and in the case of major change events such as power adjustments, the power balancing can be achieved on the P-Core side. This allows OEMs to use their own scheduling policy, whether they want to start with P-Cores or E-Cores.
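As a rough illustration of the idea behind the hardware feedback table, the Python sketch below ranks cores by performance and by efficiency so that a scheduler could pick from either list. The `CoreEntry` type, the core names, and the capability scores are all invented for this example; the real HFI table layout and its values are defined by Intel's hardware and are not described in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreEntry:
    name: str
    performance: int  # hypothetical capability score, higher is faster
    efficiency: int   # hypothetical capability score, higher is more frugal


# Made-up scores for one core of each type; these are not Intel data.
FEEDBACK_TABLE = [
    CoreEntry("P-Core 0", performance=100, efficiency=40),
    CoreEntry("E-Core 0", performance=70, efficiency=75),
    CoreEntry("LP-E Core 0", performance=45, efficiency=100),
]


def ranked(table, key):
    """Return core names ordered from best to worst for the given score."""
    ordered = sorted(table, key=lambda core: getattr(core, key), reverse=True)
    return [core.name for core in ordered]


most_performant = ranked(FEEDBACK_TABLE, "performance")
most_efficient = ranked(FEEDBACK_TABLE, "efficiency")
```

A policy that starts with P-Cores would walk `most_performant`, while a battery-focused policy would walk `most_efficient`; in both cases the operating system still makes the final placement.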

With Panther Lake, Intel has updated its classification models and provided optimal support for guidance to the operating system. These changes were necessary as the older classification models are no longer applicable to Panther Lake due to architecture improvements. Intel has also expanded its use case coverage based on current workload scenarios.

So with Panther Lake, Thread Director starts at LP-E cores if the work fits the use case. If it exceeds the capacity of the LP-E cores on the low-power island, the work is shifted to the E-Cores, and if that isn't enough, the work is shifted to the P-Cores.
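The escalation order described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the capacity thresholds are made-up "work units", and `pick_tier` is a hypothetical helper, not a Thread Director or operating system API.

```python
# Escalation order from the article; the capacity numbers are hypothetical.
ESCALATION_ORDER = [
    ("LP-E Cores", 4.0),           # low-power island
    ("E-Cores", 12.0),             # compute tile E-Cores
    ("P-Cores", float("inf")),     # last resort for the heaviest work
]


def pick_tier(demand):
    """Return the first core tier whose assumed capacity covers the demand."""
    for tier, capacity in ESCALATION_ORDER:
        if demand <= capacity:
            return tier
    return ESCALATION_ORDER[-1][0]


light = pick_tier(2.0)
medium = pick_tier(8.0)
heavy = pick_tier(50.0)
```

With these assumed thresholds, light work stays on the low-power island and only the heaviest demand reaches the P-Cores, mirroring the LP-E, then E-Core, then P-Core order.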
On Lunar Lake, work was placed on the LP-E cores first and moved to the P-Cores when it exceeded their capacity. Meteor Lake housed the LP-E cores on the SoC tile, which is no longer the case with Panther Lake CPUs, where the LP-E cores sit on the compute tile. The following is how the cores are scheduled across various workloads on Panther Lake CPUs.
Then we have gaming, where the GPU is being pushed to 100% utilization. In this scenario, the work is scheduled to the P-Cores first, right from the start, to maximize performance. And then, very quickly, the work is expanded to the E-Cores. In the example, you can see that the game is utilizing two primary threads from the P-Cores and often utilizes the E-cores for other work threads or supporting threads.
Because this workload is GPU bound, one of the optimizations that we have done is taking hints from our graphics driver [...] and some of the power management optimizations that we have internally to come up with an improved scheduling case where, when we are GPU bound, we start from the E-Cores. So you can see that on the compute complex, we are working in the hybrid zone because there's enough concurrency that the threads cannot fit, so it's not in the efficiency zone.
So we are working in the compute complex. We start from the E-Cores. We try to keep the work there as much as possible, and we only go to the P-Cores when we exceed capacity. This is the example where, using OS containment zones, some of our graphics driver hints, and our power management, we are able to deliver 10% better frame rate because we are making power headroom available to graphics.
We are taking advantage of our E-Cores on the compute complex and the shared caches, and we are able to give extra performance here. I'm excited about such innovations we are able to bring to market so that we are able to deliver greater FPS and greater performance for our end user. Now let's shift gears a little bit and talk about the system software and firmware power management optimizations we have done in Panther Lake.
via Intel
One of the optimizations that Intel has done with its Thread Director technology with Panther Lake CPUs is that they are taking hints from their graphics driver.
Intel is also introducing a new power management tool called "Intelligent Experience Optimizer," which combines some aspects of the dynamic tuning utility with built-in firmware optimizations. Instead of requiring the user to move the battery slider manually in Windows, it can shift the power profile to performance mode when "Balanced" mode is selected and the system requires more performance.
This feature can provide up to 19-20% extra performance in a similar power budget and can scale dynamically.

NPU5 - More AI TOPS, More AI Formats
Panther Lake introduces an updated NPU called NPU5, which expands upon Lunar Lake's NPU4 in two key areas: area efficiency and optimizations. Intel's NPU architectures include a MAC array, which is an array of cells that perform multiply-accumulate operations.

In Lunar Lake, NPU4 had two MAC Array Slices in their separate Neural Compute Engines, with two SHAVE DSPs per slice and their own back-end functions.

Intel states that this is very inefficient, so with NPU5, they doubled the MAC Array throughput by including a single Neural Compute Engine & streamlining the backend functions. This enables Panther Lake with more MACs per unit area vs the previous generation.
NPU5 across all Panther Lake SoCs will feature three MAC Arrays, which are double the size of the last-gen MAC Arrays. There are 3 NCEs, 12K MACs, 4.5 MB of scratchpad RAM, 6 SHAVE DSPs, and 256 KB of L2 cache. This leads to a >40% improvement in TOPS/area.

Another improvement in NPU5 is optimizations around different AI formats such as INT8 and FP8. This makes NPU5 the first Intel NPU to offer the FP8 format. The new architecture also enables NPU5 to handle different types of multiplies in parallel, such as 4096 MAC/cycle INT8, 4096 MAC/cycle FP8, and 2048 MAC/cycle FP16. Compared to FP16, FP8 delivers over 50% higher performance per watt with similar results.
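The MAC figures can be tied to the roughly 50 TOPS that Intel quotes for NPU5 with some back-of-the-envelope arithmetic. The Python sketch below assumes that the 4096 MAC/cycle INT8 figure is per Neural Compute Engine (which matches the 12K MAC total across 3 NCEs) and uses the common convention of counting each MAC as two operations. The resulting clock is only an implied value under those assumptions, not a frequency stated by Intel.

```python
NCE_COUNT = 3                # from the article
INT8_MACS_PER_CYCLE = 4096   # from the article, assumed here to be per engine
OPS_PER_MAC = 2              # common convention: one multiply plus one add
QUOTED_TOPS = 50             # NPU5 figure quoted in the article


def total_macs_per_cycle(engines=NCE_COUNT, macs_per_engine=INT8_MACS_PER_CYCLE):
    """INT8 multiply-accumulates the whole NPU can issue each cycle."""
    return engines * macs_per_engine


def implied_clock_ghz(tops=QUOTED_TOPS):
    """Clock (in GHz) at which the MAC array would reach the quoted TOPS."""
    ops_per_cycle = total_macs_per_cycle() * OPS_PER_MAC
    return tops * 1e12 / ops_per_cycle / 1e9


macs = total_macs_per_cycle()
clock = implied_clock_ghz()
print(f"{macs} MACs/cycle, implied clock of about {clock:.2f} GHz")
```

The same helper can be rerun with the FP16 rate of 2048 MAC/cycle to see how throughput halves for the wider format at the same clock.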
The following are the micro-benchmarks for NPU5 versus NPU4:

As for the TOPS, NPU5 can deliver 50 TOPS of AI compute, which is just 2 TOPS more than the Lunar Lake NPU4, which was 48 TOPS, but a major increase over the NPU3 and NPU3.5 in Meteor Lake & Arrow Lake SoCs.
- NPU1 - 0.5 TOPS
- NPU2 - 7.0 TOPS
- NPU3 - 11.5 TOPS
- NPU4 - 48.0 TOPS
- NPU5 - ~50.0 TOPS
- NPU6 - TBD
Now, while the NPU5's TOPS aren't groundbreaking, we should remember that Intel has made massive optimizations on the AI software end. This has allowed Intel to remain competitive against AMD and even Qualcomm, which were the first to go 45-50 TOPS with their respective offerings, while Intel's NPU3 still manages to hold its own against their parts.

The total platform TOPS have now been increased to 180, the highest of current-gen SoCs, with the NPU providing 50 TOPS, the CPU providing 10 TOPS, and the GPU with the lion's share of 120 TOPS.
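A quick check of the platform total, and of how it splits across the three engines, can be done in Python. The figures come straight from the article; only the variable names are made up.

```python
# Per-engine AI throughput in TOPS, as quoted in the article.
PLATFORM_TOPS = {"NPU": 50, "CPU": 10, "GPU": 120}

total_tops = sum(PLATFORM_TOPS.values())

# Share of the platform total contributed by each engine, in percent.
shares = {
    engine: round(100 * tops / total_tops, 1)
    for engine, tops in PLATFORM_TOPS.items()
}

for engine, share in shares.items():
    print(f"{engine}: {PLATFORM_TOPS[engine]} TOPS ({share}% of {total_tops})")
```

The GPU accounts for about two thirds of the total, which is why workloads that can run on the Xe3 graphics benefit most from the headline number.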
Intel Panther Lake Single-Thread & Multi-Thread Performance Uplifts
So starting with the performance numbers, we first have the single-threaded performance dissection demonstrated within SPECrate 2017 (INT).
Intel is claiming that Panther Lake CPUs will offer 10% more performance at the same power as Lunar Lake and Arrow Lake CPUs. Conversely, you can get 40% lower power at the same performance level in single-threaded workloads.
On the multi-threading side, the Panther Lake CPUs offer over 50% higher performance than Lunar Lake at the same power, and 30% lower power at similar performance levels than Arrow Lake CPUs.
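The two ways Intel states each claim (more performance at equal power, or less power at equal performance) can be compared by converting both into a performance-per-watt ratio. The Python sketch below uses only the percentages quoted above; the helper name is hypothetical, and the two readings differ because they sit at different points on the power curve.

```python
def perf_per_watt_gain(perf_ratio, power_ratio):
    """Relative performance per watt, given performance and power ratios."""
    return perf_ratio / power_ratio


# Single-thread: +10% performance at the same power, or 40% lower power
# at the same performance.
st_iso_power = perf_per_watt_gain(1.10, 1.00)
st_iso_perf = perf_per_watt_gain(1.00, 0.60)

# Multi-thread: +50% performance at the same power (vs Lunar Lake), or
# 30% lower power at similar performance (vs Arrow Lake).
mt_iso_power = perf_per_watt_gain(1.50, 1.00)
mt_iso_perf = perf_per_watt_gain(1.00, 0.70)

print(f"ST: {st_iso_power:.2f}x at iso-power, {st_iso_perf:.2f}x at iso-performance")
print(f"MT: {mt_iso_power:.2f}x at iso-power, {mt_iso_perf:.2f}x at iso-performance")
```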
Intel Panther Lake Memory Support - Faster LPDDR5 & DDR5 Speeds, No MoP
In terms of memory support, Panther Lake continues to evolve DDR5/LPDDR5 support with faster speeds and higher capacities. The Core Ultra 300 CPUs will support both LPDDR5 and DDR5 memory standards.

For LPDDR5, the chips support maximum memory speeds of 9600 MT/s and up to 96 GB of total capacity. For DDR5, the speeds have been upgraded to 7200 MT/s with capacities of up to 128 GB. As for memory-on-package (MoP) support, Panther Lake drops it in favor of a memory-on-PCB design.
It gives OEMs more flexibility and choice to integrate the right memory standard, speed, and capacity for their platforms rather than relying on a dedicated and pre-configured memory type, which was the case with Lunar Lake's MoP design. The MoP design did lead to cost savings for OEMs, but didn't produce the cost scaling that Intel had hoped for.
- Panther Lake: DDR5-7200 / LPDDR5-9600
- Arrow Lake: DDR5-6400 / LPDDR5-8400
- Lunar Lake: LPDDR5-8533
The DDR5 speeds see a 12.5% improvement on Panther Lake CPUs versus Arrow Lake, while the LPDDR5 speeds see a 14.3% improvement on Panther Lake CPUs versus Arrow Lake. The LPDDR5 speeds are also a 12.5% increase over Lunar Lake, but you don't get traditional DDR5 support with Lunar Lake CPUs. That's another advantage that Panther Lake's lower-power offerings will have over Lunar Lake, giving OEMs the flexibility to provide both standards.
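The generational uplifts can be recomputed from the speeds in the list above. This short Python sketch uses only those listed MT/s values; the dictionary and function names are made up for the example.

```python
# Maximum supported speeds in MT/s, taken from the list in the article.
SPEEDS_MTS = {
    "Panther Lake": {"DDR5": 7200, "LPDDR5": 9600},
    "Arrow Lake": {"DDR5": 6400, "LPDDR5": 8400},
    "Lunar Lake": {"LPDDR5": 8533},
}


def uplift_percent(new, old):
    """Percentage increase going from the old speed to the new speed."""
    return (new / old - 1) * 100


ptl = SPEEDS_MTS["Panther Lake"]
ddr5_vs_arrow = uplift_percent(ptl["DDR5"], SPEEDS_MTS["Arrow Lake"]["DDR5"])
lpddr5_vs_arrow = uplift_percent(ptl["LPDDR5"], SPEEDS_MTS["Arrow Lake"]["LPDDR5"])
lpddr5_vs_lunar = uplift_percent(ptl["LPDDR5"], SPEEDS_MTS["Lunar Lake"]["LPDDR5"])

print(f"DDR5 vs Arrow Lake:   {ddr5_vs_arrow:.1f}%")
print(f"LPDDR5 vs Arrow Lake: {lpddr5_vs_arrow:.1f}%")
print(f"LPDDR5 vs Lunar Lake: {lpddr5_vs_lunar:.1f}%")
```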

Besides memory support, the wider memory choices also provide platform providers with a wider array of options at diverse price points. There's also no PMIC to be added, which further reduces the cost and the associated implementation required with MoP. So it pretty much looks like MoP was just a one-off thing we got with Lunar Lake, but we may see it once again in the future if the cost scaling and design permit it.
Additionally, Panther Lake CPUs will also carry support for LPCAMM standards. Though we might not see such configurations at launch, they will definitely be a standout based on all the good things that we have covered regarding LPCAMM so far.
Intel Panther Lake Die Configurations & Connectivity
Intel's Panther Lake CPUs will be segmented into three different die configurations, with two of these offering diverse compute tile configurations. These include:
- Panther Lake 8C = 4 P-Cores + 0 E-Cores + 4 LP-E Cores + 4 Xe3 Cores
- Panther Lake 16C = 4 P-Cores + 8 E-Cores + 4 LP-E Cores + 4 Xe3 Cores
- Panther Lake 16C 12Xe = 4 P-Cores + 8 E-Cores + 4 LP-E Cores + 12 Xe3 Cores
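The three configurations can be summarized programmatically to check the core counts and the L2 totals quoted elsewhere in the article. The Python sketch below takes the per-core and per-cluster L2 sizes from the cache list and infers a four-core Darkmont cluster size from the article's cluster counts; the dictionary keys and the `summarize` helper are made up for the example.

```python
P_CORE_L2_MB = 3        # per P-Core, from the article's cache list
CLUSTER_L2_MB = 4       # per Darkmont cluster, from the article's cache list
CORES_PER_CLUSTER = 4   # inferred: 4 LP-E cores form one cluster, 12 form three

DIES = {
    "Panther Lake 8C": {"p": 4, "e": 0, "lpe": 4, "xe3": 4},
    "Panther Lake 16C": {"p": 4, "e": 8, "lpe": 4, "xe3": 4},
    "Panther Lake 16C 12Xe": {"p": 4, "e": 8, "lpe": 4, "xe3": 12},
}


def summarize(die):
    """Derive the CPU core count and L2 totals for one die configuration."""
    clusters = (die["e"] + die["lpe"]) // CORES_PER_CLUSTER
    return {
        "cpu_cores": die["p"] + die["e"] + die["lpe"],
        "p_core_l2_mb": die["p"] * P_CORE_L2_MB,
        "darkmont_l2_mb": clusters * CLUSTER_L2_MB,
    }


summary = {name: summarize(die) for name, die in DIES.items()}
for name, info in summary.items():
    print(name, info)
```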

Each Intel Core Ultra Series 3 or Core Ultra 300 SKU will be derived from these three dies. So as you can see, Panther Lake gives Intel a lot of flexibility versus Lunar Lake and even Arrow Lake. The entry-level die with 4 P-Cores and 4 LP-E Cores is what will be replacing the Lunar Lake family, while the higher-end dies with 16 cores will be replacing Arrow Lake-H.

Intel Panther Lake "8 Core"
So, looking deeply into the three configurations, let's talk about the entry-level Panther Lake 8C die first. This chip features 8 cores in a 4 P-Core and 4 LP-E core configuration. The compute tile features the xPU too, with IPU 7.5, NPU5, and Xe Media/Display Engines. The memory subsystem for this die offers up to 6800 MT/s LPDDR5x and up to 6400 MT/s DDR5 support, along with 8 MB of memory-side cache. There is 12 MB of L2 cache for the P-Cores and 4 MB of L2 cache for the single Darkmont cluster. The compute tile is fabricated on the 18A process node.

This die will pack up to 4 Xe cores and 4 RT units based on the new Xe3 graphics architecture, and the specific graphics tile will be fabricated on the Intel 3 process technology. The platform controller tile is fabricated on TSMC's N6 process node and features up to 12 PCIe lanes (8x Gen4 + 4x Gen5), 4 Thunderbolt 4 ports, 2 USB 3.2 ports, 8 USB 2.0 ports, Intel Wi-Fi 7 (R2) support, and Intel Bluetooth Core 6.0 support.
Intel Panther Lake "16 Core"
The Intel Panther Lake 16C die adds more cores to the mix with 16 cores on the compute tile featuring 4 P-Cores, 8 E-Cores, and 4 LP-E cores. That gives us 12 MB of L2 cache for the P-Cores, and 12 MB of L2 cache for the E-Cores in three clusters.

The maximum memory support is also upgraded to 8533 MT/s LPDDR5X and 7200 MT/s DDR5. The Platform Controller Tile features up to 20 PCIe lanes, 12 of them Gen5, on this SKU. The graphics tile retains the 4 Xe cores based on the Xe3 architecture and is fabricated on Intel's own "Intel 3" process node.
Intel Panther Lake "16 Core 12 Xe"
The flagship die is the Panther Lake 16C 12Xe, which, as the name suggests, retains the same compute tile as the Panther Lake 16C die but upgrades memory support to LPDDR5X-9600. The 9600 MT/s speeds or 150+ GB/s bandwidth and LPDDR5x standard are crucial for the bigger graphics tile, which now features 12 Xe cores and 12 RT units based on the Xe3 architecture.
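The 150+ GB/s figure follows from the transfer rate once a bus width is fixed. The Python sketch below assumes a 128-bit memory interface, which is not stated in the article; the function name is also hypothetical. It computes theoretical peak bandwidth only, so real-world throughput will be lower.

```python
def peak_bandwidth_gbs(transfer_rate_mts, bus_width_bits=128):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    bus_width_bits defaults to 128, an assumption for this example.
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9


lpddr5x_9600 = peak_bandwidth_gbs(9600)
lpddr5x_8533 = peak_bandwidth_gbs(8533)

print(f"LPDDR5X-9600: {lpddr5x_9600:.1f} GB/s")
print(f"LPDDR5X-8533: {lpddr5x_8533:.1f} GB/s")
```

Under that assumption, the step from 8533 MT/s to 9600 MT/s adds roughly 17 GB/s of peak bandwidth for the larger graphics tile.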

This larger GPU tile is fabricated on TSMC's N3E process technology, and the platform controller tile goes back to the 12 PCIe lane configuration of the 8C die.

Wireless Connectivity Gets Two Big Upgrades
Intel is adding two major wireless connectivity upgrades to Panther Lake platforms. First up is Wi-Fi 7 R2, delivered through an integrated Wi-Fi solution called Whale Peak 2, an on-package design with a dedicated PMIC.
This solution is supplemented by the Intel Killer 1775 Wi-Fi 7 "BE211 CRF" module. The new solution provides up to 6 GHz band with a 320 MHz double-channel width, WPA3 security with 256-bit encryption, Multi-link Operation (MLO) support, and 4K QAM.
Some new capabilities of Wi-Fi 7 R2 include:
- Multi-Link Reconfiguration (Dynamic Resource configuration and management across active links)
- Restricted TWT (Enhanced AP resource allocation based on client type & prioritization)
- Single-link eMLSR (Enables single-radio client MLO with 1 vs 2 simultaneous link probing)
- P2P channel coordination (Allows AP to reserve certain channels for P2P operation)
Then there's the Bluetooth LE Audio solution, which provides true wireless stereo and multi-stream audio support along with longer accessory battery life (up to 50% lower power consumption), the ability to broadcast your source, higher rate audio sampling (Enhanced Music & Speech Quality), enhanced headset source-switching, and improved accessibility.

So that's our round-up of the Intel Panther Lake core architecture. Make sure to check out our other Tech Tour 2025 coverage for more information.