Intel Clearwater Forest “Xeon 6+” CPUs Deep-Dive: Up To 288 Darkmont E-Cores, 576 MB Cache, 18A With Foveros D3D + EMIB 2.5D

Hassan Mujtaba

Intel has disclosed more details of its next-gen Xeon 6+ E-core CPU family codenamed Clearwater Forest, which brings up to 288 next-gen cores.

Intel Clearwater Forest Gets 288 Next-Gen Darkmont E-Cores, Branded As Xeon 6+ For High-Density Compute Servers

Intel delivered higher compute density and performance-per-watt gains with its previous E-core-only Xeon CPU, Sierra Forest. That was also the first time Intel split its Xeon lineup into P-core-only and E-core-only families.


Now, the journey continues with the 2nd Gen E-Core only family codenamed Clearwater Forest, which will be part of the Xeon 6+ lineup.

It Has It All - Intel 18A With RibbonFET & PowerVia, Foveros Direct 3D, EMIB 2.5D

With Clearwater Forest, Intel is taking its disaggregated architecture and packaging design to the next level. The chip is a multi-layered solution with several chiplets and building blocks that make it quite an engineering achievement from Intel.

Each Clearwater Forest (Xeon 6+) CPU uses twelve EMIB bridges for its 2.5D packaging. These bring the three active base tiles and the two I/O tiles together, while the twelve compute tiles sit on top of the base tiles. The I/O tiles use the Intel 7 process node, the active base tiles use Intel 3, and the compute chiplets are fabricated on Intel 18A.

Coming to the compute chiplets, the Darkmont E-core chiplets are fabricated on Intel 18A with RibbonFET transistors. Intel claims that 18A delivers the best power efficiency for core logic thanks to lower gate capacitance. It also offers higher cell density, with over 90% cell utilization, and improved signal routing because the power rails move to the backside. Finally, 18A's low-loss power delivery reduces power losses by 4-5%.

RibbonFET is 18A's gate-all-around transistor design, and it gives greater control over electrical current and reduces power leakage, which translates into a performance advantage. Because the gate surrounds the transistor channel, it keeps tight control over the current in the channel, allowing more drive current and less leakage. This also enables lower operating voltages. RibbonFET's gate length is also 5-10% shorter than FinFET's, and Intel says it achieves a 20% power reduction per transistor.

Some features of RibbonFET include:

  • Enables further miniaturization of chip components, which is critical in high-density CPUs
  • Tight control over the electrical current in the transistor channel
  • Improves perf/watt, low-voltage (Vmin) operation, and electrostatics
  • Tunability through ribbon widths and multiple threshold voltage types

Then there's PowerVia, Intel's backside power delivery technology, which drives power from underneath the silicon instead of through the metal stack above the transistors. PowerVia improves standard cell utilization by up to 10% and iso-power performance by up to 4%.

Some highlights of PowerVia include:

  • Reduced congestion and boosted performance
  • Relocates coarse-pitch metals and bumps to the back side of the die
  • Nano-scale TSVs for efficient power distribution
  • Improved signal routing
  • Higher cell density, with over 90% cell utilization

Clearwater Forest is also the first high-volume production CPU to use Foveros Direct 3D, an advanced packaging technology that stacks the compute tiles directly on top of the active base tiles. Foveros Direct 3D has a 9 µm bump pitch and uses copper-to-copper (Cu-to-Cu) bonding. The active base tile acts as a high-density, low-resistance active silicon interposer, and the interface costs only ~0.05 pJ/bit. In practice, this means Intel spends very little power moving data back and forth between the stacked dies.
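To put ~0.05 pJ/bit in perspective, the power an interface burns is simply the number of bits moved per second multiplied by the energy per bit. The short Python sketch below does that arithmetic. The die-to-die bandwidths it loops over are hypothetical round numbers, not Intel figures, and the estimate covers only the per-bit interface energy, nothing else on the chip.

```python
# Interface energy -> power: watts = (bits moved per second) * (joules per bit).
PICOJOULE = 1e-12  # joules per picojoule


def link_power_watts(bandwidth_bytes_per_s: float, energy_pj_per_bit: float) -> float:
    """Power needed to sustain a bandwidth at a fixed energy cost per bit."""
    bits_per_second = bandwidth_bytes_per_s * 8
    return bits_per_second * energy_pj_per_bit * PICOJOULE


# Hypothetical die-to-die bandwidths in TB/s (illustrative only, not Intel figures).
for tb_per_s in (0.5, 1.0, 2.0):
    # 0.05 pJ/bit is the figure quoted for Foveros Direct 3D.
    watts = link_power_watts(tb_per_s * 1e12, 0.05)
    print(f"{tb_per_s:.1f} TB/s at 0.05 pJ/bit -> {watts:.2f} W")
```

This prints 0.20 W, 0.40 W, and 0.80 W, so even at a hypothetical 2 TB/s the interface energy alone stays under 1 W.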

Following is a 3D Construction overview of the Clearwater Forest Xeon 6+ CPU:

Dissecting The Three Main Tiles of Clearwater Forest

Alright, so next up, we get to see what each of the three main tiles has to offer. Once again, there are three tiles: the Compute Tile, the I/O Tile, and the Base Tile.

Clearwater Forest I/O Tile

The Clearwater Forest I/O tile is fabricated on the Intel 7 process technology. Each I/O tile features eight accelerators covering Intel QuickAssist Technology (QAT), Intel Dynamic Load Balancer (DLB), Intel Data Streaming Accelerator (DSA), and Intel In-Memory Analytics Accelerator (IAA), for a total of 16 accelerators across the two I/O tiles.

Each I/O Tile also offers 48 PCIe Gen 5.0 lanes (combined total of 96), 32 CXL 2.0 lanes (combined total of 64), and 96 UPI 2.0 lanes (combined total of 192). The IO Tile remains unchanged from Granite Rapids, but is a clear upgrade over Sierra Forest.

Clearwater Forest Base Tile

Moving on to the base tiles, there are three of them, fabricated on the Intel 3 process technology. They connect to the rest of the package using EMIB, and the compute tiles are stacked on top of them. Each base tile carries four DDR5 memory controllers, for a total of 12 memory channels on the chip. The base tiles also hold the shared LLC: 48 MB per compute tile, or 192 MB per base tile, which adds up to a massive 576 MB of on-package LLC.

Clearwater Forest Compute Tile

The compute tiles on Clearwater Forest are probably the most interesting ones on the chip since they are based on the new 18A process technology. Each compute tile is composed of 6 modules, & each module packs 4 Darkmont E-Cores. That gives us 24 Darkmont E-Cores per compute tile and 288 E-Cores across 12 compute tiles.

Each module also packs 4 MB of L2 cache, which works out to 24 MB of L2 per compute tile and 288 MB of total L2 across the 12 compute tiles. That 4 MB per module is the same as on the Sierra Forest E-core CPUs, and it brings the combined LLC + L2 capacity to 864 MB across the entire chip.

So you have the following:

  • 12x Compute Tiles (Intel 18A)
  • 3x Active Base Tiles (Intel 3)
  • 2x Intel I/O Tiles (Intel 7)
  • 12x EMIB Tiles (EMIB 2.5D)
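The per-tile figures above multiply out to the package-level totals quoted in this article. The Python sketch below restates that arithmetic; the constant names are our own labels, and the values are the per-unit figures given in the compute, base, and I/O tile sections.

```python
# Per-unit figures as quoted in the tile sections; the constant names are our own labels.
COMPUTE_TILES = 12
MODULES_PER_COMPUTE_TILE = 6
CORES_PER_MODULE = 4
L2_MB_PER_MODULE = 4
LLC_MB_PER_COMPUTE_TILE = 48
BASE_TILES = 3
MEMORY_CHANNELS_PER_BASE_TILE = 4
IO_TILES = 2
PCIE5_LANES_PER_IO_TILE = 48
CXL_LANES_PER_IO_TILE = 32
ACCELERATORS_PER_IO_TILE = 8


def package_totals() -> dict[str, int]:
    """Roll the per-tile figures up into whole-package totals."""
    modules = COMPUTE_TILES * MODULES_PER_COMPUTE_TILE
    l2_mb = modules * L2_MB_PER_MODULE
    llc_mb = COMPUTE_TILES * LLC_MB_PER_COMPUTE_TILE
    return {
        "cores": modules * CORES_PER_MODULE,
        "l2_mb": l2_mb,
        "llc_mb": llc_mb,
        "llc_mb_per_base_tile": llc_mb // BASE_TILES,
        "llc_plus_l2_mb": llc_mb + l2_mb,
        "memory_channels": BASE_TILES * MEMORY_CHANNELS_PER_BASE_TILE,
        "pcie5_lanes": IO_TILES * PCIE5_LANES_PER_IO_TILE,
        "cxl_lanes": IO_TILES * CXL_LANES_PER_IO_TILE,
        "accelerators": IO_TILES * ACCELERATORS_PER_IO_TILE,
    }


for name, value in package_totals().items():
    print(f"{name}: {value}")
```

The results match the article: 288 cores, 288 MB of L2, 576 MB of LLC (192 MB per base tile), 864 MB combined, 12 memory channels, 96 PCIe 5.0 lanes, 64 CXL lanes, and 16 accelerators.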

Darkmont E-Core Deep-Dive

Now we get to talk about the Darkmont E-core, which is also used in Panther Lake client CPUs.

This core is largely similar to the Skymont architecture we saw in Lunar Lake and Arrow Lake CPUs, and compared to Crestmont, the core family behind Sierra Forest, it is a big upgrade.

Starting with the details, Darkmont, like Skymont, comes with an updated 128-byte prediction block, faster "find next instruction" logic, and 96 instruction bytes for parallel fetch. Darkmont is a 9-wide microarchitecture with a wider decode: three 3-wide decode clusters (9-wide in total), or 50% more decode clusters than Crestmont. Nanocode unlocks microcode parallelism per cluster, and the uop queue grows from 64 to 96 entries. There's also a larger 64 KB instruction cache alongside more accurate branch prediction.

In the out-of-order engine (OOE), we are looking at 8-wide allocation and 16-wide retirement, which means resources can be allocated and freed faster. Queuing also gets more resources, with the out-of-order window growing to 416 entries.

Dispatch ports have increased to 26. The scalar engine gets 8 integer ALUs, 3 load and 4 store address-generation (AGU) ports, 3 jump ports, and 2 integer store-data ports, while the vector engine gets 4 vector/float ALUs, 2 vector/float store-data ports, and 4 vector/float stacks.
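The port breakdown above should add up to the quoted total of 26. A quick Python tally, counting only the ALU, AGU, jump, and store-data ports named (the vector/float stacks are not treated as additional dispatch ports here, which is our reading of the breakdown):

```python
# Dispatch ports per type, as listed for Darkmont; the vector/float "stacks"
# are assumed to be the same four vector ALUs, not extra ports.
SCALAR_PORTS = {
    "integer ALU": 8,
    "load AGU": 3,
    "store AGU": 4,
    "jump": 3,
    "integer store data": 2,
}
VECTOR_PORTS = {
    "vector/float ALU": 4,
    "vector/float store data": 2,
}

scalar_total = sum(SCALAR_PORTS.values())
vector_total = sum(VECTOR_PORTS.values())
print(f"scalar: {scalar_total}, vector: {vector_total}, total: {scalar_total + vector_total}")
```

This prints `scalar: 20, vector: 6, total: 26`, consistent with the 26 dispatch ports Intel quotes.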

The memory subsystem is upgraded across the board: each four-core module shares 4 MB of L2 cache, L2 bandwidth doubles from 64 B to 128 B per cycle, and L1-to-L1 transfers are now faster and more predictable.

This is achieved by sending these transfers through the shared L2 cache instead of fetching the data over the fabric. The conviction clock has been upgraded from 16 bytes to 32 bytes per clock.
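Doubling L2 bandwidth from 64 B to 128 B per cycle doubles peak L2 throughput at any given clock. The sketch below converts bytes per cycle into GB/s. The 2.0 GHz clock is a hypothetical value, since the article does not give Clearwater Forest's L2 clock, and the result is a theoretical peak rather than a measured figure.

```python
def l2_bandwidth_gb_s(bytes_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak L2 bandwidth in GB/s (1 GB = 1e9 bytes).

    bytes/cycle * (clock_ghz * 1e9) cycles/s / 1e9 simplifies to bytes/cycle * clock_ghz.
    """
    return bytes_per_cycle * clock_ghz


CLOCK_GHZ = 2.0  # hypothetical clock for illustration, not a Clearwater Forest spec

old = l2_bandwidth_gb_s(64, CLOCK_GHZ)
new = l2_bandwidth_gb_s(128, CLOCK_GHZ)
print(f"64 B/cycle: {old:.0f} GB/s, 128 B/cycle: {new:.0f} GB/s ({new / old:.1f}x)")
```

Whatever clock is plugged in, the ratio stays at 2.0x; only the absolute GB/s figures depend on the assumed frequency.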

The following is the comparison between Crestmont and Darkmont E-Core architectures:

Bringing it all together, Intel says Clearwater Forest's Darkmont E-cores offer up to 90% higher performance than the 144-core Xeon 6780E "Sierra Forest" CPU, a 23% improvement in efficiency across the load line, and up to 8:1 server consolidation with lower TCO.

Early Performance Data

Intel has also shared a few performance metrics for Clearwater Forest "Xeon 6+" CPUs. The comparison includes the Xeon 6700E "Sierra Forest" chips with 144 cores and also the unreleased Xeon 6900E "Sierra Forest" chips with 288 cores.

Compared to the 144-core Sierra Forest Xeon 6780E at 330W, the 288-core Clearwater Forest chip at 450W has a roughly 36% higher TDP but offers twice as many cores, 112.7% higher performance, and 54.7% higher performance per watt.

Compared to the 288-core Sierra Forest chip at 500W, the 288-core Clearwater Forest chip at 450W runs at a 10% lower TDP while offering 17% higher performance and 30% higher performance per watt.
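Performance per watt scales as the performance ratio divided by the power ratio. Treating TDP as the power figure (an assumption; the power actually drawn during Intel's benchmarks may differ), the 288-core comparison above can be checked with a few lines of Python:

```python
def perf_per_watt_gain(perf_ratio: float, new_tdp_w: float, old_tdp_w: float) -> float:
    """Relative perf/W change: (perf_new / perf_old) / (tdp_new / tdp_old) - 1."""
    return perf_ratio / (new_tdp_w / old_tdp_w) - 1


# 288-core Clearwater Forest (450 W) vs. 288-core Sierra Forest (500 W), +17% performance.
tdp_change = 450 / 500 - 1
gain = perf_per_watt_gain(1.17, 450, 500)
print(f"TDP change: {tdp_change:+.0%}, perf/W change: {gain:+.0%}")
```

This prints `TDP change: -10%, perf/W change: +30%`. The same helper works for any perf and TDP pair, although it will only reproduce Intel's published perf/W numbers if Intel derived them the same way.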

The performance uplift comes from the new Darkmont E-cores, which deliver a 17% IPC uplift. With them, Clearwater Forest brings up to 1.9x the performance, a 23% improvement in efficiency, and up to 8:1 server consolidation versus an aging Xeon platform.

Intel Xeon 6+ CPU & Platform Details

Now for the platform details: Intel's Clearwater Forest "Xeon 6+" CPUs will be supported on the LGA 7529 socket in 1S and 2S configurations. This is the same socket used by the Xeon 6900P "Granite Rapids-AP" CPUs, and it was also meant to host the 288-core Sierra Forest "Xeon 6900E" chips before those were cancelled. The chips will be rated at 300-500W TDP, the same operating range as the Xeon 6700E and 6900P CPUs. The lower-TDP parts will come with half the core count, i.e., 144 cores, the same as the Xeon 6700E.

The chips will be able to support up to 12-channel DDR5 memory with speeds of up to 8000 MT/s. In addition to that, the platform will support up to 6 UPI 2.0 links (up to 24 GT/s per lane), up to 96 PCIe Gen5.0 lanes (x16,x8,x4,x2), and up to 64 CXL 2.0 lanes.
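Peak theoretical memory bandwidth is channels × transfer rate × bytes per transfer. A minimal Python sketch for the 12-channel DDR5-8000 configuration above, assuming 8 bytes per transfer per DDR5 channel (two 32-bit subchannels, ECC bits excluded) and ignoring real-world efficiency losses:

```python
def peak_dram_bandwidth_gb_s(channels: int, mt_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    A DDR5 channel carries 64 data bits (two 32-bit subchannels), i.e. 8 bytes
    per transfer; ECC bits are not counted.
    """
    return channels * mt_per_s * 1e6 * bytes_per_transfer / 1e9


per_socket = peak_dram_bandwidth_gb_s(12, 8000)
print(f"1S: {per_socket:.0f} GB/s, 2S: {2 * per_socket:.0f} GB/s")
```

This prints `1S: 768 GB/s, 2S: 1536 GB/s`. Sustained bandwidth in real workloads will be lower than these theoretical peaks.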

Security features will include Intel Software Guard Extensions or SGX, and Intel Trust Domain Extensions or TDX. On the power management side, these chips will carry Intel AET (Application Energy Telemetry) & Intel Turbo Rate Limiter. And lastly, Clearwater Forest CPUs will get Advanced Vector Extensions 2 with VNNI and INT8 support.

So, rounding up Intel's Clearwater Forest "Xeon 6+" versus Sierra Forest "Xeon 6", we will see:

  • Up To 2x Core Count
  • +17% IPC per core
  • >5x Last Level Cache
  • +4 Memory Channels
  • +2 UPI Links
  • 20% Faster memory speed

Intel's Clearwater Forest "Xeon 6+" CPUs are expected to launch in 2H 2026, so we can expect more information and performance metrics to be shared at a later date.

Intel Xeon CPU Families (Preliminary):

| Family Branding | Coral Rapids | Diamond Rapids | Clearwater Forest | Granite Rapids | Sierra Forest | Emerald Rapids | Sapphire Rapids | Ice Lake-SP | Cooper Lake-SP | Cascade Lake-SP/AP | Skylake-SP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Process Node | Intel 14A? | Intel 18A-P | Intel 18A | Intel 3 | Intel 3 | Intel 7 | Intel 7 | 10nm+ | 14nm++ | 14nm++ | 14nm+ |
| Platform Name | TBD | Intel Oak Stream | Intel Birch Stream | Intel Birch Stream | Intel Mountain Stream / Intel Birch Stream | Intel Eagle Stream | Intel Eagle Stream | Intel Whitley | Intel Cedar Island | Intel Purley | Intel Purley |
| Core Architecture | TBD | Panther Cove-X | Darkmont | Redwood Cove | Sierra Glen | Raptor Cove | Golden Cove | Sunny Cove | Cascade Lake | Cascade Lake | Skylake |
| MCP (Multi-Chip Package) SKUs | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | Yes | No |
| Socket | TBD | LGA XXXX / 9324 | LGA 4710 / 7529 | LGA 4710 / 7529 | LGA 4710 / 7529 | LGA 4677 | LGA 4677 | LGA 4189 | LGA 4189 | LGA 3647 | LGA 3647 |
| Max Core Count | TBD | Up To 192 P-Cores | Up To 288 | Up To 128 | Up To 288 | Up To 64? | Up To 56 | Up To 40 | Up To 28 | Up To 28 | Up To 28 |
| Max Thread Count | TBD | Up To 192 | Up To 288 | Up To 256 | Up To 288 | Up To 128 | Up To 112 | Up To 80 | Up To 56 | Up To 56 | Up To 56 |
| Max L3 Cache | TBD | TBD | 576 MB LLC | 480 MB L3 | 108 MB L3 | 320 MB L3 | 105 MB L3 | 60 MB L3 | 38.5 MB L3 | 38.5 MB L3 | 38.5 MB L3 |
| Memory Support | TBD | Up To 16-Channel DDR5-9000+ | Up To 12-Channel DDR5-8000 | Up To 12-Channel DDR5-6400 / MCR-8800 | Up To 12-Channel DDR5-6400 | Up To 8-Channel DDR5-5600 | Up To 8-Channel DDR5-4800 | Up To 8-Channel DDR4-3200 | Up To 6-Channel DDR4-3200 | 6-Channel DDR4-2933 | 6-Channel DDR4-2666 |
| PCIe Gen Support | PCIe 6.0 | PCIe 6.0 | PCIe 5.0 (96 Lanes) | PCIe 5.0 (136 Lanes) | PCIe 5.0 (88 Lanes) | PCIe 5.0 (80 Lanes) | PCIe 5.0 (80 Lanes) | PCIe 4.0 (64 Lanes) | PCIe 3.0 (48 Lanes) | PCIe 3.0 (48 Lanes) | PCIe 3.0 (48 Lanes) |
| TDP Range (PL1) | TBD | TBD | Up To 500W | Up To 500W | Up To 350W | Up To 350W | Up To 350W | 105-270W | 150W-250W | 165W-205W | 140W-205W |
| 3D XPoint Optane DIMM | TBD | TBD | N/A | Donahue Pass | N/A | Crow Pass | Crow Pass | Barlow Pass | Barlow Pass | Apache Pass | N/A |
| Competition | TBD | AMD EPYC Venice | AMD EPYC Turin | AMD EPYC Turin | AMD EPYC Bergamo | AMD EPYC Genoa ~5nm | AMD EPYC Genoa ~5nm | AMD EPYC Milan 7nm+ | AMD EPYC Rome 7nm | AMD EPYC Rome 7nm | AMD EPYC Naples 14nm |
| Launch | 2028-2029 | 2027 | 2026 | 2024 | 2024 | 2023 | 2022 | 2021 | 2020 | 2018 | 2017 |

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.

