AMD’s 6th Generation Carrizo APUs Officially Launched and Detailed: Up to 15% IPC, 3rd Gen GCN Cores, DirectX 12 and HSA 1.0 Support
AMD has officially launched its latest Carrizo APUs at Computex 2015 and detailed additional information about its next-generation mobility processors. The Carrizo mobility processors come in two packages: power-optimized 15W variants and more performance-oriented 35W variants, which will ship in several notebooks later this month.
The processors were detailed in a press briefing by Tinhte.vn, and while the NDA lifts on 2nd June, we can now tell you what you can expect in terms of performance from the latest 6th generation processors. Starting with the basic features, Carrizo is based on the 28nm process node and comes in the FP4 package. The Carrizo chips feature 4 x86 Excavator cores with 2 MB of L2 cache and an integrated 3rd generation GCN GPU that packs 8 graphics compute units (512 stream processors) and 2 RBs. The chips support dual-channel DDR3 memory at speeds of up to 2133 MHz and are designed to fully support the HSA 1.0 spec. The chips also integrate the southbridge on die and offer several I/O technologies along with new software tier support that we will detail in just a bit. The Carrizo APUs will be branded as the 6th generation AMD FX, AMD A10 and AMD A8 series chips, and we already know some product names, such as:
- AMD FX-8800P
- AMD PRO FX-8800B
- AMD A10-8700P
- AMD A8-8600P
- AMD A6-8500P
- AMD PRO A10-8700B
- AMD PRO A8-8600B
- AMD PRO A6-8500B
- AMD RX-418GD
- AMD RX-216GD
So that's the round-up of what to expect from Carrizo APUs, but let's get more technical. From the ISSCC 2015 presentation, we know that Carrizo features nominal IPC gains of 5-15% from the new Excavator cores, which shows AMD is following in Intel's footsteps in this field, with the blue team offering a similar IPC improvement on its latest 14nm Broadwell microarchitecture. The die is still based on a 28nm node, yet AMD has managed to optimize the overall chip design by adding 29% more transistors than Kaveri thanks to the high-density design library. This results in a 3.1 billion transistor die that delivers 40% lower power consumption and 23% less die area than its predecessor. H.265 encode support allows 3.5 times the transcode performance of Kaveri, while the compute architecture lets the 8 GCN compute units (512 stream processors) run with a 20% reduction in power consumption.
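As a quick sanity check on the transistor figures, the sketch below recomputes the percentage increase from the two counts quoted in this article (2.41 billion for Kaveri, 3.1 billion for Carrizo). The helper name `percent_increase` is ours and purely illustrative.

```python
# Transistor counts as quoted in this article (not independently verified).
KAVERI_TRANSISTORS = 2.41e9
CARRIZO_TRANSISTORS = 3.1e9


def percent_increase(old: float, new: float) -> float:
    """Return the relative increase from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


transistor_gain = percent_increase(KAVERI_TRANSISTORS, CARRIZO_TRANSISTORS)
print(f"Carrizo vs Kaveri transistor count: +{transistor_gain:.1f}%")
```

The result lands just under 29%, which is consistent with the "29% more transistors" claim once rounding is taken into account.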
Looking specifically at the Excavator cores, we get improved and larger caches that allow prefetch improvements and lower latency. Branch prediction is better thanks to a 50% increase in branch target buffer size (512 to 768 entries), and the FPU gets an accelerated flush. New instruction support includes AVX2, MOVBE, SMEP and BMI1/2, along with more power gating options to cut down power when the chip is dormant or not fully utilized. The most significant frequency gains come to the 15W models, while the 35W models mainly push IPC, with clock speed bumps of only 0-5%. The 15W variants get a 25-45% frequency push and a 10% increase in IPC.
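To put the 15W numbers in perspective, here is a rough first-order estimate that treats single-thread throughput as IPC multiplied by clock frequency. That model is our simplifying assumption (it ignores memory bottlenecks, thermals and workload mix), and the function names are hypothetical. The same snippet also checks the branch target buffer growth.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative increase from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


def combined_speedup(ipc_gain_pct: float, freq_gain_pct: float) -> float:
    """First-order speedup factor, assuming performance ~ IPC x frequency."""
    return (1 + ipc_gain_pct / 100.0) * (1 + freq_gain_pct / 100.0)


# Branch target buffer entries quoted in the article.
btb_growth = percent_increase(512, 768)

# 15W models: 10% IPC gain with a 25-45% frequency gain (article figures).
low_15w = combined_speedup(10, 25)
high_15w = combined_speedup(10, 45)

print(f"BTB growth: {btb_growth:.0f}%")
print(f"15W estimated speedup: {low_15w:.3f}x to {high_15w:.3f}x")
```

Under this assumption, the 15W parts would come out roughly 1.4x to 1.6x faster than their predecessors in CPU-bound work, though real results depend heavily on the workload.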
In terms of size, the Carrizo die measures 244.62 mm² on the 28nm node, while Kaveri measures 245 mm² on the same process. The difference between the two chips is that Carrizo ups the transistor count to 3.1 billion from Kaveri’s 2.41 billion. The die did not grow despite the added x86 performance because Excavator cores are smaller than Steamroller cores, measuring just 14.48 mm² with a core transistor count of 102 million. The L1 cache has also doubled on Carrizo to 32 KB per core from 16 KB. The overall core structure has 690 million transistors crammed into one partition, while the rest of the transistors are dedicated to the GCN cores, which take advantage of HSA and the compute engine in general-purpose computing environments.
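The effect of the high-density library is easier to see as transistor density. The sketch below derives it from the die sizes and transistor counts quoted in this article, along with the share of transistors in the CPU partition. The dictionary layout and names are ours, chosen for illustration.

```python
# Figures as quoted in this article (not independently verified).
CARRIZO = {"transistors": 3.1e9, "area_mm2": 244.62}
KAVERI = {"transistors": 2.41e9, "area_mm2": 245.0}
CPU_PARTITION_TRANSISTORS = 690e6


def density_millions_per_mm2(chip: dict) -> float:
    """Average transistor density in millions of transistors per mm^2."""
    return chip["transistors"] / chip["area_mm2"] / 1e6


carrizo_density = density_millions_per_mm2(CARRIZO)
kaveri_density = density_millions_per_mm2(KAVERI)
density_ratio = carrizo_density / kaveri_density

# Share of Carrizo's transistor budget spent on the CPU partition.
cpu_share_pct = CPU_PARTITION_TRANSISTORS / CARRIZO["transistors"] * 100.0

print(f"Carrizo: {carrizo_density:.2f} M/mm^2, Kaveri: {kaveri_density:.2f} M/mm^2")
print(f"Density ratio: {density_ratio:.2f}x, CPU share: {cpu_share_pct:.1f}%")
```

These are whole-die averages, so they say nothing about how density varies between the CPU, GPU and I/O regions.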
AMD is also updating the GCN architecture with the 3rd generation GCN cores integrated inside Carrizo. These are the same architectural enhancements featured on Tonga and the soon-to-be-released Fiji graphics card. The iGPU has 512 KB of L2 cache, 819 GFLOPS of compute performance and HSA acceleration via ATC. Features such as DirectX 12 (Level 12), improved tessellation performance, lossless delta color compression, an updated ISA instruction set, a high-quality scaler unit and a cache-coherent fabric interface are available on the new GCN unit.
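The 819 GFLOPS figure can be related to the shader count with the usual peak-throughput formula: stream processors x 2 FLOPs per clock (one fused multiply-add) x clock speed. The article does not state the GPU clock, so the snippet below only derives the clock that the quoted figure implies; the 64 stream processors per compute unit is the standard GCN layout and matches the 8 CU / 512 SP numbers above. All names are illustrative.

```python
COMPUTE_UNITS = 8            # from the article
SP_PER_CU = 64               # standard GCN compute unit width
FLOPS_PER_SP_PER_CLOCK = 2   # one fused multiply-add counts as 2 FLOPs
QUOTED_GFLOPS = 819.0        # from the article

stream_processors = COMPUTE_UNITS * SP_PER_CU


def peak_gflops(sps: int, clock_mhz: float) -> float:
    """Theoretical single-precision peak for `sps` shaders at `clock_mhz`."""
    return sps * FLOPS_PER_SP_PER_CLOCK * clock_mhz * 1e6 / 1e9


# Clock implied by the quoted peak figure (an inference, not an AMD spec).
implied_clock_mhz = QUOTED_GFLOPS * 1e9 / (stream_processors * FLOPS_PER_SP_PER_CLOCK) / 1e6

print(f"Stream processors: {stream_processors}")
print(f"Implied GPU clock: {implied_clock_mhz:.1f} MHz")
print(f"Peak at 800 MHz: {peak_gflops(stream_processors, 800):.1f} GFLOPS")
```

The implied clock sits right around 800 MHz, which suggests the quoted number is a theoretical peak at that frequency rather than a measured result.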
Aside from its technical specifications, the most interesting thing about Carrizo is the design of the chip itself. For the first time, AMD is aiming for a true SoC design, eliminating the need for a separate FCH as was the case with Kaveri mobile, which requires the Bolton FCH for additional connectivity options. The FCH is integrated on the die itself and delivers Security, Display, Audio, PCI-e, SATA, SD, USB, Multimedia, UART/I2C, ClkGen and miscellaneous I/O connectivity. AMD is including UVD6, VCE3 and an audio co-processor with H.264 encode, along with a display control engine, "DCE11". With HDMI 2.0, up to 3 display interfaces, PCI-e Gen 3.0 x8 for discrete GPU expansion and PCI-e 3.0 x4 for GPP, the APU begins to look like a decent improvement over Kaveri from a design perspective. The FCH can deliver 4 USB 3.0 / 2.0 ports, 4 USB 2.0 ports and 2 SATA 3 ports, while the memory controller allows for dual-channel DDR3 memory rated at 2133 MHz in SO-DIMM form factor (one per channel).
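For a sense of scale, the theoretical peak bandwidth of these interfaces can be worked out from their standard parameters. The sketch below assumes the "2133 MHz" rating means DDR3-2133 (2133 MT/s) on a 64-bit channel, and that PCI-e 3.0 runs at 8 GT/s per lane with 128b/130b encoding. These are peak figures per direction, not measured throughput, and the function names are ours.

```python
def ddr_bandwidth_gbs(mega_transfers: float, channels: int, bus_bits: int = 64) -> float:
    """Peak DRAM bandwidth in GB/s for the given transfer rate and channel count."""
    bytes_per_transfer = bus_bits / 8
    return mega_transfers * 1e6 * bytes_per_transfer * channels / 1e9


def pcie3_bandwidth_gbs(lanes: int) -> float:
    """Peak PCI-e 3.0 bandwidth per direction in GB/s (8 GT/s, 128b/130b)."""
    usable_bits_per_second = 8e9 * 128 / 130
    return usable_bits_per_second / 8 * lanes / 1e9


memory_bw = ddr_bandwidth_gbs(2133, channels=2)
dgpu_link_bw = pcie3_bandwidth_gbs(8)   # x8 link for a discrete GPU
gpp_link_bw = pcie3_bandwidth_gbs(4)    # x4 link for general-purpose ports

print(f"Dual-channel DDR3-2133: {memory_bw:.1f} GB/s")
print(f"PCI-e 3.0 x8: {dgpu_link_bw:.2f} GB/s, x4: {gpp_link_bw:.2f} GB/s")
```

Since the integrated GPU shares the DDR3 memory with the CPU cores, the memory figure is a shared ceiling rather than bandwidth dedicated to graphics.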
AMD Carrizo APU Comparison Chart:
| |AMD Trinity APU|AMD Richland APU|AMD Kaveri APU|AMD Carrizo-L|AMD Carrizo APU|
|---|---|---|---|---|---|
|Core|x86 Piledriver|x86 Piledriver|x86 Steamroller|x86 Puma+|x86 Excavator|
|GCN Cores|384 SPs|384 SPs|512 SPs|128 SPs?|512 SPs|
|HSA Support|No|No|Yes|No|Full HSA 1.0|
AMD Carrizo APU Die:
AMD FX-8800P Benchmarks and Confirmation:
One of the slides in the presentation has confirmed the FX-8800P as the flagship Carrizo APU of the lineup which can be seen in the gallery post below along with the previously leaked benchmark: