ARM’s Cortex A77 Delivers A 35% Floating Point Improvement; Valhall Architecture Optimizes Texture Heavy Performance

May 27

We’ve got a big announcement from British chip designer ARM today. The firm has introduced its Cortex A77 core and a brand new GPU architecture. These designs will power Android smartphones that hit the shelves from next year, since 2019’s flagships are already powered by Qualcomm’s Snapdragon 855. The improvements from ARM promise gains in performance and power efficiency, with the company having reworked nearly every part of its CPU core design. Take a look below for more.

ARM’s Cortex A77 Core Makes Important Changes Over Its Predecessor As The Company Launches A Brand New GPU Architecture

As we settle into 2019, changes in the smartphone market are solidifying. The need for SoCs to support computational and machine learning workloads has increased, and vendors are tailoring their solutions accordingly. ARM has followed this trend by launching the Cortex A77 and a new GPU architecture dubbed ‘Valhall’.


Starting with the Cortex A77, ARM has focused on building on the Cortex A76’s design to raise performance while keeping power consumption in check. The company has doubled branch prediction bandwidth, increased fetch bandwidth, added a new ALU pipeline and increased decoder width. The Cortex A77’s branch predictor’s runahead bandwidth has doubled to 64B/cycle. ARM has also increased the predictor’s BTB (Branch Target Buffer) capacity to 8K entries.

A nice upgrade that falls in line with Intel’s and AMD’s x86 designs is a brand new Macro-OP (Mop) cache on the A77’s front-end. The Mop cache allows the A77 to reduce branch mispredict latency to 10 cycles. ARM has also designed the A77 so that the core can bypass its decode stage when instructions are already present in the Mop cache.
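To make the decode bypass concrete, here is a minimal toy sketch of a macro-op cache sitting in front of a decoder. It is an illustration only and not ARM’s actual design: the `ToyFrontEnd` class, its fields, the unbounded cache and the four-instruction loop body are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFrontEnd:
    """Toy front-end: a macro-op cache in front of a decode stage.

    Illustrative model only; it has no capacity limit or eviction policy.
    """
    mop_cache: dict = field(default_factory=dict)  # pc -> decoded macro-op
    decodes: int = 0   # how often the decode stage was used
    bypasses: int = 0  # how often the decode stage was skipped

    def decode(self, raw: str) -> str:
        # Stand-in for the real decode stage.
        self.decodes += 1
        return f"mop({raw})"

    def fetch(self, pc: int, raw: str) -> str:
        # Hit: the decoded macro-op is already cached, so skip decode.
        if pc in self.mop_cache:
            self.bypasses += 1
            return self.mop_cache[pc]
        # Miss: decode once and keep the result for later fetches.
        mop = self.decode(raw)
        self.mop_cache[pc] = mop
        return mop


def run_loop(front_end: ToyFrontEnd, body: list, iterations: int) -> None:
    """Fetch the same loop body repeatedly, as a hot loop would."""
    for _ in range(iterations):
        for pc, raw in body:
            front_end.fetch(pc, raw)


fe = ToyFrontEnd()
loop_body = [(0x100, "add"), (0x104, "ldr"), (0x108, "cmp"), (0x10C, "b.ne")]
run_loop(fe, loop_body, iterations=10)
print(fe.decodes, fe.bypasses)  # prints "4 36"
```

In this toy run, only the first pass through the loop touches the decoder; the remaining nine passes are served from the cache.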

ARM’s Cortex A77 CPU Promises A 20% Improvement In Single-Core Scores And A 35% Gain In Floating Point Calculations

ARM’s decision to add a new ALU in the A77’s back-end improves the core’s performance by easing back-end bottlenecks. The A77’s L1/L2 data caches have dedicated issue ports for store-data pipelines and improved engines that contribute towards the aforementioned power efficiency. The strongest improvement on the memory side is in data prefetching, where the company has made changes that allow the core to manage more instructions and adapt its behavior according to memory subsystem latency.

Cache sizes for the A77, however, stay the same this year. The core has 64KB L1 and 256KB or 512KB private L2 ECC caches. Performance-wise, ARM promises that the A77 will deliver a 23% increase in integer and a 35% increase in floating point performance in SPEC2006. The chip will also improve memory latency by 15%, and the firm believes that the A77 will reach 3.0GHz, similar to its predecessor.


ARM’s Valhall GPU Architecture Offers A 60% Improvement In Machine Learning, A 30% Increase In Performance Density And A 30% Gain In Power Efficiency 

ARM’s latest Valhall GPU architecture succeeds the company’s Bifrost architecture, which is present in the current Mali G76 GPUs. Valhall delivers impressive improvements in performance density (30%), machine learning (60%) and power efficiency (30%). Valhall’s execution core is similar to the ones found in products from AMD and Nvidia: the architecture allows the Mali G77 to feature 16-wide warps, two shader cores with one execution engine each, and 16 FMA clusters per execution engine.

In publishing the performance figures for Valhall and the Mali G77, ARM claims that the GPU will provide between 1.4X and 1.6X the performance per mm² of the G76. Shader cores on the G77 are the same size as those on the G76. For machine learning, the G77 has 1.6X the inference performance of its predecessor, courtesy of 33% more processing units on the core.
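A quick back-of-the-envelope check helps put the inference figure in context. The sketch below assumes, purely for illustration, that throughput scales linearly with the number of processing units; ARM’s actual breakdown is not given here, and the function name is hypothetical.

```python
def per_unit_gain(total_speedup: float, unit_increase: float) -> float:
    """Speedup left over once extra units are accounted for.

    Assumes (as a simplification) that performance scales linearly
    with the number of processing units.
    """
    return total_speedup / (1.0 + unit_increase)


# Figures quoted for the G77: 1.6X inference, 33% more processing units.
gain = per_unit_gain(total_speedup=1.6, unit_increase=0.33)
print(f"{gain:.2f}x")  # prints "1.20x"
```

Under that assumption, the extra units alone would account for about 1.33X, leaving roughly 1.2X to come from other changes in the architecture.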

The texture mapping unit on the Mali G77 doubles its throughput over the G76: it delivers 4 bilinear texels/clock, 2 trilinear texels/clock and twice the anisotropic filtering rate, with a focus on texture computing. It’s important to note that the G77’s core support is limited to 16 cores for now. The G77 also has a large IP block that consolidates the resources of earlier generations’ execution engines.
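For a rough sense of scale, the per-clock texel figures can be turned into a peak fill rate. The sketch below assumes the quoted texels/clock apply per core and uses a hypothetical 800MHz GPU clock; neither is stated by ARM here, and real-world throughput will be lower than this theoretical peak.

```python
BILINEAR_TEXELS_PER_CLOCK = 4   # quoted for the G77
TRILINEAR_TEXELS_PER_CLOCK = 2  # quoted for the G77
MAX_CORES = 16                  # current G77 core limit


def peak_gtexels_per_second(texels_per_clock: int, cores: int,
                            clock_mhz: float) -> float:
    """Theoretical peak fill rate in GTexels/s.

    Assumes the texels/clock figure is per core and that every core
    is busy on every cycle.
    """
    if not 1 <= cores <= MAX_CORES:
        raise ValueError(f"cores must be between 1 and {MAX_CORES}")
    return texels_per_clock * cores * clock_mhz * 1e6 / 1e9


HYPOTHETICAL_CLOCK_MHZ = 800.0  # assumption, not an ARM figure

bilinear = peak_gtexels_per_second(
    BILINEAR_TEXELS_PER_CLOCK, MAX_CORES, HYPOTHETICAL_CLOCK_MHZ)
trilinear = peak_gtexels_per_second(
    TRILINEAR_TEXELS_PER_CLOCK, MAX_CORES, HYPOTHETICAL_CLOCK_MHZ)
print(f"{bilinear:.1f} {trilinear:.1f}")  # prints "51.2 25.6"
```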

Valhall and the Mali G77 are optimized for performance in texture heavy games, and fixed issue scheduling on the graphics processor is handled by the hardware. ARM’s focus with the new graphics architecture is the execution core, which is optimized to reduce latency and improve texture mapping. Alongside its CPU and GPU designs, the company has also introduced its custom NPU (Neural Processing Unit), dubbed the ML Processor.

This processor is capable of delivering 4 TOPS (Trillion Operations per Second) at a power efficiency of 5 TOPS/W. The ML Processor can scale up to eight NPUs and 32 TOPS in a single cluster, and it supports both convolutional and recurrent neural networks. These updates from ARM are in line with the software that will become common on the flagships of the future. Machine learning is, after all, at the heart of many different applications.
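The cluster figure follows directly from the per-NPU number, and the efficiency rating gives a rough idea of power draw at peak. The sketch below assumes linear scaling across NPUs and that the 5 TOPS/W figure holds at peak throughput; both are simplifications, and the function names are hypothetical.

```python
TOPS_PER_NPU = 4.0   # quoted peak per ML Processor
TOPS_PER_WATT = 5.0  # quoted power efficiency
MAX_NPUS = 8         # quoted cluster limit


def cluster_tops(npus: int) -> float:
    """Peak cluster throughput, assuming linear scaling across NPUs."""
    if not 1 <= npus <= MAX_NPUS:
        raise ValueError(f"npus must be between 1 and {MAX_NPUS}")
    return npus * TOPS_PER_NPU


def watts_at_peak(tops: float) -> float:
    """Estimated power at peak, assuming efficiency stays constant."""
    return tops / TOPS_PER_WATT


full_cluster = cluster_tops(MAX_NPUS)
print(full_cluster)                           # prints "32.0"
print(f"{watts_at_peak(TOPS_PER_NPU):.1f}")   # prints "0.8"
print(f"{watts_at_peak(full_cluster):.1f}")   # prints "6.4"
```

So a single NPU running flat out would draw under a watt by this estimate, while a full eight-NPU cluster would sit well outside a typical smartphone power budget.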

Thoughts? Let us know what you think in the comments section below and stay tuned. We’ll keep you updated on the latest.