ARM’s N57 NPU Has 2 TOPs Output; Company’s Designs Target Several Segments


When it comes to mobile SoC microarchitecture, 2019 has provided us with a clear direction. CPU design has matured to the extent that designers and manufacturers now only have to tweak certain pre-set parameters to eke out added power efficiency or performance over the previous design generation. This has led to a shift towards machine learning, as evidenced by Apple's A13 SoC and ARM's Matterhorn reveal earlier this year. Speaking of which, the British chip giant is out with new GPU, DPU and NPU designs today. Take a look below for the details.

ARM Introduces Midrange GPU, NPU and DPU Designs By Expanding Its Ethos & Mali Products

If there's one thing that's for certain, it's that ARM forms the backbone of Android computing. While Apple has the leeway to fully customize its IP, high-end, mid-range and low-end devices in the Android world are limited to ARM's designs and to whatever tweaks manufacturers such as Samsung and Qualcomm make to them.

ARM, for what it's worth, is focusing on silicon designs for mid-range gadgets today. The British firm has detailed plans for its Ethos and Mali product lineups. Ethos covers the company's neural processing designs and Mali covers its graphics and display processing IP.

Prior to today, ARM's NPU portfolio consisted solely of the Ethos-N77. This processor is capable of 4 trillion operations per second (TOPs) at a 1GHz clock frequency, offers a power efficiency of 5 TOPs per Watt and has SRAM configurable between one and four MB. Since the N77 is ARM's flagship NPU design, the freshly detailed N57 and N37 don't crunch as many operations in the same amount of time. But they don't consume as much power either.

ARM's Ethos N57 and N37 NPUs Are Optimized For Int8, Int16 Datatypes, Bring End-to-End Compression On Board & Provide Hardware Support For Winograd

Before we get to the details of the types of workloads they can crunch and how they process them, it's important to first look at the basic performance parameters of the Ethos-N57 and N37, and how these parameters compare to those of the N77. Doing so will let us appreciate what the sacrifices in performance allow a piece of silicon to achieve. According to ARM, these are the three NPUs' specifications.

Product   Throughput   MAC/cycle   Internal Memory   Applications
N77       Max 4 TOPs   2048 8x8    1-4 MB            Smartphones, AR/VR
N57       Max 2 TOPs   1024 8x8    512 KB            Smart homes, midrange smartphones
N37       Max 1 TOPs   512 8x8     512 KB            Smart cameras, low-range smartphones


For the uninitiated, MAC/cycle measures the number of multiply-accumulate (MAC) operations a processing unit can perform per clock cycle, where one MAC adds the product of two numbers to an accumulator. MAC units are typically used for digital signal processing, and all three of ARM's NPUs feature the same MAC computation engine (MCE).
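To make the link between MAC/cycle and the headline TOPs figures concrete, here is a minimal Python sketch. It assumes each MAC counts as two operations (one multiply, one add) and that all three NPUs run at 1GHz; ARM only states that clock for the N77, so the N57 and N37 numbers are an assumption. The function names are made up for illustration.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: add the product a*b to the accumulator."""
    return acc + a * b


def dot(xs, ws):
    """Dot product built from repeated MAC steps, one per element pair."""
    acc = 0
    for x, w in zip(xs, ws):
        acc = mac(acc, x, w)
    return acc


def peak_tops(macs_per_cycle, clock_hz=1e9):
    """Peak throughput in TOPs, counting each MAC as two operations.

    clock_hz defaults to 1GHz, which is an assumption for the N57 and N37.
    """
    return macs_per_cycle * 2 * clock_hz / 1e12


if __name__ == "__main__":
    # MAC/cycle values taken from ARM's specification table.
    for name, macs in [("N77", 2048), ("N57", 1024), ("N37", 512)]:
        print(f"{name}: {peak_tops(macs):.3f} TOPs")
```

Under those assumptions, the MAC/cycle figures of 2048, 1024 and 512 work out to roughly 4, 2 and 1 TOPs, which lines up with the maximum throughput ARM quotes for each design.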

Architecturally, the MCE is combined with a programmable layer engine and SRAM to form the NPU's computation engine. While the basic design of this engine remains similar across the three NPUs, the difference is in the silicon area dedicated to them on a single product. This allows ARM to drastically reduce the processing unit's footprint for the variety of applications shown above. The Ethos-N37, for instance, has a surface area less than 1mm².

Finally, the NPUs are designed to provide hardware-level support for the Winograd algorithm, which speeds up convolutions by cutting down the number of multiplications they require. They also reduce system memory bandwidth by up to three times by utilizing lossless compression.
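In neural network accelerators, Winograd support usually refers to the minimal filtering form used to speed up small convolutions. The Python sketch below assumes that is what is meant here and shows the textbook F(2,3) case: two outputs of a 3-tap filter computed with four multiplications instead of six. It illustrates the arithmetic only and says nothing about how ARM implements it in hardware; the function names are hypothetical.

```python
def direct_conv_f23(d, g):
    """Two outputs of a 3-tap filter over four inputs: 6 multiplications."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return [y0, y1]


def winograd_f23(d, g):
    """Same two outputs via Winograd F(2,3): 4 multiplications.

    The filter transform (u0..u3) depends only on the weights, so it can be
    computed once and reused for every input tile.
    """
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # The four element-wise multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3

    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [2.0, 1.0, 3.0]
    print(direct_conv_f23(d, g))
    print(winograd_f23(d, g))
```

Both functions return the same values for the same inputs. The saving comes from trading multiplications for cheaper additions, which is why the technique suits MAC-bound hardware.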

ARM Brings Valhall Microarchitecture To Midrange GPU Design IP Through Mali G57; GPU Utilizes Three Execution Engines/Core For 30% Density, Power & Performance Improvements

The second big piece of news from ARM today is that the company has decided to bring its Valhall GPU microarchitecture to mid-range smartphones. Valhall currently features on the Mali-G77, and the architecture brings improved instruction scheduling, a superscalar engine and a scalar ISA. The Mali-G57 doubles texturing performance over the G52, improves energy efficiency by 30% and can feature up to six cores.

The focus here is once again machine learning, with the GPU capable of providing a 60% inference boost over its predecessor, the Mali-G52. For the consumer, this improvement will be felt through features such as facial recognition, speech detection and photography. The G57 also features a nice treat for VR fans in the form of foveated rendering. The GPU will be able to work with software to enhance image rendering in the area covered by the fovea and reduce quality in the periphery.
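The idea of rendering at full quality near the gaze point and at lower quality in the periphery can be sketched in a few lines of Python. This is a toy model and not ARM's implementation: the radii, the linear falloff and the minimum scale are all hypothetical values picked for illustration.

```python
import math


def shading_scale(px, py, gaze_x, gaze_y,
                  fovea_radius=200.0, falloff=600.0, min_scale=0.25):
    """Return a render-quality scale in [min_scale, 1.0] for a pixel.

    Pixels within fovea_radius of the gaze point get full quality (1.0).
    Beyond that, quality drops linearly over `falloff` pixels down to
    min_scale. All default values are hypothetical.
    """
    dist = math.hypot(px - gaze_x, py - gaze_y)
    if dist <= fovea_radius:
        return 1.0
    t = min((dist - fovea_radius) / falloff, 1.0)
    return 1.0 - t * (1.0 - min_scale)


if __name__ == "__main__":
    gaze = (960, 540)  # assumed gaze point at the centre of a 1920x1080 frame
    for x in (960, 1260, 1460, 1900):
        print(x, shading_scale(x, 540, *gaze))
```

A real pipeline would get the gaze point from eye-tracking hardware and map the scale to a shading rate or render-target resolution, but the shape of the trade-off is the same.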

Unsurprisingly, the Mali-G57 is designed with the needs of the Chinese market in mind, as ARM believes mobile gaming in China will generate $23 billion in revenue.

ARM's Mali D37 DPU Is Designed For Small Screens, Entry-Level Tablets & Smartphones

Finally, ARM's Mali-D37 is capable of supporting 2K resolutions and runs on a single pipeline with four composition layers. It's got a 1mm² surface area at the 16nm fabrication node, uses local tone mapping for HDR and offers 30% power savings. This chip is designed for entry-level smartphones and tablets. It's also capable of supporting small screens found inside cars and airplanes.
