Apple’s New CoreAI Engine Barely Edges Out Its Own MLX Framework At Realistic 8B Model Sizes, Despite Being 2.47x Faster On Tiny Models

Jun 10, 2026 at 03:44pm EDT

Apple has finally debuted CoreAI, the successor to its CoreML engine, which had reigned supreme for around 9 years. The new engine brings format-agnostic inference and support for large-model memory footprints. Even so, initial tests paint a much more nuanced picture of Apple's new AI framework and, in turn, its on-device models.

New benchmark tests show Apple's CoreAI "converges to a near-tie [with MLX] at a realistic 8B" model size for decoding

For the benefit of those who might not be aware, Apple launched its CoreML machine-learning framework back in 2017 to primarily run smaller, static machine-learning tasks such as image classification and tree ensembles. CoreAI is CoreML's brand-new successor that has been optimized for edge AI and on-device inference.

In contrast, MLX is an engine that is primarily geared towards research, training, and fine-tuning, and is locked to Apple's Metal GPU and unified memory architecture.

I benchmarked Apple's brand-new Core AI (WWDC'26) against MLX and CoreML for on-device LLMs on an iPhone 17 Pro. The results surprised me.
Qwen3-0.6B Decode speed (tok/s):
• Core AI (GPU, pipelined): 180 warm 🥇
• MLX (GPU): 115
• Core AI (ANE): 50
• CoreML-LLM (ANE): 39 pic.twitter.com/BDWNGoPepV
— MLBoy_DaisukeMajima (@JackdeS11) June 10, 2026

Now, a new benchmark test has just given us interesting insights into Apple's new CoreAI engine.

Firstly, for small models such as the 0.6-billion-parameter Qwen3, CoreAI is around 2.47x faster at decoding than MLX on an M4 Mac. Similarly, on an iPhone 17 Pro, CoreAI is around 1.6x faster than MLX at decoding, again based on the Qwen3 0.6B model. However, when the model size increases to a more practical 8 billion parameters (Qwen3 8B, M4 Max Mac), CoreAI is only 1.05x faster than MLX, which amounts to near-parity in decoding performance.
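The iPhone ratio can be checked against the decode figures in the post quoted earlier. The following is a minimal Python sketch, not the benchmark author's method: the `speedup` helper and the `decode_tok_s` dictionary are illustrative names, and the throughput values are the ones posted for Qwen3-0.6B on the iPhone 17 Pro.

```python
def speedup(candidate_tok_s: float, baseline_tok_s: float) -> float:
    """Return how many times faster the candidate decodes than the baseline."""
    if baseline_tok_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return candidate_tok_s / baseline_tok_s


# Decode throughput (tokens per second) as quoted in the post for
# Qwen3-0.6B on the iPhone 17 Pro. Names here are illustrative only.
decode_tok_s = {
    "Core AI (GPU, pipelined)": 180.0,
    "MLX (GPU)": 115.0,
    "Core AI (ANE)": 50.0,
    "CoreML-LLM (ANE)": 39.0,
}

baseline = decode_tok_s["MLX (GPU)"]
for engine, tok_s in decode_tok_s.items():
    print(f"{engine}: {speedup(tok_s, baseline):.2f}x vs MLX")
```

For Core AI on the GPU this works out to 180 / 115, or about 1.57x, which rounds to the 1.6x figure cited for the iPhone 17 Pro.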

Interestingly, under sustained workloads on the iPhone 17 Pro, the GPU throttles relatively quickly, which lets the CoreML/Apple Neural Engine combo pull ahead in terms of performance retained over time. This combo also uses the least memory, but it is the slowest at decoding.

Engines optimized for specific vendor-sourced models almost always trump general-purpose engines. For instance, Google's LiteRT-LM engine running its Gemma model was not only the fastest engine on the iPhone 17 Pro (55.4 tokens per second), but it also used 4.5× less RAM than Apple's own MLX framework (641 MB vs 2,900 MB).
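A "4.5× less RAM" figure can also be read as a percentage saving. The short Python sketch below uses only the two memory numbers quoted above; the helper names `ram_ratio` and `ram_saving_percent` are hypothetical and exist purely for illustration.

```python
def ram_ratio(larger_mb: float, smaller_mb: float) -> float:
    """Return how many times more memory the larger footprint uses."""
    if smaller_mb <= 0:
        raise ValueError("memory footprint must be positive")
    return larger_mb / smaller_mb


def ram_saving_percent(larger_mb: float, smaller_mb: float) -> float:
    """Return the reduction of the smaller footprint relative to the larger, in percent."""
    if larger_mb <= 0:
        raise ValueError("memory footprint must be positive")
    return (1.0 - smaller_mb / larger_mb) * 100.0


# Memory figures quoted in the article for the iPhone 17 Pro.
MLX_MB = 2900.0
LITERT_LM_MB = 641.0

print(f"ratio:  {ram_ratio(MLX_MB, LITERT_LM_MB):.2f}x")
print(f"saving: {ram_saving_percent(MLX_MB, LITERT_LM_MB):.1f}%")
```

The ratio comes to roughly 4.5, matching the quoted figure, and corresponds to a memory saving of about 78 percent relative to MLX.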

Finally, Apple Foundation Models were found to be "2× more energy-efficient per token than the GPU-backed runtimes, 4× more than CoreML/ANE."

About the author: Writing is my one incontrovertible passion. Over the past six years, I have authored over 2,200 distinct articles on financial and tech-related topics, spanning nearly 1 million words, and I have been a member of the Wccftech mobile team since 2025. As an alumnus of the University of Toronto, Rotman Commerce Program, I bring nuance, in-depth knowledge, and a unique perspective to every topic that I cover. When I'm not writing, I'm traveling the world, exploring hidden confectioneries and restaurants as an aspiring food connoisseur.
