Apple Removes The Fog Around Its New Cloud-Based, And 20-Billion-Parameter On-Device AI Models, Brushes Aside Google’s Contributions While Hyping NVIDIA’s

Jun 8, 2026 at 06:15pm EDT

Apple has built a sprawling, intricate compute architecture, one that ropes in Google and NVIDIA to paper over its embarrassing AI-related shortcomings. Even so, Apple's WWDC 2026 keynote raised as many questions as it answered. Thankfully, the Cupertino-based tech giant is now issuing clarifications at lightning speed to resolve the lingering uncertainties.

Apple craftily obfuscates Google's contributions to its new Apple Intelligence architecture, taking pains to point out its own technologies at the core of this new paradigm

We already know that Apple Intelligence relies on a combination of on-device and cloud-based models. Until now, however, Apple had not spelled out that split in much detail.

Apple just clarified AFM Cloud is Apple's own model, trained with Gemini outputs

AFM local models are entirely Apple models

AFM Cloud Pro seems to be based on Gemini foundation and data, but Apple did their own pre-training, post-training, RL, etc
— Max Weinbach (@mweinbach) June 8, 2026

Thankfully, Apple has just provided a critical update, noting that the gigantic cloud-based Apple Foundation Model (AFM), which is called AFM 3 Cloud Pro and underpins complex queries, is its own creation, albeit distilled from an equivalent Google Gemini model. Of course, we already know that Apple licensed a 1.2-trillion-parameter Gemini model from Google a few months back.

It seems the iPhone maker had only licensed Google's technology for model distillation purposes. Apple also takes pains to note that it conducted its own pre-training and post-training operations on the AFM 3 Cloud Pro.

Apple has also detailed the architecture of its Private Cloud Compute (PCC) framework, going on to note:

What's new with PCC on Google Cloud is the implementation: "NVIDIA Confidential Computing with NVIDIA GPUs, Intel CPUs with TDX, and Google's Titan chip."
Apple states that while the AFM Cloud is hosted in Google Cloud, the arrangement comes with "the industry’s most comprehensive transparency guarantees that allow external security researchers to verify our privacy commitments."
"To mitigate the risk of supply chain attacks, we maintain a cryptographically verifiable, append-only ledger of all Google Cloud hardware that is part of the PCC fleet."
"PCC on Google Cloud leverages many of the same architectural security patterns as PCC on Apple silicon to implement these layered protections: initial network data parsing for each request happens in a dedicated process within its own namespace, shared inference software is recycled with a short time-to-live duration, and attested keys are held in a separate, dedicated confidential VM isolated from external inputs."
Apple also says that it will "provide public research tooling, and access to live PCC nodes in research mode through the Apple Security Bounty Program."
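For readers wondering what a "cryptographically verifiable, append-only ledger" looks like in practice, here is a minimal sketch of one common way to build such a log: a hash chain, where every entry commits to the entry before it. Apple has not published its ledger format in the statements quoted above, so the class name, record fields, and node names below are purely hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


class HardwareLedger:
    """Toy append-only log (hypothetical): each entry commits to all earlier ones."""

    def __init__(self) -> None:
        self.entries: list[tuple[dict, str]] = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else GENESIS
        digest = entry_hash(prev, record)
        self.entries.append((record, digest))
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for record, digest in self.entries:
            if entry_hash(prev, record) != digest:
                return False
            prev = digest
        return True


ledger = HardwareLedger()
ledger.append({"node": "gpu-host-001", "firmware": "1.0"})  # made-up records
ledger.append({"node": "gpu-host-002", "firmware": "1.0"})
print(ledger.verify())  # True

# Tampering with an earlier record invalidates the chain.
ledger.entries[0][0]["firmware"] = "evil"
print(ledger.verify())  # False
```

In a scheme like this, an outside researcher who holds the latest hash can detect any silent change to the recorded hardware history, which is the property the "append-only" wording points to.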

Apple Foundation Model Cloud Pro is the best Apple model, and runs on Nvidia GPUs in Google Cloud

Apple Foundation Model Cloud and Cloud Image run on Apple Silicon. Both are private cloud compute.
— Max Weinbach (@mweinbach) June 8, 2026

Apple has further clarified that its cloud-based models fall into three categories: AFM 3 Cloud Pro, which runs on NVIDIA GPUs within Google Cloud; a vanilla model dubbed AFM 3 Cloud; and an image generation model called ADM 3 Cloud (Image). The latter two run on Apple's own servers.

AFM Core Advanced on-device model running on A19 Pro is a sparse model.

It's 20B parameters.

It's fully Apple designed. It is an MoE but when it processes the prompt, it only loads the parameters needed and locks them in.

If it's 20B parameters total, but on a specific…
— Max Weinbach (@mweinbach) June 8, 2026

As far as on-device Apple Foundation Models are concerned, the AFM 3 Core Advanced has 20 billion parameters but loads only the parameters needed to process a given inference request, activating just 1 to 4 billion at a time. Critically, this model was designed entirely by Apple and requires the A19 Pro chip to run on an iPhone. Apple explains this in the following words:

"Instead of forcing the entire model into DRAM, the full model is stored in flash memory (NAND). Because NAND-to-DRAM bandwidth is too slow to swap weights token by token, as standard MoE models require, AFM 3 Core Advanced makes routing decisions per prompt."

Apple then adds:

"Rather than using a single model for all tasks or managing an ensemble of smaller models, AFM 3 Core Advanced uses a predetermined number of active parameters tailored to each specific use case."
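To make the per-prompt routing idea concrete, the toy sketch below picks a fixed set of experts once per prompt, "loads" them from flash, and then reuses them for every generated token. It is only an illustration under stated assumptions: the expert count, the selection size, and the hash-based stand-in router are invented for this example and say nothing about how Apple's router actually works. Only the 20 billion total and the 1 to 4 billion active figures come from the reporting above.

```python
import hashlib

TOTAL_PARAMS_B = 20       # total size reported in the article
ACTIVE_RANGE_B = (1, 4)   # active parameters reported in the article

NUM_EXPERTS = 16          # hypothetical expert count, not from Apple
EXPERTS_PER_PROMPT = 2    # hypothetical selection size


def score(prompt: str, expert_id: int) -> int:
    """Stand-in router: deterministic pseudo-score for (prompt, expert)."""
    digest = hashlib.sha256(f"{expert_id}:{prompt}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class PerPromptModel:
    def __init__(self) -> None:
        self.flash_reads = 0           # counts simulated NAND-to-DRAM loads
        self.resident: list[int] = []  # expert ids currently "in DRAM"

    def route(self, prompt: str) -> list[int]:
        """Pick experts once for the whole prompt and load them from flash."""
        ranked = sorted(range(NUM_EXPERTS),
                        key=lambda e: score(prompt, e), reverse=True)
        self.resident = sorted(ranked[:EXPERTS_PER_PROMPT])
        self.flash_reads += len(self.resident)
        return self.resident

    def generate(self, prompt: str, num_tokens: int) -> list[list[int]]:
        """Every token reuses the locked-in experts; no further flash reads."""
        experts = self.route(prompt)
        return [experts for _ in range(num_tokens)]


model = PerPromptModel()
trace = model.generate("summarize my unread mail", num_tokens=50)
print(model.flash_reads)                         # 2: one load per selected expert
print(all(step == trace[0] for step in trace))   # True: same experts every token

low, high = (a / TOTAL_PARAMS_B for a in ACTIVE_RANGE_B)
print(f"{low:.0%} to {high:.0%} of weights active")  # 5% to 20% of weights active
```

The flash read counter stays at the number of selected experts no matter how many tokens are generated. A router that chose experts token by token could instead trigger new flash loads at every step, which is the bandwidth problem Apple's statement describes.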

Of course, Apple has also prepared a less powerful on-device model for generalized inferencing on older iPhones, called AFM 3 Core, which has just 3 billion parameters.

Basically, all Apple models have been trained on TPUs, and all but the AFM 3 Cloud Pro run on Apple silicon.

When a user submits a request, for instance via the Siri AI, a local orchestrator calls the required tools, collects data, and then generates the prompt for the AFM Cloud. Critically, raw data is not sent to the cloud, just the structured prompt.
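The flow described above can be pictured with a short sketch. Everything in it is hypothetical: the tool, the prompt structure, and the field selection are assumptions made for illustration, since Apple has not detailed the orchestrator's interface in the statements covered here.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredPrompt:
    """Hypothetical payload: the only object that would leave the device."""
    task: str
    context: list[str] = field(default_factory=list)


def calendar_tool(query: str) -> list[dict]:
    """Hypothetical local tool returning raw records that stay on device."""
    return [
        {"title": "Dentist", "time": "09:00", "notes": "private notes"},
        {"title": "Team sync", "time": "14:00", "notes": "more private notes"},
    ]


TOOLS = {"calendar": calendar_tool}


def orchestrate(request: str) -> StructuredPrompt:
    """Call tools locally, keep the raw records here, emit a compact prompt."""
    needed = [name for name in TOOLS if name in request.lower()]
    context = []
    for name in needed:
        for record in TOOLS[name](request):
            # Only selected fields are copied into the prompt (an assumption).
            context.append(f"{record['time']} {record['title']}")
    return StructuredPrompt(task=request, context=context)


prompt = orchestrate("What is on my calendar today?")
print(prompt.context)  # ['09:00 Dentist', '14:00 Team sync']
```

In this sketch the raw records, including the notes field, never appear in the object that would be handed to the cloud model.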

Per Craig Federhigi on how much Google gemini stuff they use for Apple Intelligence:

"we don't have the Gemini app as our app. In fact, none of that client code is part of how we run an iOS for these models. We use none of the models that Google deploys to their customers, nor…
— Ben Bajarin (@BenBajarin) June 8, 2026

Of course, this comes as Apple spent the better part of the technical presentation downplaying Google's role within the new Apple Intelligence and Private Cloud Compute framework, noting that it uses neither the models nor the infrastructure that Google deploys for its own customers.

About the author: Writing is my one incontrovertible passion. Over the past six years, I have authored over 2,200 distinct articles on financial and tech-related topics, spanning nearly 1 million words, and I have been a member of the Wccftech mobile team since 2025. As an alumnus of the University of Toronto's Rotman Commerce program, I bring nuance, in-depth knowledge, and a unique perspective to every topic that I cover. When I'm not writing, I'm traveling the world, exploring hidden confectioneries and restaurants as an aspiring food connoisseur.
