Despite Its Dip In Popularity, Apple Reveals AI Model Training Tactics – From Mass Web Scraping To Secret Licensing Deals And Synthetic Content

Ezza Ijaz • Jul 21, 2025 at 11:38am EDT

While WWDC largely revolved around Liquid Glass, the new visual design language coming to Apple's operating systems, the company also announced the next generation of its AI foundation models, built for both on-device and cloud use. After the event, the tech giant is letting users and the tech community dig into how its models are trained and optimized through an elaborate technical report, which allows for a better understanding of Apple's AI strategy. In the report, the company emphasizes that it trained the models with privacy and efficiency at the core.

Inside Apple's Next-Gen AI: How the models were built and trained

Despite losing ground in the AI space, Apple has released a detailed report on its foundation models titled "Apple Intelligence Foundation Language Models: Tech Report 2025," which gives in-depth information on the key elements of the latest AI models. The document covers pretty much everything, from the architecture of the models to the training process, post-training, and how the models were fine-tuned. It also explores the methods used to make the models more efficient without compromising privacy.

Apple had previously shared that its on-device AI model, which is available for developers to use, has about 3 billion parameters, but details about its structure were sparse until now. As per the report, the model is split into two parts to boost efficiency. The first part, referred to as Block 1, contains more than 60 percent of the model's core building blocks, called transformer layers. These layers are what allow the model to understand language and generate responses.

The second part, called Block 2, is lighter because it removes two technical pieces that take up a lot of memory: the key and value projections. With this strategy, Apple was able to have the model use about 38 percent less memory and even speed up its response time. The company has been looking into ways to improve its AI models' performance locally, and a few years ago, it explored the idea of running a model larger than a device's memory could handle. Although it did not go with that approach, it keeps seeking ways to counter hardware limitations and other challenges.
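The memory figure can be sanity-checked with some rough arithmetic. The sketch below assumes the saving comes from Block 2 not keeping its own key-value cache, and every number in it (16 layers split 10 and 6, head count, head size, context length) is a hypothetical placeholder rather than Apple's actual configuration.

```python
def kv_cache_bytes(layers_with_cache, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Rough size of a key-value cache: one key and one value tensor per layer."""
    return 2 * layers_with_cache * kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model shape, chosen only so Block 1 holds "more than 60 percent".
TOTAL_LAYERS = 16
BLOCK1_LAYERS = 10  # 62.5 percent of the layers
KV_HEADS, HEAD_DIM, SEQ_LEN = 8, 64, 4096

# Baseline: every layer stores its own keys and values.
baseline = kv_cache_bytes(TOTAL_LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)

# Assumed two-block layout: only Block 1 layers store keys and values.
two_block = kv_cache_bytes(BLOCK1_LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)

savings = 1 - two_block / baseline
print(f"Cache memory saved: {savings:.1%}")
```

With this split the saving works out to 37.5 percent, which lines up with the "about 38 percent" figure, though the real layer counts may differ.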

On the server side, Apple used a custom architecture for the model running on its Private Cloud Compute system. The approach is called Parallel-Track Mixture-of-Experts (PT-MoE). Put simply, it breaks a large AI model into smaller parts called experts. By dividing the model into a mixture of experts, the whole model does not need to run every time; instead, only the experts relevant to the task at hand are activated, which saves compute and increases efficiency.
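The general idea behind expert routing can be shown with a toy example. This is a minimal sketch of generic top-k mixture-of-experts routing, not Apple's PT-MoE implementation; the experts, the router scores, and the function names are all made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_scores, top_k=1):
    """Pick the top_k experts and renormalize their weights to sum to 1."""
    probs = softmax(router_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in ranked)
    return [(i, probs[i] / norm) for i in ranked]


# Hypothetical experts: in a real model each would be a feed-forward network.
EXPERTS = [
    lambda x: x + 1.0,
    lambda x: x * 2.0,
    lambda x: x - 3.0,
    lambda x: x * x,
]


def moe_forward(x, router_scores, top_k=1):
    """Run only the selected experts and blend their outputs."""
    chosen = route(router_scores, top_k)
    output = sum(weight * EXPERTS[i](x) for i, weight in chosen)
    return output, [i for i, _ in chosen]


# Only expert 1 runs here; the other three are never called.
output, used = moe_forward(3.0, [0.1, 2.5, -1.0, 0.3], top_k=1)
print(output, used)
```

The point of the design is that the cost of a request scales with the number of experts selected, not with the total number of experts in the model.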

Apple additionally designed a new Transformer architecture called the Parallel Track Transformer, which has multiple tracks that work independently and only synchronize at key points. Because of this design, the model avoids system-wide lags. The Cupertino tech giant has also addressed one of the biggest pain points with Apple Intelligence: limited language support.

With its new models, Apple has improved multilingual capabilities. To expand language support, it increased the share of non-English data in its training process from 8 percent to 30 percent, including both real and AI-generated content, so that the model is equipped with a better understanding of a broader range of languages. This should allow features like Writing Tools to work better.

When it comes to training its new AI systems, Apple relied heavily on web data collected by Applebot, the company's own web crawler, which has been used with previous models as well. Since Apple emphasizes privacy, if a website does not want to be crawled, Applebot does not use its content.
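Site owners typically express that preference through a robots.txt file. The sketch below shows how any crawler could check such a file using Python's standard library; it is not Apple's code, and the robots.txt contents and example.com URLs are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps Applebot out of one directory.
ROBOTS_TXT = """\
User-agent: Applebot
Disallow: /private/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, agent="Applebot"):
    """Return True if the parsed robots.txt rules let this agent fetch the URL."""
    return parser.can_fetch(agent, url)


allowed = may_crawl("https://example.com/articles/intro")
blocked = may_crawl("https://example.com/private/notes")
print(allowed, blocked)
```

A well-behaved crawler would run a check like this before fetching a page and skip anything the site has disallowed.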

The company uses multiple techniques to train its models, with public web data serving as the main training material. Apple filters out irrelevant content and focuses on datasets that are useful and to the point. The tech giant also relies on content licensed from publishers, although it does not give away the names of the media companies it works with. The company also uses smaller models to generate synthetic data, especially for image-language tasks, code, and instruction following, for better fine-tuning.

The multi-pronged approach also involves visual data: Apple has over 10 billion image-caption pairs, including screenshots and handwritten notes, and it uses its own models to generate richer captions. All of these training methods help Apple build smarter and more capable models. Apple's approach to training its AI models is well-articulated. It is a balanced strategy that aims to keep the system powerful and versatile without compromising on its core value: privacy.
