Apple’s AI Server Chip “Baltra” Likely To Be Used Primarily For AI Inference

Dec 15, 2025 at 02:07pm EST

It is common knowledge that Apple is a vertical integration aficionado, preferring to retain key technological nodes in-house wherever feasible, with its sprawling custom silicon design efforts offering perhaps the most apt illustration of this paradigm.

Enter "Baltra," the internal codename given to Apple's bespoke AI server chip, which is expected to debut in 2027.


Apple's "Baltra" AI server chip is likely to be used primarily to satisfy its gargantuan inferencing needs

Back in Spring 2024, multiple reports emerged that Apple was working with Broadcom on its first AI server chip, bearing the internal codename "Baltra." Some reports at the time also suggested that the chip would leverage TSMC's 3nm 'N3E' process, and that the design process would conclude over the coming 12-month period.

Of course, the actual deployment of these custom AI chips is now expected in 2027, with Apple having already commenced the shipment of its US-made servers back in October 2025.

However, the real question is: how will Apple use its custom AI chips? The answer matters because the intended workload will dictate Baltra's overall chip design and architecture.

Apple is not expected to train large AI models, at least for the time being, especially as it has already struck a deal with Google to deploy a customized 3-trillion-parameter Gemini AI model to power Apple Intelligence in the cloud, and will reportedly pay Google $1 billion per year for the right to use this model.

It is only reasonable to assume that Apple will primarily use the "Baltra" AI server chips to cater to its gargantuan AI inferencing needs. As a refresher, inference occurs whenever already-trained models leverage their knowledge base and prior training to perform a specific task, which might include something as mundane as writing an email based on the prompts provided.

Of course, the architecture of inference chips is fundamentally different from that of chips used to train AI models, with the former placing much more emphasis on latency and throughput. AI inference chips also tend to rely on lower-precision arithmetic, such as INT8, which trades a small amount of numerical accuracy for faster, more power-efficient computation.
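
As a rough illustration of what lower-precision math means in practice, the sketch below shows symmetric INT8 quantization in plain Python. It is not based on any disclosed detail of Baltra: the function names (quantize_int8, dequantize_int8, int8_dot) and the sample weights and activations are hypothetical, and the scheme shown (one scale factor per list, values mapped to the range -127 to 127) is just one common approach, assumed here for simplicity.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor.

    Hypothetical helper for illustration; assumes symmetric, per-tensor scaling.
    """
    max_abs = max(abs(v) for v in values)
    # Avoid dividing by zero when every value is 0.0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the INT8 representation."""
    return [q * scale for q in quantized]


def int8_dot(a_q, a_scale, b_q, b_scale):
    """Dot product accumulated in integers, rescaled to float once at the end."""
    acc = sum(x * y for x, y in zip(a_q, b_q))
    return acc * a_scale * b_scale


# Made-up sample data, not taken from any real model.
weights = [0.42, -1.27, 0.05, 0.9]
activations = [1.0, 0.5, -2.0, 0.25]

w_q, w_scale = quantize_int8(weights)
a_q, a_scale = quantize_int8(activations)

exact = sum(w * a for w, a in zip(weights, activations))
approx = int8_dot(w_q, w_scale, a_q, a_scale)

print("INT8 weights:", w_q)
print("float32-style result:", exact)
print("INT8 result:", approx)
```

The integer multiply-accumulate in int8_dot is the part that dedicated inference hardware can execute cheaply; the small gap between the two printed results is the accuracy cost of doing so.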

Given this context, we can reasonably infer that Apple and Broadcom are likely to prioritize low latency, high throughput, and low-precision compute as they proceed with Baltra's design.

Meanwhile, Apple's sprawling custom silicon lineup continues to expand. In addition to its well-known A-series and M-series chips, Apple now also uses its bespoke C1 modem chips. Moreover, the Cupertino giant might also use a derivative of its Apple Watch-focused S-series chip in its AI smart glasses, which are expected to launch next year.

About the author: Writing is my one incontrovertible passion. Over the past six years, I have authored over 2,200 distinct articles on financial and tech-related topics, spanning nearly 1 million words, and I have been a member of the Wccftech mobile team since 2025. As an alumnus of the University of Toronto's Rotman Commerce program, I bring nuance, in-depth knowledge, and a unique perspective to every topic that I cover. When I'm not writing, I'm traveling the world, exploring hidden confectioneries and restaurants as an aspiring food connoisseur.
