Google is reportedly working with Marvell on the development of two chips: one designed to work alongside existing TPUs, & the other a next-gen TPU design.
Google Might Join Forces With Marvell To Further Enhance the TPU Ecosystem For Next-Gen AI Models
Talks between Google & Marvell have commenced on the development of two brand new chips for AI inference, reports The Information.
Google is in talks with Marvell Technology to develop two new chips aimed at running AI models more efficiently, according to two people with direct knowledge of the discussions. One is a memory processing unit designed to work alongside Google's tensor processing unit. The other is a new TPU built specifically for running AI models.
via The Information
It is not yet clear what stage the talks have reached, but the report indicates that a baseline has been set: Google has proposed two chips, one aimed at boosting existing TPUs & the other a brand new TPU design.
The two chips under discussion serve very different purposes. The first is related to the TPU, but rather than being custom TPU silicon, it is a memory processing unit (MPU) that pairs with a TPU. One likely role is in-memory processing, where this accelerator or IP block would take over some of the memory-bound work from the main chip or system & handle it on the dedicated MPU.
The second chip under discussion is a next-gen TPU, which will be optimized specifically for AI inference. Currently, Google's flagship AI accelerator is its TPU v7 or Ironwood series. TPU v7 offers 192 GB of HBM & 4614 TFLOPs of peak performance per chip, and is packaged into the Superpod, which is made up of 9216 chips.
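To put those per-chip numbers in perspective, here is a quick back-of-the-envelope sketch of what a full Superpod adds up to. It simply multiplies the figures quoted above by the chip count, so it assumes perfect linear scaling & decimal units (1 exaflop = 1,000,000 TFLOPs, 1 PB = 1,000,000 GB). The function name is our own, & real-world throughput will be lower than this theoretical peak.

```python
# Per-chip figures quoted in the article for TPU v7 (Ironwood).
CHIPS_PER_SUPERPOD = 9216
PEAK_TFLOPS_PER_CHIP = 4614
HBM_GB_PER_CHIP = 192


def superpod_totals(chips=CHIPS_PER_SUPERPOD,
                    tflops_per_chip=PEAK_TFLOPS_PER_CHIP,
                    hbm_gb_per_chip=HBM_GB_PER_CHIP):
    """Naive aggregate: per-chip figure times chip count (assumes linear scaling)."""
    total_tflops = chips * tflops_per_chip
    total_hbm_gb = chips * hbm_gb_per_chip
    return {
        "peak_exaflops": total_tflops / 1e6,   # 1 exaflop = 1e6 TFLOPs
        "hbm_petabytes": total_hbm_gb / 1e6,   # 1 PB = 1e6 GB (decimal)
    }


totals = superpod_totals()
print(f"Peak compute: {totals['peak_exaflops']:.2f} exaflops")
print(f"Total HBM:    {totals['hbm_petabytes']:.2f} PB")
```

That works out to roughly 42.5 exaflops of peak compute & about 1.77 PB of HBM across the pod.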
While ASICs are seen as a huge deal for AI inference, supply chain challenges persist. There have been reports that demand for Google's TPUs, such as Ironwood, is gaining momentum, but we also have to factor in production capacity, which is reportedly maxed out at every major semiconductor manufacturer.
The MPU sounds more like a secondary inference accelerator, similar to NVIDIA's Groq 3 LPX, which is an LPU (Language Processing Unit). The LPU packs 500 MB of SRAM with a blazing fast 150 TB/s of total bandwidth, and targets agentic AI workloads on the upcoming Vera Rubin platforms.
Based on the reports, we can expect next-generation Google TPUs to be paired with the aforementioned MPUs to speed up the memory subsystem & deliver faster AI model performance, especially in the inference segment.
