Meta's custom silicon isn't going anywhere. According to the company's latest announcements, it has 'doubled down' on its ASIC efforts, with a particular focus on inference performance.
Meta's Chiplet Modularity Allows Them To Spin Out Four New Generations In Just Two Years
The demand for compute has become so tremendous that hyperscalers are increasingly "forced" to diversify away from the traditional options offered by GPU manufacturers like NVIDIA, and developing custom silicon is one way to do so. Google and Amazon are two of the more prominent examples of how 'fruitful' ASIC efforts can be when they are optimized for internal workloads, and Meta appears to be following in their footsteps. In its latest blog post, Meta revealed that its MTIA roadmap is on track, with a release cadence that is among the most aggressive in the industry.
Meta says it intends to deploy four new chips in the MTIA family within the next two years, each targeting a specific workload, from training to GenAI inference. The first, MTIA 300, is aimed primarily at ranking and recommendation (R&R) workloads, which is why its scale-out network runs at 200 GB/s. The chip features one compute chiplet and two network chiplets, along with several HBM stacks providing 216 GB of capacity and 6.12 TB/s of bandwidth. According to Meta, MTIA 300 laid the foundation for the more advanced and far more competitive MTIA 400.
| Metric | MTIA 300 | MTIA 400 | MTIA 450 | MTIA 500 |
| Workload Focus | R&R Training | General | GenAI Inference | GenAI Inference |
| Module TDP | 800 W | 1200 W | 1400 W | 1700 W |
| HBM Bandwidth | 6.1 TB/s | 9.2 TB/s | 18.4 TB/s | 27.6 TB/s |
| HBM Capacity | 216 GB | 288 GB | 288 GB | 384-512 GB |
| MX4 Performance | — | 12 PFLOPs | 21 PFLOPs | 30 PFLOPs |
| FP8/MX8 Performance | 1.2 PFLOPs | 6 PFLOPs | 7 PFLOPs | 10 PFLOPs |
| BF16 Performance | 0.6 PFLOPs | 3 PFLOPs | 3.5 PFLOPs | 5 PFLOPs |
| Scale-up Domain Size | 16 | 72 | 72 | 72 |
| Scale-up Network (unidirectional bandwidth)* | 1 TB/s | 1.2 TB/s | 1.2 TB/s | 1.2 TB/s |
| Scale-out Network (unidirectional bandwidth)* | 200 GB/s** | 100 GB/s | 100 GB/s | 100 GB/s |
With MTIA 400, you are looking at 400% higher FP8 FLOPS (6 vs. 1.2 PFLOPs) and 51% higher HBM bandwidth (9.2 vs. 6.1 TB/s) than the previous generation, reflecting Meta's focus on raw performance here. The MTIA 400 uses a 72-chip scale-up domain connected via a switched backplane. This generation is already heading toward deployment, which suggests Meta is satisfied with its "competitive" performance. The more interesting parts are the MTIA 450 and MTIA 500, which target inference demand directly by prioritizing HBM capacity and bandwidth: MTIA 450 doubles HBM bandwidth to 18.4 TB/s, while MTIA 500 raises it to 27.6 TB/s with 384-512 GB of capacity.
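As a quick sanity check, the generational gains quoted above can be recomputed from the spec table with a few lines of Python. This is plain arithmetic on Meta's published peak figures; the `SPECS` dictionary and `percent_increase` helper are hypothetical names used only for illustration.

```python
# Peak figures copied from the spec table (MTIA 300 -> MTIA 400).
# Units: FP8/MX8 compute in PFLOPs, HBM bandwidth in TB/s.
# SPECS and percent_increase are illustrative names, not a Meta API.
SPECS = {
    "MTIA 300": {"fp8_pflops": 1.2, "hbm_tbps": 6.1},
    "MTIA 400": {"fp8_pflops": 6.0, "hbm_tbps": 9.2},
}


def percent_increase(old: float, new: float) -> float:
    """Return how much larger `new` is than `old`, as a percentage."""
    return (new / old - 1.0) * 100.0


def main() -> None:
    prev, curr = SPECS["MTIA 300"], SPECS["MTIA 400"]
    fp8_gain = percent_increase(prev["fp8_pflops"], curr["fp8_pflops"])
    bw_gain = percent_increase(prev["hbm_tbps"], curr["hbm_tbps"])
    # Round to whole percentages, matching how the gains are quoted.
    print(f"FP8: +{fp8_gain:.0f}%")
    print(f"HBM bandwidth: +{bw_gain:.0f}%")


if __name__ == "__main__":
    main()
```

Running it prints `FP8: +400%` and `HBM bandwidth: +51%`, i.e. a 5x jump in FP8 compute. Using the unrounded 6.12 TB/s figure for MTIA 300 instead of the table's 6.1 TB/s puts the bandwidth gain closer to 50%.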
Meta says it plans to compete with "commercially available" options by maintaining a high-velocity product cadence that keeps pace with evolving compute demands. This fast cycle is possible because of the chips' modular, chiplet-based design: Meta swaps out individual chiplets with each generation, so there's no need to revamp the entire infrastructure. At the same time, with the MTIA 450 and 500, the company is taking an inference-first approach to differentiate itself from what a standard GPU offers.
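One rough way to see the inference-first shift in the table is to compare peak HBM bandwidth against peak FP8 compute, since memory-bound workloads such as token-by-token LLM decoding tend to benefit more from bytes per second than from extra FLOPS. The sketch below computes that ratio for MTIA 400, 450 and 500 from the table's peak figures. It is a back-of-the-envelope spec comparison, assuming peak numbers are comparable across generations, not a performance model, and `bytes_per_flop` is a hypothetical helper name.

```python
# Peak HBM bandwidth (TB/s) and FP8/MX8 compute (PFLOPs) from the spec table.
# bytes_per_flop is an illustrative helper, not a Meta-defined metric.
SPECS = {
    "MTIA 400": (9.2, 6.0),
    "MTIA 450": (18.4, 7.0),
    "MTIA 500": (27.6, 10.0),
}


def bytes_per_flop(hbm_tbps: float, fp8_pflops: float) -> float:
    """Peak HBM bytes per peak FP8 FLOP (1 TB = 1e12 B, 1 PFLOP = 1e15 FLOP)."""
    return (hbm_tbps * 1e12) / (fp8_pflops * 1e15)


for name, (bw, flops) in SPECS.items():
    # Scale by 1,000 so the ratios are easier to read.
    ratio = bytes_per_flop(bw, flops) * 1000
    print(f"{name}: {ratio:.2f} bytes per 1,000 FP8 FLOPs")
```

The output shows the ratio rising from about 1.53 (MTIA 400) to 2.63 (MTIA 450) and 2.76 (MTIA 500): on paper, bandwidth grows much faster than FP8 compute across the inference-focused parts.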
Meta's latest deal with NVIDIA, along with recent reports, had suggested that it might ditch its custom silicon efforts, but the company appears confident in its engineering abilities, which is why it has adopted a rather 'aggressive' strategy. All of the generations discussed above are slated for deployment by 2026 or 2027, helping Meta overcome its compute bottleneck.
