Huawei's newest AI chip, the Ascend 950PR, may not match NVIDIA on raw compute, but for domestic hyperscalers it brings a major upgrade: much better CUDA compatibility.
Huawei's Ascend 950PR Sees Massive Interest, Largely Thanks to CUDA-Like Programming Introduced With CANN Next
The Chinese computing industry has been trying to challenge NVIDIA's market dominance, but while the focus has been on upgrading architecture and onboard features, that push has had limited success so far. Reports suggest that Chinese hyperscalers remain strongly inclined toward NVIDIA's hardware, and the reason isn't just the compute gap; CUDA also plays a significant role. Huawei has tried to 'crack' CUDA with its native CANN offering, but that hasn't worked out yet, which is why the Ascend 950PR is pitched as a direct replacement for NVIDIA hardware in training and inference workloads.
This time around, tech firms intend to use the new 950PR more extensively, much happier now that the chip is more compatible with Nvidia's CUDA software system and has better response speeds, said the two people and a third person with knowledge of those plans.
- Reuters
We'll dive into what the Ascend 950PR chip brings to the table in a bit, but let's first talk about CUDA compatibility, which is Huawei's major achievement with this launch. Huawei's CANN Next software stack has undergone a major upgrade, adding a SIMT (single instruction, multiple threads) programming model with features such as thread blocks, warps, and kernel launches, similar to CUDA. The idea with CANN Next isn't to provide developers with a translation layer; it's to offer near-drop-in equivalents of CUDA constructs, treating CUDA as a language standard while leveraging the strengths of the Ascend ecosystem.
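To make the SIMT terms above concrete, here is a minimal sketch in plain Python that mimics how a CUDA-style launch maps thread blocks, warps, and threads onto array elements. It is a sequential CPU simulation, not CANN Next or CUDA code: the names `launch`, `thread_coord`, and `vector_add` are hypothetical, and the warp width of 32 is CUDA's value on NVIDIA GPUs, since the article does not state the width Ascend uses.

```python
from dataclasses import dataclass

# Assumption: 32 is the warp width on NVIDIA GPUs. The article does not say
# what width CANN Next uses on Ascend hardware.
WARP_SIZE = 32


@dataclass(frozen=True)
class ThreadCoord:
    block_idx: int   # which thread block this thread belongs to
    thread_idx: int  # index of the thread inside its block
    warp_id: int     # which warp inside the block
    lane_id: int     # position of the thread inside its warp
    global_idx: int  # unique index across the whole launch


def grid_size(n: int, block_dim: int) -> int:
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + block_dim - 1) // block_dim


def thread_coord(block_idx: int, thread_idx: int, block_dim: int) -> ThreadCoord:
    """Mirror of the usual blockIdx.x * blockDim.x + threadIdx.x arithmetic."""
    return ThreadCoord(
        block_idx=block_idx,
        thread_idx=thread_idx,
        warp_id=thread_idx // WARP_SIZE,
        lane_id=thread_idx % WARP_SIZE,
        global_idx=block_idx * block_dim + thread_idx,
    )


def launch(kernel, n: int, block_dim: int, *args) -> None:
    """Hypothetical stand-in for a kernel launch: runs every thread in turn."""
    for b in range(grid_size(n, block_dim)):
        for t in range(block_dim):
            kernel(thread_coord(b, t, block_dim), n, *args)


def vector_add(coord: ThreadCoord, n: int, a, b, out) -> None:
    """Per-thread body: each thread handles at most one element."""
    if coord.global_idx < n:  # guard threads that fall past the end of the data
        out[coord.global_idx] = a[coord.global_idx] + b[coord.global_idx]


if __name__ == "__main__":
    n = 1000
    a = list(range(n))
    b = [2 * x for x in a]
    out = [0] * n
    launch(vector_add, n, 256, a, b, out)
    print(grid_size(n, 256), out[:4])
```

On real hardware the threads of a block run in parallel and the block size is a tuning knob, which is where a vendor-tuned stack can differ from CUDA defaults without changing how the kernel itself is written.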
CANN Next is optimized for compute on Ascend at scale, meaning parameters such as thread counts and block sizes are tuned for Huawei's own chips, so software and hardware are co-designed to scale together. In layman's terms, Huawei isn't trying to replace CUDA; rather, it wants developers to feel like they are writing CUDA, while the code that actually runs is optimized for Ascend and scales across it. CANN Next is one of the reasons the Ascend 950PR is seen as a much more attractive solution than previous offerings.
As for the Ascend 950PR chip itself, hyperscalers like ByteDance and Alibaba reportedly plan to place orders soon, and Huawei is said to be set to produce 750,000 chips this year. On the technical side, you are looking at support for low-precision data formats, including FP8 and FP4, with 1 PFLOPS of FP8 compute and 2 PFLOPS of FP4. The chip will be equipped with an interconnect bandwidth of 2 TB/s and the firm's first "self-built HBM," called HiBL 1.0, featuring a capacity of 128GB and a bandwidth of 1.6 TB/s. Building its own HBM should also help Huawei avoid memory supply constraints as it ramps up production.
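For a rough sense of what those headline figures imply, the sketch below runs some back-of-the-envelope arithmetic in Python. It uses only the numbers quoted above and treats them as theoretical peaks; the function names are illustrative, GB and TB are assumed to be decimal units, and the parameter counts ignore activations, caches, and other memory overhead.

```python
# Figures quoted in the article, treated as theoretical peaks.
FP8_FLOPS = 1e15      # 1 PFLOPS of FP8 compute
FP4_FLOPS = 2e15      # 2 PFLOPS of FP4 compute
HBM_BANDWIDTH = 1.6e12  # 1.6 TB/s, assuming decimal terabytes
HBM_CAPACITY = 128e9    # 128 GB, assuming decimal gigabytes
CHIPS_THIS_YEAR = 750_000


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """FLOPs a kernel must perform per byte read from HBM to be compute-bound
    rather than memory-bound (the roofline model's ridge point)."""
    return peak_flops / bandwidth


def max_weights(capacity_bytes: float, bits_per_weight: int) -> float:
    """Upper bound on model weights that fit in HBM, ignoring all overhead."""
    return capacity_bytes * 8 / bits_per_weight


def fleet_exaflops(chips: int, peak_flops: float) -> float:
    """Aggregate peak compute of a fleet, in EFLOPS (1e18 FLOPS)."""
    return chips * peak_flops / 1e18


if __name__ == "__main__":
    print("FP8 ridge point (FLOPs/byte):", ridge_point(FP8_FLOPS, HBM_BANDWIDTH))
    print("FP4 ridge point (FLOPs/byte):", ridge_point(FP4_FLOPS, HBM_BANDWIDTH))
    print("Weights per chip at FP8:", max_weights(HBM_CAPACITY, 8))
    print("Weights per chip at FP4:", max_weights(HBM_CAPACITY, 4))
    print("Fleet FP8 peak (EFLOPS):", fleet_exaflops(CHIPS_THIS_YEAR, FP8_FLOPS))
```

Under these assumptions a kernel needs roughly 625 FP8 operations per byte of HBM traffic to keep the compute units busy, and a year's reported output would add up to about 750 EFLOPS of peak FP8 compute. Real workloads will land well below such peaks.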
China has been in need of alternatives to NVIDIA's compute offerings, particularly for hyperscalers. Dealing with the regulatory overhead of sourcing chips like the H200 has been a 'pain', which is why hyperscalers have resorted to renting compute offshore or turning to domestic options. Huawei, with CANN Next and the Ascend 950PR, is looking to step up its influence within the Chinese AI industry; the main constraints holding it back are chip volume and whether customers are ready for mass deployment.
