Baidu Subsidiary Kunlun Technology Begins Volume Production of Kunlun Core II Chips, Will Rival NVIDIA’s A100 in AI


Data Centre Dynamics reports that Baidu has spun off a separate, independent semiconductor-focused business, aptly named Kunlun Chip Technology Company, which was valued at around $2 billion as of June 2021 and will produce the company's next-generation Kunlun Core II chips.

Baidu's Kunlun Core II Chips Enter Volume Production, Will Tackle NVIDIA's A100 in AI

Kunlun Chip Technology Co. has begun volume production of its Kunlun Core II processor, which is designed for artificial intelligence applications. The Kunlun Core II is built on Baidu's second-generation XPU microarchitecture, is fabricated on a 7nm process, and is claimed to offer up to three times the performance of its first-generation predecessor.



Three years ago, Baidu released details of the Kunlun K200, the company's first-generation processor, built for edge, cloud, and autonomous-vehicle applications. The chip delivers up to 256 INT8 TOPS, around 64 INT/FP16 TOPS, and 16 INT/FP32 TOPS within a 150-watt power envelope.

Below is a comparison chart of the Baidu Kunlun Core II, also called the Kunlun II, versus the first-generation Baidu Kunlun Core and NVIDIA's A100. The chart shows how the new Kunlun II keeps pace with NVIDIA's A100, which delivers 19.5 FP32 TFLOPS and 624/1248 (with sparsity) INT8 TOPS.

Baidu Kunlun II Comparison Chart

|                        | Baidu Kunlun | Baidu Kunlun II | Nvidia A100                             |
|------------------------|--------------|-----------------|-----------------------------------------|
| INT8                   | 256 TOPS     | 512 ~ 768 TOPS  | 624/1248* TOPS                          |
| INT/FP16               | 64 TOPS      | 128 ~ 192 TOPS  | 312/624* TFLOPS (bfloat16/FP16 tensor)  |
| Tensor Float 32 (TF32) | -            | -               | 156/312* TFLOPS                         |
| INT/FP32               | 16 TOPS      | 32 ~ 48 TOPS    | 19.5 TFLOPS                             |
| FP64 Tensor Core       | -            | -               | 19.5 TFLOPS                             |
| FP64                   | -            | -               | 9.7 TFLOPS                              |

*With sparsity enabled.
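The generational uplift in the chart can be checked directly: across INT8, INT/FP16, and INT/FP32, the upper bound of Kunlun II's range is exactly three times the first-generation figure, matching the claimed 3x improvement, while its peak INT8 throughput edges past the A100's dense (non-sparsity) number. A minimal sketch of that arithmetic, using only the chart's figures:

```python
# Peak throughput figures taken from the comparison chart above.
# For the A100, only the dense INT8 figure (624 TOPS, before the
# sparsity doubling) is used, for a like-for-like comparison.
peak_tops = {
    "Kunlun":    {"INT8": 256, "INT/FP16": 64,  "INT/FP32": 16},
    "Kunlun II": {"INT8": 768, "INT/FP16": 192, "INT/FP32": 48},  # upper bound of each range
}
a100_int8_dense = 624

# Generational uplift, Kunlun II vs. first-gen Kunlun
for dtype in ("INT8", "INT/FP16", "INT/FP32"):
    ratio = peak_tops["Kunlun II"][dtype] / peak_tops["Kunlun"][dtype]
    print(f"{dtype}: Kunlun II is {ratio:.0f}x first-gen Kunlun")

# Kunlun II's peak INT8 throughput relative to the A100 (dense)
print(f"INT8 vs A100: {peak_tops['Kunlun II']['INT8'] / a100_int8_dense:.2f}x")
```

Running this prints a 3x uplift for all three precisions and roughly a 1.23x INT8 advantage over the A100's dense figure, though the A100 regains the lead when its sparsity feature applies.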

Baidu began work on Kunlun AI accelerators in 2011, initially prototyping and refining its many-small-core XPU microarchitecture on FPGAs. In 2018, Baidu moved the design to dedicated silicon on Samsung Foundry's 14nm process, using the performance-boosted 14LPP node.


14LPP (performance-boosted edition) is the second FinFET generation, in which performance is enhanced by up to 10%. 14LPP is the single platform for all application designs, with improved performance for computing/network designs and lowered power consumption for mobile/consumer designs.

—Samsung SAS Business site

Kunlun's first AI processor pairs 8 gigabytes of HBM memory with 512 GB/s of peak bandwidth. In the second half of 2020, Wang Haifeng, Baidu's Chief Technology Officer, reported that more than 20,000 of the original Kunlun Core chips had been produced, and acknowledged the need for a larger-scale deployment strategy.
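The 512 GB/s figure is consistent with a typical HBM configuration. As a sanity check, here is one plausible layout, assumed for illustration only (the article does not state the stack count or pin speed): two HBM2 stacks, each with the standard 1024-bit interface, running at 2 Gb/s per pin.

```python
# Sanity-check of the 512 GB/s peak-bandwidth figure.
# Assumed configuration (not stated in the article):
stacks = 2               # HBM2 stacks (assumption)
bus_bits_per_stack = 1024  # standard HBM interface width per stack
pin_rate_gbps = 2.0      # Gb/s per pin (assumption)

# Total bits per second across all stacks, converted to bytes
peak_gb_per_s = stacks * bus_bits_per_stack * pin_rate_gbps / 8
print(peak_gb_per_s)  # 512.0
```

Other stack/speed combinations could produce the same total; the point is only that the quoted bandwidth matches standard HBM building blocks.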

Currently, Kunlun's first-generation chips power parent company Baidu's cloud data centers and its Apolong autonomous vehicle platform, alongside other AI applications.

Source: MyDrivers, Tom's Hardware
