Cerebras CS-2 Wafer Scale Chip Outperforms Every Single GPU By Leaps & Bounds, Breaks Record of Largest AI Model Trained on A Single Device

Jason R. Wilson • Jun 23, 2022 at 02:57pm EDT

Cerebras has announced a milestone for the company: training the largest Natural Language Processing (NLP) AI model ever trained on a single device. The company develops and manufactures the world's largest accelerator chip, the Wafer Scale Engine-2, which powers its CS-2 system.

Cerebras reaches twenty billion parameters in workloads on a single chip

The artificial intelligence model trained by Cerebras reached twenty billion parameters, a record for a single device. Cerebras did this without scaling the workload across multiple accelerators. The achievement matters for machine learning because it reduces the infrastructure and software complexity that training models of this size previously required.

The Wafer Scale Engine-2 is etched onto a single 7 nm wafer, equivalent to hundreds of premium chips on the market, and features 2.6 trillion transistors. It also incorporates 850,000 cores and 40 GB of integrated cache, with a power consumption of 15 kW. Tom's Hardware notes that "a single CS-2 system is akin to a supercomputer all on its own."

Fitting a 20-billion-parameter NLP model on a single chip lets the company reduce the overhead of training across thousands of GPUs, along with the associated hardware and scaling requirements. It also eliminates the technical difficulty of partitioning models across those GPUs, which the company says is "one of the most painful aspects of NLP workloads, [...] taking months to complete."

Partitioning is a bespoke problem, unique to each neural network being processed, the specifications of each GPU, and the network that ties all the components together. Researchers must work it out before the first training run can start. The resulting setup is also tied to one system and cannot be ported to another.
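
To illustrate why such a split depends on both the model and the hardware, here is a minimal sketch of a greedy partitioner that packs consecutive layers onto devices. The function name, the layer sizes, and the device capacities are all hypothetical, and real partitioning must also account for activations, interconnect bandwidth, and load balance, which this toy example ignores.

```python
from typing import List


def partition_layers(layer_gb: List[float], device_gb: float) -> List[List[int]]:
    """Greedily pack consecutive layers onto devices without exceeding device_gb.

    Returns one list of layer indices per device (pipeline stage).
    Hypothetical helper for illustration only.
    """
    stages: List[List[int]] = []
    current: List[int] = []
    used = 0.0
    for idx, size in enumerate(layer_gb):
        if size > device_gb:
            raise ValueError(f"layer {idx} ({size} GB) does not fit on one device")
        # Start a new stage when the next layer would overflow this device.
        if current and used + size > device_gb:
            stages.append(current)
            current, used = [], 0.0
        current.append(idx)
        used += size
    if current:
        stages.append(current)
    return stages


# Hypothetical model: eight layers of 6 GB each.
layers = [6.0] * 8

# The same model splits differently on devices with different memory sizes.
print(partition_layers(layers, device_gb=16.0))
print(partition_layers(layers, device_gb=24.0))
```

Changing either the layer sizes or the device capacity changes the result, which is the sense in which a partitioning worked out for one system does not carry over to another.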

In NLP, bigger models are shown to be more accurate. But traditionally, only a select few companies had the resources and expertise necessary to do the painstaking work of breaking up these large models and spreading them across hundreds or thousands of graphics processing units. As a result, few companies could train large NLP models – it was too expensive, time-consuming, and inaccessible for the rest of the industry. Today we are proud to democratize access to GPT-3XL 1.3B, GPT-J 6B, GPT-3 13B, and GPT-NeoX 20B, enabling the entire AI ecosystem to set up large models in minutes and train them on a single CS-2.

— Andrew Feldman, CEO and Co-Founder, Cerebras Systems

Some recent systems perform exceptionally well with fewer parameters. One such system is Chinchilla, which consistently outperforms both GPT-3 and Gopher with just 70 billion parameters. Cerebras' accomplishment is still significant, however, because researchers will be able to build and train increasingly elaborate models on the Wafer Scale Engine-2 where others cannot.

The large number of workable parameters relies on the company's Weight Streaming technology, which lets researchers "decouple compute and memory footprints, allowing for memory to be scaled towards whatever the amount is needed to store the rapidly-increasing number of parameters in AI workloads." In turn, the time needed to set up training drops from months to minutes, and a few standard commands are enough to switch between models such as GPT-J and GPT-Neo.
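
A rough calculation suggests why separating parameter storage from on-chip compute matters at this scale. The sketch below compares the 40 GB of on-wafer memory quoted above with the memory needed for each model's weights and training state. The bytes-per-parameter figures are common rules of thumb for mixed-precision training with the Adam optimizer and are assumptions here, not Cerebras numbers; the parameter counts are the nominal sizes in the model names.

```python
ON_WAFER_MEMORY_GB = 40  # on-chip memory figure quoted in the article

# Assumed bytes per parameter (rules of thumb, not vendor figures):
FP16_WEIGHTS = 2           # fp16 weights only
MIXED_PRECISION_ADAM = 16  # fp16 weights + fp16 grads + fp32 master weights
                           # + two fp32 Adam moments = 2 + 2 + 4 + 4 + 4

# Nominal parameter counts taken from the model names.
MODELS = {
    "GPT-3XL 1.3B": 1_300_000_000,
    "GPT-J 6B": 6_000_000_000,
    "GPT-3 13B": 13_000_000_000,
    "GPT-NeoX 20B": 20_000_000_000,
}


def training_footprint_gb(num_params: int, bytes_per_param: int) -> float:
    """Memory in GB (10**9 bytes) for num_params at bytes_per_param each."""
    return num_params * bytes_per_param / 1e9


for name, num_params in MODELS.items():
    weights = training_footprint_gb(num_params, FP16_WEIGHTS)
    state = training_footprint_gb(num_params, MIXED_PRECISION_ADAM)
    fits = "fits" if state <= ON_WAFER_MEMORY_GB else "exceeds"
    print(f"{name}: weights {weights:.1f} GB, training state {state:.1f} GB "
          f"({fits} {ON_WAFER_MEMORY_GB} GB on-wafer)")
```

Under these assumptions, the fp16 weights of a 20-billion-parameter model alone come to 40 GB, and the full training state to roughly 320 GB, before counting activations. Keeping parameters in memory that scales independently of the wafer is one way to accommodate that.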

Cerebras' ability to bring large language models to the masses with cost-efficient, easy access opens up an exciting new era in AI. It gives organizations that can't spend tens of millions an easy and inexpensive on-ramp to major league NLP. It will be interesting to see the new applications and discoveries CS-2 customers make as they train GPT-3 and GPT-J class models on massive datasets.

— Dan Olds, Chief Research Officer, Intersect360 Research

News Source: Tom's Hardware

About the author: Jason R. Wilson is a member of the Hardware news team at Wccftech. Equipped with a background in graphic design and writing, Jason works daily to improve his craft and continues to create new and innovative ideas every day.
