Graphcore’s Colossus GC200 7nm Chip Competes Against The NVIDIA A100 GPU With Colossal Design & 250 TFLOPs AI Performance – 59.4 Billion Transistors In An 823mm2 Die

• Jul 15, 2020 at 10:56am EDT

The AI segment is seeing rapid progress, with major tech companies pouring resources into keeping up with the demand for higher performance each year. We've seen NVIDIA and AMD actively building next-generation GPUs specifically with AI and HPC in mind, but competition has now arrived from British AI chip designer Graphcore, which has unveiled a second-generation AI chip that competes directly against NVIDIA's A100 Tensor Core GPU accelerator.

Graphcore's GC200 Is A Massive 7nm Chip For AI Tasks Which Is Designed To Compete Against NVIDIA's A100 GPU - IPU Delivers Up To 250 Teraflops of AI Compute

To that end, Graphcore has announced its new Colossus MK2 GC200 IPU (Intelligence Processing Unit), which is designed exclusively to power machine intelligence. True to its name, the chip features a colossal design and delivers an 8x performance bump over its predecessor, the MK1.

“We’re 100% focused on silicon processors for AI, and on building systems that can plug into existing centers. Why would we want to build CPUs or GPUs if those already work well? This is just a different toolbox.” via Graphcore's CEO, Nigel Toon

The Colossus MK2 GC200 is fabricated on TSMC's 7nm process node and has a die size of 823 mm2. For comparison, that's almost as big as the NVIDIA A100 GPU accelerator, which measures 826 mm2. The chip is a behemoth not only in size but also in density, with a total of 59.4 billion transistors onboard compared to 54.2 billion on the NVIDIA A100. That gives the Graphcore chip a higher transistor density than NVIDIA's flagship accelerator.
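
The density claim can be checked with some quick arithmetic. The snippet below is a minimal sketch that uses only the transistor counts and die sizes quoted above; the helper name `density_mtr_per_mm2` is ours and purely illustrative.

```python
# Transistor counts and die areas as quoted in the article.
chips = {
    "Graphcore GC200": {"transistors": 59.4e9, "die_area_mm2": 823},
    "NVIDIA A100": {"transistors": 54.2e9, "die_area_mm2": 826},
}


def density_mtr_per_mm2(transistors: float, die_area_mm2: float) -> float:
    """Return density in millions of transistors per square millimetre."""
    return transistors / die_area_mm2 / 1e6


densities = {name: density_mtr_per_mm2(**spec) for name, spec in chips.items()}

for name, value in densities.items():
    print(f"{name}: {value:.1f} MTr/mm2")

# Relative density advantage of the GC200 over the A100.
advantage = densities["Graphcore GC200"] / densities["NVIDIA A100"] - 1
print(f"GC200 density advantage: {advantage:.0%}")
```

By these figures the GC200 lands at roughly 72 million transistors per mm2 against roughly 66 million for the A100, a gap of about 10%.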

The GC200 is configured with 1472 IPU tiles, each with an IPU core and in-processor memory. Each IPU core runs 6 threads in parallel, which puts the total number of threads on the chip at 8832 (1472 cores x 6 threads). For memory, the chip uses an on-die solution which offers 900 MB of capacity per IPU and a memory bandwidth of 47.5 TB/s. Graphcore has opted for a smaller-capacity but higher-bandwidth solution and states that capacity can theoretically be scaled up by using several racks at once, at which point the memory pool would end up larger than that of a rack composed of A100 GPUs.
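
For a sense of what those totals mean per tile, here is a rough sketch built on the numbers above. It assumes that memory and bandwidth are split evenly across all tiles and that MB and TB are decimal units; neither assumption is confirmed by Graphcore in this announcement.

```python
TILES = 1472            # IPU tiles per GC200 (from the article)
THREADS_PER_TILE = 6    # threads per IPU core (from the article)
SRAM_MB = 900           # in-processor memory per IPU, in MB (from the article)
SRAM_BW_TBPS = 47.5     # memory bandwidth per IPU, in TB/s (from the article)

# Total hardware threads on the chip.
total_threads = TILES * THREADS_PER_TILE

# Assumption: an even split across tiles, with decimal units (1 MB = 1000 KB).
sram_per_tile_kb = SRAM_MB * 1000 / TILES
bw_per_tile_gbps = SRAM_BW_TBPS * 1000 / TILES

print(f"Total threads: {total_threads}")
print(f"Memory per tile: {sram_per_tile_kb:.0f} KB")
print(f"Bandwidth per tile: {bw_per_tile_gbps:.1f} GB/s")
```

Under those assumptions each tile gets around 611 KB of memory and a little over 32 GB/s of bandwidth.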

For interconnectivity, the chip uses the IPU-Exchange fabric, which provides 8 TB/s of bandwidth across all IPU tiles. The chip also has 10 IPU links which provide 320 GB/s of chip-to-chip bandwidth, and it supports a PCIe Gen 4 (x16) interface. As for compute output, the GC200 delivers 250 TFLOPs of peak FP16 (with sparsity) and 62.5 TFLOPs of peak FP32 (with sparsity). The NVIDIA A100 GPU delivers a total of 312 TFLOPs of FP16 (624 TFLOPs with sparsity) and 19.5 TFLOPs of FP32 (156 TFLOPs with sparsity).


The IPU-Machine - A 1 PetaFlop Rack With Four GC200 IPUs

In addition to the Colossus GC200 IPU, Graphcore is also unveiling its competitor to the NVIDIA DGX A100, named the IPU-M2000. This rack is composed of four GC200 IPUs, with a combined memory pool of 450 GB. The rack is powered by a quad-core ARM Cortex-A SoC and comes in a 1U chassis design featuring an advanced cooling system.
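
The 1 PetaFlop figure follows directly from the per-chip numbers. The sketch below simply multiplies the quoted per-chip peaks (with sparsity) by four; the variable names are illustrative only.

```python
GC200_FP16_TFLOPS = 250.0   # peak FP16 with sparsity, per chip (from the article)
GC200_FP32_TFLOPS = 62.5    # peak FP32 with sparsity, per chip (from the article)
GC200_SRAM_MB = 900         # in-processor memory per chip (from the article)
CHIPS_PER_MACHINE = 4       # IPUs per IPU-M2000 (from the article)

machine_fp16_pflops = GC200_FP16_TFLOPS * CHIPS_PER_MACHINE / 1000
machine_fp32_tflops = GC200_FP32_TFLOPS * CHIPS_PER_MACHINE
machine_sram_gb = GC200_SRAM_MB * CHIPS_PER_MACHINE / 1000

print(f"FP16 per machine: {machine_fp16_pflops:.1f} PFLOPs")
print(f"FP32 per machine: {machine_fp32_tflops:.0f} TFLOPs")
print(f"In-processor memory per machine: {machine_sram_gb:.1f} GB")
```

Note that the four chips only add up to 3.6 GB of in-processor memory, so the 450 GB pool presumably includes memory outside the IPUs themselves. That is our inference from the arithmetic, not a breakdown given in the announcement.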

From the looks of it, each IPU has an aluminum fin heatsink block attached to it, with six massive heat pipes that make direct contact with the heatsink block and lead to a large block at the rear of the rack, which is cooled by the rack station's central cooling.


In terms of performance metrics, Graphcore has compared eight M2000 IPU-Machine racks to a single DGX A100. The reason is that the comparison is framed around performance per dollar: the DGX A100 costs $199,000 (MSRP) while eight M2000 racks would cost $259,600 (MSRP). Graphcore claims that its solution offers 12x the FP32 compute, 3x the FP16 compute, and 10x the memory of NVIDIA's solution. Do note that the figures for the DGX A100 are derived without sparsity, whereas Graphcore's own numbers include sparsity.

With sparsity, the DGX A100 would stand at around 5.0 PFLOPs in FP16 versus 8 PFLOPs for the eight M2000 racks, and 1.248 PFLOPs in FP32 versus 2.0 PFLOPs. That still gives the M2000 setup an edge of 60% in performance while costing 30% more. In addition to these performance metrics, Graphcore says that the GC200 IPU platform is highly flexible in the sense that you can have up to 64,000 of these chips running together, which would deliver a massive 16 Exaflops of compute horsepower.
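
The like-for-like comparison can be reproduced from the per-chip figures, with the system totals coming out in PFLOPs. This is a rough sketch: it assumes one DGX A100 holds eight A100 GPUs (which matches the 1.248 PFLOPs FP32 total above) and uses the A100 sparsity figures exactly as quoted in this article.

```python
# Per-chip peak figures with sparsity, in TFLOPs (as quoted in the article).
GC200 = {"fp16": 250.0, "fp32": 62.5}
A100 = {"fp16": 624.0, "fp32": 156.0}

ipu_chips = 8 * 4        # eight IPU-M2000 racks with four IPUs each
gpu_chips = 8            # assumption: eight A100 GPUs in one DGX A100
ipu_price = 259_600      # USD for eight M2000 racks (from the article)
dgx_price = 199_000      # USD for one DGX A100 (from the article)


def system_pflops(per_chip_tflops: float, chips: int) -> float:
    """Scale a per-chip TFLOPs figure to a system total in PFLOPs."""
    return per_chip_tflops * chips / 1000


ipu = {k: system_pflops(v, ipu_chips) for k, v in GC200.items()}
dgx = {k: system_pflops(v, gpu_chips) for k, v in A100.items()}

speedup_fp16 = ipu["fp16"] / dgx["fp16"]
speedup_fp32 = ipu["fp32"] / dgx["fp32"]
price_ratio = ipu_price / dgx_price
perf_per_dollar_gain = speedup_fp16 / price_ratio

# Scale-out claim: 64,000 chips at 250 TFLOPs each (1 EFLOP = 1e6 TFLOPs).
scale_out_eflops = 64_000 * GC200["fp16"] / 1e6

print(f"FP16: {ipu['fp16']:.1f} vs {dgx['fp16']:.3f} PFLOPs ({speedup_fp16:.2f}x)")
print(f"FP32: {ipu['fp32']:.1f} vs {dgx['fp32']:.3f} PFLOPs ({speedup_fp32:.2f}x)")
print(f"Price ratio: {price_ratio:.2f}x")
print(f"Performance per dollar: {perf_per_dollar_gain:.2f}x")
print(f"64,000 chips: {scale_out_eflops:.0f} EFLOPs")
```

On these numbers the eight M2000 racks come out about 1.6x faster at about 1.3x the price, which works out to roughly 1.23x the peak performance per dollar.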


As far as availability is concerned, Graphcore states that customers can pre-order the IPU-Machine today with full volume shipments starting sometime in Q4 2020.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
