There's no denying the AI juggernaut that NVIDIA has become over the last couple of years. Its GPUs have become the preferred choice not only for HPC but also for data centers, including AI and deep learning ecosystems. Recently, NVIDIA announced that it is leveraging AI to design and develop GPUs that are far superior to those created by humans, and the green team's flagship Hopper GPU is a testament to that statement: it features nearly 13,000 circuit instances that were designed entirely by AI.
NVIDIA's Hopper GPU, The World's Fastest AI Chip, Was Created With The Help of AI - Features Nearly 13,000 AI-Designed Circuits
In a blog post published on NVIDIA's Developer site, the company explains how it leveraged its own AI capabilities to design its greatest GPU to date, the Hopper H100. NVIDIA GPUs are mostly designed using state-of-the-art EDA (Electronic Design Automation) tools, but with the help of PrefixRL, an AI methodology that optimizes parallel prefix circuits using deep reinforcement learning, the company can design smaller, faster, and more power-efficient chips while delivering better performance.
Arithmetic circuits were once the craft of human experts, and are now designed by AI in NVIDIA GPUs. H100 chips have nearly 13,000 AI designed circuits! How is this possible? Blog https://t.co/PpKrAmV8vc + a thread 🧵👇 pic.twitter.com/3RrZl2muJ3
— Rajarshi Roy (@rjrshr) July 8, 2022
Arithmetic circuits in computer chips are constructed using a network of logic gates (like NAND, NOR, and XOR) and wires. The desirable circuit should have the following characteristics:
- Small: A lower area so that more circuits can fit on a chip.
- Fast: A lower delay to improve the performance of the chip.
- Power-efficient: Lower power consumption for the chip.
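To make the area versus delay trade-off concrete, here is a minimal Python sketch of two classic ways to wire a parallel prefix computation, the kind of circuit PrefixRL optimizes. It is an illustration under stated assumptions, not NVIDIA's method: the function names are hypothetical, node count stands in for area, graph depth stands in for delay, and integer addition stands in for the associative operator a real adder would use.

```python
from operator import add

def ripple_prefix(xs, op=add):
    """Serial prefix graph: fewest nodes, but depth grows linearly."""
    vals = list(xs)
    depth = [0] * len(vals)   # logic levels needed to produce each output
    nodes = 0                 # number of operator nodes (proxy for area)
    for i in range(1, len(vals)):
        vals[i] = op(vals[i - 1], vals[i])
        depth[i] = depth[i - 1] + 1
        nodes += 1
    return vals, nodes, max(depth, default=0)

def sklansky_prefix(xs, op=add):
    """Sklansky-style prefix graph: more nodes, but logarithmic depth."""
    vals = list(xs)
    n = len(vals)
    depth = [0] * n
    nodes = 0
    span = 1
    while span < n:
        for i in range(n):
            # Positions in the upper half of each 2*span block absorb
            # the last output of the lower half of that block.
            if (i // span) % 2 == 1:
                src = (i // span) * span - 1
                vals[i] = op(vals[src], vals[i])
                depth[i] = max(depth[i], depth[src]) + 1
                nodes += 1
        span *= 2
    return vals, nodes, max(depth, default=0)

if __name__ == "__main__":
    inputs = list(range(1, 9))
    for build in (ripple_prefix, sklansky_prefix):
        outputs, nodes, depth = build(inputs)
        print(build.__name__, outputs, "nodes:", nodes, "depth:", depth)
```

For eight inputs, both graphs produce the same outputs, but the serial version uses 7 nodes at a depth of 7 while the Sklansky-style version uses 12 nodes at a depth of 3. Many other functionally equivalent graphs sit between these two extremes, which is the design space an optimizer has to search.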
NVIDIA used this methodology to design nearly 13,000 AI-assisted circuits, which are 25% smaller in area than circuits produced by EDA tools while being just as fast and functionally equivalent. PrefixRL is a very computationally demanding task, however: physical simulation required 256 CPUs for each GPU, and training took over 32,000 GPU hours. To eliminate this bottleneck, NVIDIA developed Raptor, an in-house distributed reinforcement learning platform that takes special advantage of NVIDIA hardware for this kind of industrial reinforcement learning.
Raptor has several features that enhance scalability and training speed such as job scheduling, custom networking, and GPU-aware data structures. In the context of PrefixRL, Raptor makes the distribution of work across a mix of CPUs, GPUs, and Spot instances possible.
Networking in this reinforcement learning application is diverse and benefits from the following:
- Raptor’s ability to switch to NCCL for point-to-point transfers, sending model parameters directly from the learner GPU to an inference GPU.
- Redis for asynchronous and smaller messages such as rewards or statistics.
- A JIT-compiled RPC to handle high volume and low latency requests such as uploading experience data.
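The split described above can be pictured as a routing table that picks a transport by message type. The Python sketch below is purely illustrative: Raptor is an in-house NVIDIA platform whose API is not public, so every name here (`MessageKind`, `Transport`, `choose_transport`) is hypothetical, and no real NCCL, Redis, or RPC calls are made.

```python
from enum import Enum

class Transport(Enum):
    NCCL = "nccl"    # direct GPU-to-GPU point-to-point transfer
    REDIS = "redis"  # small, asynchronous messages
    RPC = "rpc"      # high-volume, low-latency requests

class MessageKind(Enum):
    MODEL_PARAMETERS = "model_parameters"
    REWARD = "reward"
    STATISTICS = "statistics"
    EXPERIENCE = "experience"

# Hypothetical routing table mirroring the three bullet points above.
ROUTES = {
    MessageKind.MODEL_PARAMETERS: Transport.NCCL,
    MessageKind.REWARD: Transport.REDIS,
    MessageKind.STATISTICS: Transport.REDIS,
    MessageKind.EXPERIENCE: Transport.RPC,
}

def choose_transport(kind: MessageKind) -> Transport:
    """Return the transport assumed to suit this kind of message."""
    return ROUTES[kind]

if __name__ == "__main__":
    for kind in MessageKind:
        print(f"{kind.value} -> {choose_transport(kind).value}")
```

The point of the sketch is only that large tensors, small status updates, and bulk experience uploads have different traffic patterns, so a single transport would be a poor fit for all three.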
NVIDIA concludes that applying AI to a real-world circuit design problem can lead to better GPU designs in the future. More information is available in the full paper and in the post on NVIDIA's Developer blog.