NVIDIA will be revealing brand-new details of its Hopper GPU & Grace CPU during the next iteration of Hot Chips (34) in the coming week. Senior engineers from the company will explain innovations in accelerated computing for modern data centers and edge networking systems, with talks focusing on the Grace CPU, Hopper GPU, NVLink Switch, and the Jetson Orin module.
NVIDIA to reveal details on next-gen Hopper GPU & Grace CPU at Hot Chips 34
Hot Chips is an annual event that brings together system and processor architects and allows companies to discuss details such as the technical design or current performance of their products. NVIDIA is planning to discuss the company's first server-based processor, the new Hopper GPU, the NVSwitch interconnect chip, and the company's Jetson Orin system on module, or SoM.
The four presentations during the two-day event will offer an insider view of how the company's platform will achieve increased performance, efficiency, scale, and security.
NVIDIA hopes that it will be able to "demonstrate a design philosophy of innovating across the entire stack of chips, systems, and software where GPUs, CPUs, and DPUs act as peer processors." So far, the company has already created a platform that runs AI, data analytics, and high-performance computing jobs at cloud service providers, supercomputing centers, corporate data centers, and in autonomous AI systems.
Data centers demand flexible clusters of processors, graphics cards, and other accelerators sharing massive pools of memory to produce the energy-efficient performance that today's workloads require.
Jonathon Evans, a distinguished engineer and 15-year veteran at NVIDIA, will describe the NVIDIA NVLink-C2C. It connects processors and graphics cards at 900 GB/s with five times the energy efficiency of the existing PCIe Gen 5 standard, thanks to data transfers consuming just 1.3 picojoules per bit.
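As a back-of-the-envelope check (my arithmetic, not a figure from the article), the quoted 1.3 pJ/bit implies the link itself consumes only on the order of ten watts at full bandwidth:

```python
# Rough estimate of NVLink-C2C link power from the quoted figures.
# Assumes 900 GB/s of traffic at 1.3 picojoules per bit transferred.
bandwidth_bytes_per_s = 900e9   # 900 GB/s
energy_per_bit_j = 1.3e-12      # 1.3 pJ/bit

bits_per_s = bandwidth_bytes_per_s * 8
power_watts = bits_per_s * energy_per_bit_j

print(f"{power_watts:.2f} W")   # prints "9.36 W"
```

A simplified model, of course: it covers only the energy of moving bits, not the surrounding PHY and controller logic.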
NVLink-C2C combines two processors to create the NVIDIA Grace CPU with 144 Arm Neoverse cores. It's a CPU constructed to unravel the world's most significant computing concerns.
The Grace CPU uses LPDDR5X memory for maximum efficiency. The chip enables a terabyte per second of memory bandwidth while keeping power consumption for the entire complex at 500 watts.
NVLink-C2C also connects Grace CPU and Hopper GPU chips as memory-sharing peers in the NVIDIA Grace Hopper Superchip, delivering maximum acceleration for performance-hungry jobs such as AI training.
Anyone can build custom chiplets using NVLink-C2C to coherently connect to NVIDIA GPUs, CPUs, DPUs, and SoCs, expanding this new class of integrated products. The interconnect will support AMBA CHI and CXL protocols used by Arm and x86 processors.
The NVIDIA NVSwitch merges numerous servers into a single AI supercomputer using NVLink interconnects running at 900 gigabytes per second, more than seven times the bandwidth of PCIe 5.0.
NVSwitch lets users link 32 NVIDIA DGX H100 systems into an AI supercomputer that delivers an exaflop of peak AI performance.
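The exaflop claim checks out with some simple, assumed figures (8 GPUs per DGX H100, roughly 4 petaflops of sparse FP8 AI performance per H100; neither number appears in the article):

```python
# Back-of-the-envelope check on the "exaflop" claim (assumed figures below).
systems = 32
gpus_per_system = 8     # a DGX H100 carries 8 H100 GPUs
pflops_per_gpu = 4      # ~4 PFLOPS FP8 with sparsity, approximate

total_pflops = systems * gpus_per_system * pflops_per_gpu
print(total_pflops / 1000, "exaflops")   # prints "1.024 exaflops"
```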
Alexander Ishii and Ryan Wells, two of NVIDIA's veteran engineers, explain how the switch lets users build systems with up to 256 GPUs to tackle demanding workloads like training AI models with more than 1 trillion parameters.
The switch includes engines that speed data transfers using the NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol. SHARP is an in-network computing capability that debuted on NVIDIA Quantum InfiniBand networks. It can double data throughput on communications-intensive AI applications.
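A simplified traffic model (my sketch, not NVIDIA's implementation) shows where that roughly 2x gain comes from: when the switch performs the reduction, each GPU sends its gradient buffer once instead of circulating partial sums as in a classic ring allreduce:

```python
# Why in-network reduction (as in SHARP) can roughly double effective
# allreduce throughput -- a simplified per-node traffic model.

def ring_allreduce_bytes_sent(size_bytes: int, n: int) -> float:
    # Classic ring allreduce: each node sends 2*(n-1)/n of the buffer.
    return 2 * (n - 1) / n * size_bytes

def in_network_allreduce_bytes_sent(size_bytes: int, n: int) -> float:
    # With reduction done in the switch, each node sends the buffer once
    # and receives the fully reduced result once.
    return float(size_bytes)

size = 1 << 30  # hypothetical 1 GiB gradient buffer
n = 256         # number of GPUs
ratio = ring_allreduce_bytes_sent(size, n) / in_network_allreduce_bytes_sent(size, n)
print(f"{ratio:.2f}x more data on the wire per node without in-network reduction")
```

The ratio approaches 2 as the GPU count grows, which matches the "double data throughput" claim for bandwidth-bound collectives.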
Jack Choquette, a distinguished senior engineer with 14 years at the company, will provide a detailed tour of the NVIDIA H100 Tensor Core GPU, aka Hopper.
In addition to using the new interconnects to scale to unparalleled heights, the GPU packs many cutting-edge features that boost the accelerator's performance, efficiency, and security.
Hopper's new Transformer Engine and upgraded Tensor Cores deliver a 30x speedup compared to the prior generation on AI inference with the world's largest neural network models. And it employs the world's first HBM3 memory system to deliver a whopping three terabytes per second of memory bandwidth, NVIDIA's most significant generational increase ever.
Among other new features:
- Hopper adds virtualization support for multi-tenant, multi-user configurations.
- New DPX instructions speed the recurring dynamic-programming loops found in route-mapping, DNA, and protein-analysis applications.
- Hopper packs support for enhanced security with confidential computing.
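To illustrate the kind of workload the DPX instructions target, here is a classic dynamic-programming recurrence in plain Python: Levenshtein edit distance, a stand-in example of my choosing (not NVIDIA code) whose fused min/add inner step is the pattern used in DNA and protein alignment:

```python
# A classic dynamic-programming recurrence of the kind DPX instructions
# accelerate in hardware: edit distance between two sequences.
# Illustrative stand-in, not NVIDIA code.

def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # The min-of-sums step below is the hot inner loop that
            # DPX-style fused min/add operations are designed to speed up.
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost))  # substitution/match
        prev = curr
    return prev[-1]

print(edit_distance("GATTACA", "GCATGCU"))  # prints 4
```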
Choquette, one of the lead chip designers on the Nintendo 64 console early in his career, will also describe parallel computing techniques underlying some of Hopper's advances.
Michael Ditty, an architecture manager with a 17-year tenure at the company, will provide new performance specs for NVIDIA Jetson AGX Orin, an engine for edge AI, robotics, and advanced autonomous machines.
The NVIDIA Jetson AGX Orin integrates 12 Arm Cortex-A78 cores and an NVIDIA Ampere architecture GPU to deliver up to 275 trillion operations per second on AI inference jobs.
The latest production module packs up to 32 gigabytes of memory and is part of a compatible family that scales down to pocket-sized 5W Jetson Nano developer kits.
All the new chips support the NVIDIA software stack that accelerates more than 700 applications and is used by 2.5 million developers.
Based on the CUDA programming model, it includes dozens of NVIDIA SDKs for vertical markets like automotive (DRIVE) and healthcare (Clara), as well as technologies such as recommendation systems (Merlin) and conversational AI (Riva).
The NVIDIA AI platform is available from every major cloud service provider and system maker.
News Source: NVIDIA