NVIDIA Ampere GPU - Next-Gen Display Engine, HDMI 2.1, RTX IO More
With each new generation of graphics cards, NVIDIA delivers a new range of display technologies. This generation is no different and we see some significant updates to not only the display engine but also the graphics interconnect. With the adoption of faster GDDR6X memory which provides higher bandwidth, faster compression, and more cache, Gaming applications can now run at higher resolutions, supporting more details on the display.
The Ampere Display Engine supports two new display technologies, HDMI 2.1 and DisplayPort 1.4a with DSC 1.2a. HDMI 2.1 allows for up to 48 Gbps of total bandwidth and allows for up to 4K 240Hz HDR and 8K 60Hz HDR.
DisplayPort 1.4a allows for up to 8K resolutions with 60Hz refresh rates and includes VESA's display stream compression 1.2 technology with visually lossless compression. You can run up to two 8K displays at 60 Hz using two cables, one for each display. In addition to that, Ampere also supports HDR processing natively with tone mapping added to the HDR pipeline.
Ampere GPUs also ships with the Fifth Generation NVDEC decoder unit that adds AV1 hardware decode support. Ampere's new NVDEC decoder has also been updated to support the decoding of MPEG-2, VC-1, H.264 (AVCHD), H.265 (HEVC), VP8, VP9, and AV1.
Ampere also adds the 7th Generation NVENC encoder by offering seamless hardware-accelerated encoding of up to 4K on H.264 and 8K on HEVC.
NVIDIA RTX IO - Blazing Fast Read Speeds With GPU Utilization
As storage sizes have grown, so has storage performance. Gamers are increasingly turning to SSDs to reduce game load times: while hard drives are limited to 50-100 MB/sec throughput, the latest M.2 PCIe Gen4 SSDs deliver up to 7 GB/sec. With the traditional storage model, game data is read from the hard disk, then passed from the system memory and CPU before being passed to the GPU.
Historically games have read files from the hard disk, using the CPU to decompress the game image. Developers have used lossless compression to reduce install sizes and to improve I/O performance. However, as storage performance has increased, traditional file systems and storage APIs have become a bottleneck. For example, decompressing game data from a 100 MB/sec hard drive takes only a few CPU cores, but decompressing data from a 7 GB/sec PCIe Gen4 SSD can consume more than twenty AMD Ryzen Threadripper 3960X CPU cores!
Using the traditional storage model, game decompression can consume all 24 cores on a Threadripper CPU. Modern game engines have exceeded the capability of traditional storage APIs. A new generation of I/O architecture is needed. Data transfer rates are the gray bars, CPU cores required are the black/blue blocks.
NVIDIA RTX IO is a suite of technologies that enable rapid GPU-based loading and decompression of game assets, accelerating I/O performance by up to 100x compared to hard drives and traditional storage APIs. When used with Microsoft’s new DirectStorage for Windows API, RTX IO offloads dozens of CPU cores’ worth of work to your RTX GPU, improving frame rates, enabling near-instantaneous game loading, and opening the door to a new era of large, incredibly detailed open-world games.
Object pop-in and stutter can be reduced, and high-quality textures can be streamed at incredible rates, so even if you’re speeding through a world, everything runs and looks great. In addition, with lossless compression, game download and install sizes can be reduced, allowing gamers to store more games on their SSD while also improving their performance.
NVIDIA RTX IO plugs into Microsoft’s upcoming DirectStorage API which is a next-generation storage architecture designed specifically for state-of-the-art NVMe SSD-equipped gaming PCs and the complex workloads that modern games require. Together, streamlined and parallelized APIs specifically tailored for games allow dramatically reduced IO overhead and maximize performance/bandwidth from NVMe SSDs to your RTX IO-enabled GPU.
Specifically, NVIDIA RTX IO brings GPU-based lossless decompression, allowing reads through DirectStorage to remain compressed and delivered to the GPU for decompression. This removes the load from the CPU, moving the data from storage to the GPU in a more efficient, compressed form, and improving I/O performance by a factor of two.
GeForce RTX GPUs will deliver decompression performance beyond the limits of even Gen4 SSDs, offloading potentially dozens of CPU cores’ worth of work to ensure maximum overall system performance for next-generation games. Lossless decompression is implemented with high performance compute kernels, asynchronously scheduled. This functionality leverages the DMA and copy engines of Turing and Ampere, as well as the advanced instruction set, and architecture of these GPU’s SM’s.
The advantage of this is that the enormous compute power of the GPU can be leveraged for burst or bulk loading (at level load for example) when GPU resources can be leveraged as high performance I/O processor, delivering decompression performance well beyond the limits of Gen4 NVMe. During streaming scenarios, bandwidths are a tiny fraction of the GPU capability, further leveraging the advanced asynchronous compute capabilities of Turing and Ampere. Microsoft is targeting a developer preview of DirectStorage for Windows for game developers next year, and NVIDIA Turing & Ampere gamers will be able to take advantage of RTX IO enhanced games as soon as they become available.
NVLINK For GeForce RTX 3090 And Titan Class Products Only!
NVIDIA has said farewell to their SLI (Scale Link Interface) interconnect for consumer graphics cards. They will now be using the NVLINK interconnect which has already been featured on their Turing GPUs. The reason is that SLI was simply not enough to feed higher bandwidth to Ampere GPUs.
A single x8 NVLINK channel provides 25 GB/s peak bandwidth. There are 4 x4 links on the GA102 GPU. The GA102 GPU features 50 GB/s of bandwidth in parallel and 100 GB/s bandwidth bi-directionally. Using NVLINK on high-end cards would be beneficial in high-resolution gaming but there's a reason NVIDIA still restricts users from doing 3 and 4 way SLI.
Multi-GPU still isn't optimized so you won't see many benefits unless you are running the highest-end graphics cards. That's another reason why the RTX 3080 & RTX 3070 are deprived of NVLINK connectors. The NVLINK connectors cost $79 US each and are sold separately.