NVIDIA Improves Path Tracing Performance By 3x With Enhanced ReSTIR Algorithms, Prepped For Next-Gen Gaming

Apr 20, 2026 at 09:30am EDT

NVIDIA has shared an enhanced set of ReSTIR algorithms that speed up Path Tracing by 2-3x, setting the stage for next-gen gaming.

Ray Tracing Is Cool, But Path Tracing Is Cooler & NVIDIA Is Making PT Faster by 3x With Its New ReSTIR Algorithms

PC games are rapidly adopting Path Tracing to deliver next-generation visual fidelity. As with Ray Tracing, NVIDIA was the first to bring Path Tracing to PCs. However, much like Ray Tracing in its early days, Path Tracing faces one big challenge: it demands much faster hardware. As we have seen in several PT titles, even the mighty RTX 5090 only manages 30-40 FPS and relies heavily on DLSS upscaling and frame generation to deliver a playable framerate.


Ray Tracing went through the same phase: it arrived on PCs first and now runs decently on modern hardware. Even consoles have started using RT extensively, though it is mostly limited to Quality presets that run at 30 FPS (or 60 FPS in a few rare cases).

With that said, NVIDIA, a pioneer of PC graphics, is now set to take Path Tracing a step further. In a new research paper titled "ReSTIR PT Enhanced: Algorithmic Advances for Faster and More Robust ReSTIR Path Tracing", NVIDIA proposes a new set of ReSTIR (Reservoir-based SpatioTemporal Importance Resampling) algorithms that deliver a 2-3x performance boost while addressing visual artifacts that affect current RT/PT methods.
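For readers curious how "reservoir-based resampling" works, here is a minimal, generic sketch in Python. Each pixel keeps a tiny reservoir holding one selected sample plus running weights; candidates are streamed in with weighted reservoir sampling, and reservoirs from the previous frame (temporal reuse) or neighboring pixels (spatial reuse) can be merged cheaply. The `Reservoir` class, `ris`, `merge`, and the toy 1D target function are hypothetical illustrations, not NVIDIA's code; real ReSTIR PT operates on full light paths and additionally needs shift mappings and MIS weights when reusing samples across pixels, which this sketch deliberately omits.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

Target = Callable[[float], float]  # hypothetical target function p_hat(x)


@dataclass
class Reservoir:
    """Hypothetical single-sample reservoir (illustrative, not NVIDIA's layout)."""
    sample: Optional[float] = None  # currently selected candidate y
    w_sum: float = 0.0              # running sum of resampling weights
    count: int = 0                  # number of candidates represented (M)

    def update(self, candidate: float, weight: float, rng: random.Random) -> None:
        # Weighted reservoir sampling: keep the candidate with prob. weight / w_sum.
        self.w_sum += weight
        self.count += 1
        if self.w_sum > 0 and rng.random() < weight / self.w_sum:
            self.sample = candidate

    def contribution_weight(self, target: Target) -> float:
        # Unbiased contribution weight W = w_sum / (M * p_hat(y)).
        if self.sample is None or self.count == 0:
            return 0.0
        p_hat = target(self.sample)
        return self.w_sum / (self.count * p_hat) if p_hat > 0 else 0.0


def ris(candidates: Iterable[float], source_pdf: Target, target: Target,
        rng: random.Random) -> Reservoir:
    """Streaming RIS: resample candidates drawn from source_pdf toward target."""
    r = Reservoir()
    for x in candidates:
        r.update(x, target(x) / source_pdf(x), rng)
    return r


def merge(reservoirs: List[Reservoir], target: Target,
          rng: random.Random) -> Reservoir:
    """Combine reservoirs, as in temporal or spatial reuse.

    Simplification: all reservoirs share one domain and one target function,
    so no shift mapping or MIS weights are applied (ReSTIR PT needs both).
    """
    out = Reservoir()
    total_m = 0
    for r in reservoirs:
        if r.sample is not None:
            # Resampling weight p_hat(y) * W * M for the reused sample.
            weight = target(r.sample) * r.contribution_weight(target) * r.count
            out.update(r.sample, weight, rng)
        total_m += r.count
    out.count = total_m  # merged reservoir stands in for every input candidate
    return out


if __name__ == "__main__":
    def toy_target(x: float) -> float:
        return x * x  # toy integrand used as the target function p_hat

    def uniform_pdf(x: float) -> float:
        return 1.0  # candidates are drawn uniformly on [0, 1)

    rng = random.Random(0)
    res = ris([rng.random() for _ in range(32)], uniform_pdf, toy_target, rng)
    # f(y) * W estimates the integral of x^2 over [0, 1], i.e. roughly 1/3.
    print(toy_target(res.sample) * res.contribution_weight(toy_target))
```

The appeal of this structure for real-time rendering is that merging costs one weighted coin flip per reused reservoir, no matter how many candidates each reservoir has already absorbed.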

NVIDIA's approach is said to be near "Production Ready" and halves the cost of spatial reuse. The enhanced ReSTIR PT algorithms also improve performance and quality by unifying direct and global illumination and by building on existing techniques to reduce color noise and disocclusion noise. The paper details the results as follows:

Table 1 shows performance of our techniques, with each row adding one new feature/optimization on top of a baseline of Lin et al.’s [2022] public source code. We first measure the speedup from our cost-reduction techniques, which provide an average 2.74× speedup across the four tested scenes. These scenes were chosen to reflect a range of geometry and material complexity. Results for individual scenes are provided in the supplemental material.

To provide further insight into the effect of our low-level GPU optimizations, we profiled Opera House using Nsight Graphics. The profiler data indicate that the optimizations in Sections 6.2.1–6.2.3 reduce thread divergence and improve GPU computation efficiency. Specifically:

  • SM warp occupancy increases from 22.4% → 31.1%
  • Active threads per warp increase from 15.3 → 19.9
  • Warp latency decreases from 347k → 241k cycles
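To put these profiler numbers in relative terms, the short Python snippet below computes the percentage change for each metric and, assuming the standard 32-thread warp on NVIDIA GPUs, the average share of warp lanes doing useful work (active threads divided by 32). The helper names are illustrative; the inputs are the figures quoted above.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def pct_change(before: float, after: float) -> float:
    """Relative change in percent (positive = increase)."""
    return (after - before) / before * 100.0


def lane_utilization(active_threads: float) -> float:
    """Average fraction of a warp's lanes that are active, in percent."""
    return active_threads / WARP_SIZE * 100.0


baseline = {"occupancy_pct": 22.4, "active_threads": 15.3, "latency_cycles": 347_000}
optimized = {"occupancy_pct": 31.1, "active_threads": 19.9, "latency_cycles": 241_000}

for key in baseline:
    print(f"{key}: {pct_change(baseline[key], optimized[key]):+.1f}%")

print(f"lane utilization: {lane_utilization(15.3):.1f}% -> {lane_utilization(19.9):.1f}%")
```

With these inputs, occupancy rises by about 39%, active threads per warp by about 30%, and warp latency falls by about 31%, while average lane utilization climbs from roughly 48% to 62%, which is what reduced thread divergence looks like in practice.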

All of this occurs without changing sampler behavior. Applying Russian roulette (Section 6.2.4) further improves these metrics to:

  • 34.9% occupancy
  • 20.6 active threads per warp
  • 82k cycles warp latency
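Russian roulette is a standard, unbiased way to end paths early: a path continues with some probability q and, if it survives, its throughput is divided by q so the expected contribution is unchanged. The exact continuation rule used in the paper's Section 6.2.4 is not described here, so the Python sketch below assumes a common throughput-based heuristic; `roulette`, `min_q`, and the scalar (rather than RGB) throughput are hypothetical simplifications.

```python
import random
from typing import Optional


def roulette(throughput: float, rng: random.Random,
             min_q: float = 0.05) -> Optional[float]:
    """Russian roulette on a scalar path throughput (hypothetical heuristic).

    Continue with probability q = clamp(throughput, min_q, 1). Survivors are
    reweighted by 1/q so the expected value equals the input throughput;
    terminated paths return None.
    """
    q = min(1.0, max(min_q, throughput))
    if rng.random() >= q:
        return None  # path terminated: contributes nothing from here on
    return throughput / q


def mean_after_roulette(throughput: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo check that roulette leaves the expected throughput unchanged."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        survived = roulette(throughput, rng)
        total += survived if survived is not None else 0.0
    return total / trials


if __name__ == "__main__":
    # A dim path (throughput 0.2) survives ~20% of the time with weight 1.0,
    # so the average stays close to 0.2 while most of the work is skipped.
    print(mean_after_roulette(0.2, 200_000))
```

The trade-off is extra variance from the 1/q reweighting; clamping q at `min_q` bounds how large that reweighting can get.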

Our method also reduces storage relative to the baseline through two changes: compressing the ReSTIR PT reservoir and unifying the reservoirs for direct and indirect lighting. Because each ReSTIR pass requires two sets of reservoirs to support temporal reuse, these changes reduce per-pixel storage from 2 × (88 + 16) bytes in the baseline implementation (which uses 16-byte reservoirs for ReSTIR DI) to 2 × 64 bytes. With a 1920×1080 render resolution, this lowers memory consumption from 431 MB to 265 MB.
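The storage figures can be reproduced with simple arithmetic. The Python sketch below multiplies the per-pixel reservoir size by the two reservoir sets and the 1920×1080 pixel count; the helper name is illustrative, and it assumes 1 MB = 10^6 bytes, which is the convention under which the quoted 431 MB and 265 MB figures come out exactly.

```python
WIDTH, HEIGHT = 1920, 1080
RESERVOIR_SETS = 2  # two sets per pass to support temporal reuse


def reservoir_memory_mb(bytes_per_set: int, width: int = WIDTH,
                        height: int = HEIGHT) -> float:
    """Total reservoir storage in MB, assuming 1 MB = 10**6 bytes."""
    return RESERVOIR_SETS * bytes_per_set * width * height / 1e6


baseline_mb = reservoir_memory_mb(88 + 16)  # ReSTIR PT (88 B) + ReSTIR DI (16 B)
enhanced_mb = reservoir_memory_mb(64)       # compressed, unified reservoir
print(f"{baseline_mb:.0f} MB -> {enhanced_mb:.0f} MB")  # prints "431 MB -> 265 MB"
```

Measured in binary units (MiB), the same totals would be roughly 411 MiB and 253 MiB, so the savings ratio of about 38% is the same either way.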

GPU Optimization Results Compared to Lin et al. [2022]

| Technique / Stage | SM Warp Occupancy (%) | Active Threads per Warp | Warp Latency (cycles) | Speedup vs. Baseline | Notes |
| --- | --- | --- | --- | --- | --- |
| Baseline (Lin et al. [2022]) | 22.4 | 15.3 | 347k | 1.0× | Public source code baseline |
| Low-level GPU optimizations (Sec. 6.2.1–6.2.3) | 31.1 | 19.9 | 241k | 2.74× (avg. across 4 scenes) | Reduced thread divergence, improved efficiency |
| + Russian roulette (Sec. 6.2.4) | 34.9 | 20.6 | 82k | - | Further efficiency gains |
| + New thresholds (Sec. 4, 5, 6) | - | - | - | - | Scene-independent reconnection criteria; improves shift mapping quality |
| All improvements (decorrelation, noise reduction) | - | - | - | 2.30× | Adds 19% cost vs. fastest version, but still faster than the baseline |
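The 19% overhead in the last row of the table above follows directly from the two speedups: both are measured against the same baseline, so frame time scales with 1 / speedup, and 2.74 / 2.30 ≈ 1.19. A small Python check follows; the 30 ms baseline frame time is a hypothetical value for illustration, not a figure from the paper.

```python
def relative_cost(fast_speedup: float, slow_speedup: float) -> float:
    """Extra cost (%) of the slower config relative to the faster one.

    Both speedups share one baseline, so frame time is proportional
    to 1 / speedup.
    """
    return (fast_speedup / slow_speedup - 1.0) * 100.0


def frame_time(baseline_ms: float, speedup: float) -> float:
    """Frame time after applying a speedup to a baseline frame time."""
    return baseline_ms / speedup


print(f"{relative_cost(2.74, 2.30):.0f}% extra cost")
# Hypothetical 30 ms baseline frame (not a number reported in the paper).
print(f"{frame_time(30.0, 2.74):.1f} ms vs {frame_time(30.0, 2.30):.1f} ms")
```

In other words, the full-quality configuration gives back part of the raw speedup to decorrelation and noise reduction, yet still renders more than twice as fast as the baseline.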

It's great to see NVIDIA improving Path Tracing performance. The technology has become increasingly relevant since the launch of the RTX 40 and RTX 50 GPUs. Looking ahead, NVIDIA also plans to use Neural Rendering techniques and AI algorithms to further boost the performance of its gaming hardware and accelerate next-gen visual capabilities.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
