NVIDIA’s Latest Patent Attempts To Solve One Of AI Computing’s Biggest Challenges

Mar 7, 2025 at 10:56am EST

NVIDIA remains comfortably ensconced at the bleeding edge of GPU-based computing, enjoying an unrivaled primacy over the entire AI sphere as a result. Yet leadership in the tech industry requires near-constant innovation, and NVIDIA appears to be delivering plenty of it, for now at least.

To wit, NVIDIA's new patent application, bearing the publication number US20250078199A1, was published on March 6, 2025. The patent envisions discrete sections of a GPU working within local confines to store and access data, and to perform computations, thereby reducing the delays that are inherent in accessing distant resources on the chip. If implemented, this approach could meaningfully speed up GPU-based computations, which should in turn allow for more powerful AI applications.

NVIDIA's patent envisions three main components to achieve this localization:

  1. An AMAP Address Mapping Unit, which provides an alternate view of localized memory, allowing for the remapping of physical memory to the designated local DRAM associated with a given uGPU (micro GPU).
  2. A Graphics Processing Cluster (GPC) Affinity Mask System, which would enable the allocation of a compute program to specific GPCs, confining its execution to a bound uGPU node.
  3. A GPU Resource Manager
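
To make the first two components a little more concrete, here is a minimal Python sketch of what an affinity mask and a localized address mapping could look like. Everything in it is an assumption made for illustration: the two-uGPU topology, the four GPCs per uGPU, the 1 GiB local DRAM slice, and the function names `affinity_mask`, `gpcs_in_mask`, and `localized_address` do not come from the patent application.

```python
# Toy model only: the topology and sizes below are assumptions, not patent details.
NUM_UGPUS = 2               # hypothetical: two uGPU nodes on one GPU
GPCS_PER_UGPU = 4           # hypothetical: four GPCs per uGPU
LOCAL_DRAM_BYTES = 1 << 30  # hypothetical: 1 GiB of local DRAM per uGPU


def affinity_mask(ugpu_id: int) -> int:
    """Return a bitmask with one bit set for each GPC owned by ugpu_id."""
    if not 0 <= ugpu_id < NUM_UGPUS:
        raise ValueError(f"unknown uGPU {ugpu_id}")
    return ((1 << GPCS_PER_UGPU) - 1) << (ugpu_id * GPCS_PER_UGPU)


def gpcs_in_mask(mask: int) -> list[int]:
    """List the GPC indices that a program with this mask may run on."""
    return [g for g in range(NUM_UGPUS * GPCS_PER_UGPU) if (mask >> g) & 1]


def localized_address(ugpu_id: int, offset: int) -> int:
    """Map an offset in the localized view to an address in that uGPU's DRAM slice."""
    if not 0 <= ugpu_id < NUM_UGPUS:
        raise ValueError(f"unknown uGPU {ugpu_id}")
    if not 0 <= offset < LOCAL_DRAM_BYTES:
        raise ValueError("offset falls outside the local DRAM slice")
    return ugpu_id * LOCAL_DRAM_BYTES + offset
```

In this toy layout, a program bound to uGPU 1 gets a mask covering GPCs 4 through 7, and every address it touches falls inside the second 1 GiB slice. Real hardware would perform this mapping in the address translation path rather than in software.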

So, how does NVIDIA's envisioned GPU localization work? Well, a given AI application can inform the CUDA driver of its intent to bind to a given uGPU node via the affinity mask. The CUDA driver then coordinates with the Resource Manager to apply localized mapping. Simultaneously, the memory aligned with that uGPU node is sub-allocated to it. Thereafter, the CUDA driver allocates computational work to the GPCs controlled by the designated uGPU node. Finally, the threads of each Cooperative Thread Array (CTA) access memory using the localized address mapping, so that memory requests remain confined to that uGPU's local DRAM.
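
The sequence described above can be sketched as a small Python simulation. This is a toy model under stated assumptions, not NVIDIA's implementation and not a real CUDA API: the `UGpuNode` and `ToyDriver` classes, their method names, the bump allocator, and the round-robin CTA placement are all hypothetical and chosen only to mirror the order of the steps.

```python
from dataclasses import dataclass


@dataclass
class UGpuNode:
    """Hypothetical uGPU node: a set of GPCs plus a slice of local DRAM."""
    node_id: int
    gpcs: list[int]
    dram_base: int
    dram_size: int
    next_free: int = 0  # offset of the next free byte (simple bump allocator)

    def sub_allocate(self, nbytes: int) -> int:
        """Carve nbytes out of this node's local DRAM and return the address."""
        if self.next_free + nbytes > self.dram_size:
            raise MemoryError("local DRAM of this uGPU is exhausted")
        addr = self.dram_base + self.next_free
        self.next_free += nbytes
        return addr

    def owns(self, addr: int) -> bool:
        return self.dram_base <= addr < self.dram_base + self.dram_size


class ToyDriver:
    """Stand-in for the driver plus resource manager (assumed, simplified)."""

    def __init__(self, nodes: list[UGpuNode]):
        self.nodes = {n.node_id: n for n in nodes}
        self.bound = None

    def bind(self, node_id: int) -> None:
        # Steps 1-2: the application declares its uGPU; localized mapping applies.
        self.bound = self.nodes[node_id]

    def malloc(self, nbytes: int) -> int:
        # Step 3: memory is sub-allocated from the bound node's local DRAM.
        return self.bound.sub_allocate(nbytes)

    def launch(self, num_ctas: int) -> list[int]:
        # Step 4: CTAs are placed only on the bound node's GPCs (round-robin).
        gpcs = self.bound.gpcs
        return [gpcs[i % len(gpcs)] for i in range(num_ctas)]

    def access(self, addr: int) -> int:
        # Step 5: memory requests must stay inside the bound node's local DRAM.
        if not self.bound.owns(addr):
            raise PermissionError("address lies outside the bound uGPU's DRAM")
        return addr
```

With two hypothetical nodes, for example `UGpuNode(0, [0, 1, 2, 3], 0, 1024)` and `UGpuNode(1, [4, 5, 6, 7], 1024, 1024)`, binding to node 1 means every allocation, every CTA placement, and every memory access stays within that node's GPCs and DRAM range.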

NVIDIA's envisioned architecture, as explained in its patent application, is meant to significantly reduce memory access latency, enhance cache efficiency by eliminating redundant data storage, mitigate the delays inherent in cross-die communication, and give applications more granular control over GPU resource allocation and utilization.

This approach could serve as another way of working around the limitations associated with the slowdown of Moore's Law, relying on localization instead of miniaturization to speed up computations.

In some respects, this approach is similar to the one employed by DeepSeek, where the Chinese AI startup was able to unlock additional capabilities of NVIDIA's older-generation GPUs and extract significantly more usable compute from them.

About the author: Writing is my one incontrovertible passion. Over the past six years, I have authored over 2,200 distinct articles on financial and tech-related topics, spanning nearly 1 million words, and I have been a member of the Wccftech mobile team since 2025. As an alumnus of the University of Toronto, Rotman Commerce Program, I bring nuance, in-depth knowledge, and a unique perspective to every topic that I cover. When I'm not writing, I'm traveling the world, exploring hidden confectioneries and restaurants as an aspiring food connoisseur.
