NVIDIA GeForce RTX 4090 GPU Offers Up To 15X AI Throughput Versus Laptop CPUs, TensorRT-LLM Boosts Perf By Up To 70%

Jun 12, 2024 at 01:40pm EDT

NVIDIA has showcased impressive numbers for its GeForce RTX 40 GPUs, including the flagship RTX 4090, running AI models such as Llama & Mistral.

NVIDIA's GeForce RTX 40 GPUs Tear Apart Laptop CPUs & NPUs In New Llama & Mistral AI Benchmarks, Accelerated Further With TensorRT-LLM

NVIDIA's TensorRT-LLM acceleration for Windows has brought some spectacular performance uplifts to the Windows PC platform. We have already seen impressive gains and new features added to NVIDIA's RTX "AI PC" feature set, and things are getting even better with the company showcasing some huge performance figures for its flagship GeForce RTX 4090 GPU.

In a new AI-Decoded blog, NVIDIA has shared how its existing GPU lineup outpaces the entire NPU ecosystem, which has only managed to reach 50 TOPS in 2024. Meanwhile, NVIDIA's RTX AI GPUs deliver several hundred TOPS and go all the way up to 1321 TOPS with the GeForce RTX 4090, making it the fastest desktop AI solution for running LLMs and more. It's also the fastest gaming graphics card on the planet.

Image Source: NVIDIA

NVIDIA's GeForce RTX GPUs offer up to 24 GB of VRAM while NVIDIA RTX GPUs offer up to 48 GB of VRAM, making them quite the beasts when it comes to handling LLMs (Large Language Models) as these workloads love large amounts of video memory. NVIDIA's RTX hardware comes not only with dedicated video memory but also AI-specific acceleration through Tensor Cores (hardware) and the aforementioned TensorRT-LLM (software).
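To illustrate why video memory matters for LLMs, here is a rough sketch that estimates the size of a model's weights and checks it against the 24 GB and 48 GB capacities mentioned above. The 7-billion and 70-billion parameter counts and the precisions are illustrative assumptions, not figures from NVIDIA, and the estimate covers weights only: it ignores the KV cache, activations, and runtime overhead, so real requirements are higher.

```python
def weights_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    Weights-only estimate: parameters x bits per parameter / 8 bits per byte.
    """
    return params_billions * bits_per_param / 8


def fits(params_billions: float, bits_per_param: int, vram_gb: float) -> bool:
    """Return True if the weights alone fit within the given VRAM budget."""
    return weights_gb(params_billions, bits_per_param) <= vram_gb


# Hypothetical model sizes and precisions, chosen only for illustration.
for params, bits in [(7, 16), (70, 4), (70, 16)]:
    size = weights_gb(params, bits)
    print(
        f"{params}B params @ {bits}-bit: {size:.1f} GB | "
        f"fits 24 GB: {fits(params, bits, 24)} | "
        f"fits 48 GB: {fits(params, bits, 48)}"
    )
```

Under these assumptions, a 7B model at 16-bit precision needs about 14 GB for its weights and fits on a 24 GB card, while a 70B model quantized to 4-bit needs about 35 GB and would only fit within the 48 GB budget.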

Token generation on NVIDIA's GeForce RTX 4090 GPU is already fast across all batch sizes, but throughput improves significantly, by over 4x, when TensorRT-LLM acceleration is enabled.

Image Source: Jan.Ai

NVIDIA is now sharing some new benchmarks using the open-source Jan.ai platform which has also recently integrated TensorRT-LLM into its local chatbot app. This chatbot makes use of AI models such as Llama or Mistral in an easy-to-use solution. The software provider has now offered a look into some benchmarks run on NVIDIA's GeForce RTX 40 GPUs against laptop CPUs with dedicated AI NPUs.

The NVIDIA GeForce RTX 4090 GPU is 8.7x faster than the AMD Ryzen 9 8945HS CPU without TensorRT-LLM, and that lead extends to roughly 15x with the acceleration enabled (a 70% boost over the non-TensorRT-LLM configuration).

You can generate up to 170.63 tokens per second versus 11.57 tokens per second on the AMD CPU. Even the NVIDIA GeForce RTX 4070 Laptop GPU delivers a speedup of up to 4.45x. More interestingly, the company has also shared numbers using an RTX 4090 in an eGPU configuration to showcase how laptops can be further accelerated with an external GPU for AI workloads. This configuration delivers a 9.07x performance uplift over the same AMD laptop CPU.
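The headline multipliers can be reproduced from the quoted throughput numbers. The short sketch below is only a sanity check of the arithmetic, using the figures stated in this article; the variable and function names are illustrative.

```python
# Throughput figures quoted in the article (tokens per second).
RTX_4090_TRT_LLM_TPS = 170.63   # RTX 4090 with TensorRT-LLM
RYZEN_8945HS_TPS = 11.57        # AMD Ryzen 9 8945HS laptop CPU
SPEEDUP_WITHOUT_TRT_LLM = 8.7   # RTX 4090 vs. the CPU without TensorRT-LLM


def speedup(fast_tps: float, slow_tps: float) -> float:
    """Return how many times faster `fast_tps` is than `slow_tps`."""
    return fast_tps / slow_tps


with_trt = speedup(RTX_4090_TRT_LLM_TPS, RYZEN_8945HS_TPS)

# Relative gain attributable to TensorRT-LLM, derived from the two speedups.
trt_gain_pct = (with_trt / SPEEDUP_WITHOUT_TRT_LLM - 1) * 100

print(f"Speedup with TensorRT-LLM: {with_trt:.2f}x")   # 14.75x
print(f"Gain from TensorRT-LLM: {trt_gain_pct:.0f}%")  # 70%
```

The result, about 14.75x, is what the "up to 15x" headline figure rounds from, and dividing it by the 8.7x non-accelerated lead gives the roughly 70% boost.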

NVIDIA recently laid out the current landscape of AI computational power, showing how its GeForce RTX 40 Desktop GPUs scale from 242 TOPS at the entry level up to 1321 TOPS at the high end. Measured against a 50 TOPS NPU, that's a 4.84x increase at the lowest end and a 26.42x increase at the very top compared to the latest 45-50 TOPS AI NPUs that we will be seeing on SoCs this year.

NVIDIA RTX 40 AI TOPS

Processor                             AI TOPS
RTX 4090 (Desktop)                    1321
RTX 4080 SUPER (Desktop)              836
RTX 4080 (Desktop)                    780
RTX 4070 Ti SUPER (Desktop)           706
RTX 4090 (Laptop)                     686
RTX 4070 Ti (Desktop)                 641
RTX 4070 SUPER (Desktop)              568
RTX 4080 (Laptop)                     542
RTX 4070 (Desktop)                    466
RTX 4060 Ti (Desktop)                 353
RTX 4070 (Laptop)                     321
RTX 4060 (Desktop)                    242
RTX 4060 (Laptop)                     233
RTX 4050 (Laptop)                     194
AMD Strix (NPU - Expected)            50
Intel Lunar Lake (NPU - Expected)     48
Snapdragon X (NPU)                    45
AMD Hawk Point (NPU)                  16
Intel Meteor Lake (NPU)               11

Even laptop NVIDIA GeForce RTX 40 options such as the RTX 4050 start at 194 TOPS, a 3.88x increase over the fastest upcoming NPU (50 TOPS), while the RTX 4090 Laptop chip offers a 13.72x increase with its 686 TOPS.
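The TOPS multipliers quoted in this article all use the 50 TOPS figure of the fastest expected NPU as the baseline. As a quick check of that arithmetic, the sketch below recomputes them from the chart values; the dictionary and function names are illustrative.

```python
# AI TOPS figures as quoted in the article's chart.
NPU_BASELINE_TOPS = 50  # fastest expected NPU (AMD Strix)

gpu_tops = {
    "RTX 4090 (Desktop)": 1321,
    "RTX 4060 (Desktop)": 242,
    "RTX 4090 (Laptop)": 686,
    "RTX 4050 (Laptop)": 194,
}


def ratio_vs_npu(tops: float, baseline: float = NPU_BASELINE_TOPS) -> float:
    """Return the GPU's TOPS as a multiple of the NPU baseline."""
    return tops / baseline


ratios = {name: round(ratio_vs_npu(tops), 2) for name, tops in gpu_tops.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the {NPU_BASELINE_TOPS} TOPS NPU baseline")
```

Note that peak TOPS is a theoretical throughput figure, so these ratios compare rated compute rather than measured LLM performance.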

Time and time again, NVIDIA has showcased just how far ahead it is in the AI segment versus the competition, and these benchmarks once again reinforce that if you have a use for AI, NVIDIA has the right hardware for you.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
