With NVIDIA having launched its Rubin AI platform last month, which relies on the Vera CPUs to provide long-context processing and enable million-token software coding, we decided to discuss the long-term demand for AI GPUs, and particularly high-performance chips, in an interview with Positron AI CEO Mitesh Agarwal.
Positron is positioning itself to offer chips that use less power but provide performance suitable for AI use cases and deployment in air-cooled data centers. The latter bit is important since NVIDIA's Rubin AI GPUs will require liquid cooling. Positron is currently working on its Atlas chips which aim to significantly reduce power consumption.
Our discussion with Mitesh revolved around the use cases for NVIDIA's Rubin chips, Positron's potential in the AI chip ecosystem, GPU scarcity, and the cost-performance tradeoffs which can open up space for new entrants in the AI chip industry. We also discussed custom AI chips, called ASICs, and what role they might play in the future of AI computing. Finally, AI energy usage, which the World Economic Forum (WEF) estimates could grow by 50% annually from 2023 to 2030, was also an important part of our conversation.

So, just as a warm up question, Positron says that Atlas is in a form factor that doesn't require expensive liquid cooling. So, how do the chips do this and why don't they require expensive liquid cooling and how does that differ from the cooling requirements of NVIDIA's Blackwell AI GPUs, for instance?
Yeah so, basically for us, the goal is to be, you know, universal deployment, like, you know, people can deploy anywhere. So if you look at all the data centers that have been built up so far, they're all air-cooled data centers. So you basically pass through cold air and the servers heat it up. It's air cooled. And air cooling is easy to set up because you don't require piping or passing through water or liquid cooling materials. And the data center buildouts are much cheaper to do from an air-cooled perspective. So a lot of today's existing capacity, almost 95% of existing capacity today, is all air cooled.
So NVIDIA Blackwell and future generation Rubin are all liquid-cooled GPUs. So from their perspective, to deploy those you need new data center buildouts, which take time and which are expensive. And that kind of limits your options. For Positron, by being air-cooled, you know you can deploy in an existing data center, so there is a lot of data center capacity that will need to have refreshed hardware in a few years' time. So we will be able to capture that.
And then second thing is, being air-cooled doesn't mean you can't do liquid cooling. You can also do liquid cooled so it actually allows a spectrum build play. So by being air cooled that really allows it. And the reason why we are air cooled is not because, oh you know, we just decided to like sacrifice some performance for air cooling.
No, we are very, very power efficient. So we are roughly 2 to 5x more power efficient than an NVIDIA GPU for inference applications, which is where the world is really growing. And because of that efficiency we can keep our power budget per chip to less than 200 Watts per GPU. So, to give you an example, NVIDIA GPU right now, Blackwell, is roughly 1,200 Watts per GPU. And Rubin, their next generation, is going to be 2,000 Watts. (We followed up with Mitesh for his source for the 2,000 Watts figure and he directed us to a Tom's Hardware article. We also reached out to NVIDIA for comment and a spokesperson declined to comment.)
At those levels of power density, like you know if you wanna cool a GPU or card that is consuming 1,200 Watts or 2,000 Watts, you need liquid cooling. Air cooling, no matter how much air you can pass over it, you cannot cool a 1,200 Watt chip.
With the Positron chip being only 400 Watts, you can do air cooling, or you can attach liquid cooling if that's the preferred solution for you. So that's the big change there.
Okay, so, are there any significant performance sacrifices by relying on air-cooling as opposed to liquid cooling? As a follow up to your answer.
No, no, not from a chip side of things. Cause our chip was designed for inference applications and we have kind of maximized the efficiencies there. So you know, doing one or other. But if you look at it from data center perspective, Ramish, like, liquid-cooled data centers are more energy efficient. So basically they waste less heat. Which is why, you know, in the future, liquid cooling data center will come up. But, also, as I said, liquid-cooling data centers are roughly 40 to 50 percent more expensive to build than air-cooled data centers. So actually the world will continue to build both types of data center. Like where there is like power abundance, like, you know in Middle East or Southeast Asia, like you know air-cooled data centers are the norm. Even in North America all of the data centers today are air-cooled. But the ones that are being built right now are the liquid cooled ones.
So, from a chip perspective we are not sacrificing anything. But from a data center perspective you'll have to make that decision of doing air-cooled versus liquid-cooled. Also, as I said, liquid-cooled, although it is more efficient, adds more complexity because you have to figure out a source of water, a source of liquid, and you have to have piping. You know, all this copper piping going into the servers. Which is a lot more complex to maintain, so maintenance capex also goes up there. So, there are a lot of these pros and cons that go into it. From a Positron approach, I'll just end with this. We are providing the option to our end users to do either/or. And what we're saying is, for air-cooled data centers you can use us, whereas they just cannot use the NVIDIA Rubin next-generation architecture. Whereas for liquid-cooled you can use even more.
Awesome. So considering that air-cooled data centers are cheaper and eventually as the AI industry grows, the, I guess the development costs or the overall usage costs will also come down. So considering these factors, is it possible, that air-cooled data centers actually become more widely used as opposed to liquid cooled data centers?
Yeah, absolutely. As long as air-cooled data centers have a chip solution that actually competes and beats NVIDIA. And that's why we exist. That's what our whole goal is to be much more efficient in inference applications which is the fastest growing application right now.
Look, NVIDIA is a general purpose chip designed for both training and inference. NVIDIA is one of the smartest companies on the planet. Obviously, they understand everything. But because they have to support training, they have to have certain components for the chip that are good at training and that drives a lot more power consumption right. So that's kind of the reason why they are consuming such power. But, as long as the air-cooled data centers have a chip solution that can actually work out very, very well. Absolutely, like they will be the cheaper tokens, and cheaper dollar per token, so basically use case. Which means that if you wanna sell it to a wider market, the cheaper it is, the more large scale adoption will be.

Okay, so shifting to the broader market for a while. How do factors such as rising energy intensity, water consumption, GPU scarcity, and Return on Investment (ROI) create demand for alternatives to NVIDIA GPUs?
I think the biggest one that you mentioned, and the biggest one that everyone looks at, is return on investment. Which is performance, like basically translates to, we track it as performance per dollar. So like for every one dollar of Capex, you're buying from NVIDIA, Positron, anyone else, you know how much performance, whether it's in tokens or image generation, or video, you're basically getting right.
So return on investment capital is very, very important. For NVIDIA GPUs today, you know, if you look at even the most optimistic scenarios, the return on invested capital is like two, two and a half years. That's the most optimistic scenario. It's like, basically, you generate enough tokens, or you've enough use cases that, you know, by charging a rental rate for it, you can pay off your capex in two to two and a half years.
So that's the most optimistic as I said. For Positron, with our current generation Atlas system, you know we are at roughly 15 to 16 months. And with our next generation silicon that will be less than 12 months. So like a return on investment capital of like, basically, ten to twelve months, where, you know whatever capex you're buying you can get that back in. . .twelve months of time. So you're looking at roughly like a 2.5x to 3x performance per dollar. At the very least. And then it also depends on some use cases.
Some use cases that are very power hungry actually we are even better at it because our power consumption is. . .we can actually drive 5x performance per dollar there. So that's where return on investment capital for us could be as low as six months compared to like three years or two and a half years for NVIDIA GPUs right.
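As a rough illustration of the payback arithmetic in this answer, the sketch below assumes (our simplification, not Positron's stated methodology) that both systems earn the same price per token and that operating costs are ignored. Under those assumptions, the payback period is inversely proportional to performance per dollar, so the advantage is simply the ratio of the two payback periods. The function name is hypothetical.

```python
def perf_per_dollar_ratio(baseline_payback_months: float,
                          candidate_payback_months: float) -> float:
    """Relative performance per dollar implied by two payback periods.

    Assumption (ours): same price per token on both systems and no
    operating costs, so payback time scales as 1 / performance-per-dollar.
    """
    if baseline_payback_months <= 0 or candidate_payback_months <= 0:
        raise ValueError("payback periods must be positive")
    return baseline_payback_months / candidate_payback_months


# Figures quoted in the interview: about 30 months (two and a half years)
# for NVIDIA GPUs, versus 12, 10 and 6 months in the Positron scenarios.
BASELINE_MONTHS = 30
for candidate_months in (12, 10, 6):
    ratio = perf_per_dollar_ratio(BASELINE_MONTHS, candidate_months)
    print(f"{candidate_months} months -> {ratio:.1f}x performance per dollar")
```

With a 30-month baseline, payback periods of 12, 10 and 6 months work out to 2.5x, 3x and 5x, which lines up with the ranges quoted in the answer.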
So that's the big thing that's driving interest into Positron from various cloud providers and customers of ours. In terms of the data center scarcity, this is actually a very interesting point that I want to make. If you look at companies like Cloudflare, which is publicly announced as our customer, they have these existing data centers in metropolitan cities like San Francisco, New York, Chicago, whatever, right. They can't get more power from the municipality for those data centers. They can't change those air-cooled data centers to liquid-cooled.
So they need to maximize whatever they have into as many tokens as possible. And from that perspective, we actually become a very, very good solution for them to deploy rather than NVIDIA GPUs. So I think that's where that kind of scarcity really helps us. And finally, you probably have read many, many articles, and it's the talk of the town, how energy, you know, is the limiting factor or the constraining factor. And if you can derive 3x, or 4x, or 5x more tokens from every Watt-hour of energy, obviously that's going to be a much better setup for our planet. I fundamentally believe in absolutely producing as much energy as we can because that's needed for infinite intelligence. But if you can do that infinite intelligence with 5x less power, then that's like a lot better for the climate, right.
So that's kind of where we, our story really resonates with a lot of customers, a lot of investors side of things.
The last thing I'll touch on, your point of GPU scarcity. I think it's interesting that you know obviously part of the reason why alternatives do have market share today is because people can't get enough GPUs because GPUs are best in class. And this is where Positron actually stands out from many alternatives. Most alternatives use the same type of underlying memory technology which is called HBM which NVIDIA uses. So they are basically suffering the same level of scarcity issue. NVIDIA is the biggest company, you can't just go and get more than NVIDIA right. So they suffer from the same scarcity.
Whereas Positron has a unique and new memory architecture that doesn't rely on the same supply chain as NVIDIA GPUs. So we basically are not constrained by the same scarcity in the supply chain. So actually we have a path to grow a lot faster and a lot more than many of the alternatives that are relying on the same principal constraint in the supply chain that NVIDIA has.

So I was looking up the Positron Atlas. And I saw that it relies on AMD's EPYC processors. However, other, I guess big tech companies, like Amazon has its Trainium chips and Google has its Tensor chips, the TPUs. Since these companies have already started to provide them as somewhat of alternatives to NVIDIA's GPUs within their cloud infrastructure, what competitive advantages does Atlas offer that entices AI software users to Atlas chips as opposed to big tech's in-house options?
So first thing I want to clarify is, the AMD chips that we are using are the CPUs. So people use Intel, AMD, or Arm CPUs. NVIDIA also uses that, so that's not where we are competing. We're competing on AI chips or the AI accelerators. So that's the AMD GPU or NVIDIA GPUs or Positron cards. Or, to your point, Amazon Trainium or Google TPUs. So these are all AI accelerators. So that's kind of where we are competing in. And our big part of where we go to even the providers of the silicon and say that hey we can stand out against them, in terms of really the frontier of models.
When you think about kind of the latest LLMs and things like these, because of our unique memory and silicon architecture, we can provide a really, really compelling performance per dollar compared to like an NVIDIA or AMD GPU. So for example, our Positron Atlas is roughly 3.5x performance per dollar compared to an NVIDIA Hopper system.
So from that perspective, where people really care about how many tokens you can generate per dollar that you are spending, we really, really stand out. Like that's where our unique kind of advantage comes in. And that's where, we go to our customers or cloud providers. And that's kind of how we stand out.
Obviously Atlas is our first generation systems. And, you know, the main point here is that, as we launch our second generation system end of next year which is what we have announced, Asimov, we're going to just continue to get better compared to Trainium, or TPU, or AMD, or NVIDIA GPU.
NVIDIA launched its Rubin CPX chips last month. These are built specifically for inference. Does Rubin’s launch worry Positron about the chip’s performance advantages over the Atlas?
No it doesn't, because if you look at the details, the Rubin CPX that NVIDIA launched, it's primarily focused on what is called pre-fill, which is like the input tokens part of the inference. Which is really good and that's really needed for that. But, ultimately, the generation, which is, you know, the output of it, is what's gonna drive the market forces in inference in the future, and where Positron cards really stand out, with Atlas and our next generation Asimov, is on that. . .really providing efficiency on the output part. Like if you look at video generation, if you look at code generation, there's a lot more output, right. So that's where it's really going to be the driving factor of it. So actually it doesn't bother us or, you know, worry us. In fact, you could say the future of the world, especially in inference, is going to be specialized chips for specialized applications. And you could combine an NVIDIA Rubin CPX and Positron Asimov or Atlas and get one of the most efficient kind of inference systems out there.
It doesn't mean that we can't do what the Rubin CPX does or NVIDIA's Rubin overall architecture would have to do what we do, but in terms of like, efficiency is the name of the game in inference, inference is a revenue generating activity so you need to have your cost structure as low as possible to get maximum margins. So you could actually combine those two solutions to get a really, really effective solution. But obviously both those systems are set up to be independent and stand alone. But you could also combine them.
So it doesn't worry us. It actually validates what we tell our investors and our customers, that inference is not going to be like a one chip to rule them all like training is. It's actually going to be, not only like each chip will have its own kind of like specialized workloads. But even within those workloads, those would be kind of separate where each chip is very, very specific. So it actually really validates it and it really shows how we can capture [market share]. Like look, this market is really large, like, the inference market is going to be like $400 billion in spend in like 2028. You know, capturing even a significant percentage of it, or a few percentage points, is a significant amount of revenue. So it actually validates the thesis that NVIDIA is thinking about how can we, you know, segregate out a little bit of a market segment there for inference-only chips. It's kind of where we are at.

Following up on your earlier comments about how the Atlas' memory differs from NVIDIA's memory so therefore you're not constrained. Could you elaborate a little more on the memory's specifics and advantages?
Yeah, so basically, the big part for us is, in memory there's two main components. Memory bandwidth utilization and memory capacity. So with Atlas our first focus was to solve the memory bandwidth utilization. So even in the best case in the GPU world, memory bandwidth utilization is basically the speed at which the data is moving through the memory to the compute and even in NVIDIA's best case world, you're basically using forty, fifty percent of the memory bandwidth utilization that's available to you. But in Positron you can drive over 90% of the utilization. So we are effectively driving much faster utilization of the memory bandwidth which allows us to have faster throughput which allows us to have better performance basically right.
So that's what we address in the Atlas section of it. With our next generation chip, we're not only carrying it through, and driving over 90% utilization, we're also addressing the memory capacity. So memory capacity is how much data can fit in the memory. And if you look at NVIDIA Rubin, which is the next generation GPU, you know they are expected to launch it with 384 gigabytes [NVIDIA's Rubin Ultra features 288 GB of HBM3e memory] of memory. With our next generation chip we are expected to launch 2 terabytes, which is 2,048 gigabytes, so roughly five times more memory capacity. So this is our fundamental differentiator with NVIDIA and other players: we're approaching both the memory bandwidth utilization and the memory capacity of a chip and really driving fundamentally higher performance than NVIDIA where, you know, for applications that bottleneck on that memory, we can drive really high amounts of efficiency.
And again, we are like very, grounded about the fact that you know, for applications that suffer that, like you're gonna have like, 4x, 5x, performance-per-dollar and performance-per-watt. For applications that are not bottlenecked by it, like training application, obviously NVIDIA is going to be the faster chip there, right. So that's kind of where. . .you have specificity in the market that you find. But those are the two things in the memory that we are really addressing and growing.
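The two memory claims in this answer can be put into a back-of-the-envelope sketch. It assumes a simple memory-bound model of token generation in which every generated token requires streaming the model weights from memory once, so throughput scales with effective bandwidth (peak bandwidth times utilization). The peak bandwidth and model size below are hypothetical placeholders, not Positron or NVIDIA specifications; only the utilization and capacity figures come from the interview.

```python
def effective_bandwidth_gb_s(peak_gb_s: float, utilization: float) -> float:
    """Bandwidth actually delivered to compute, given a utilization in (0, 1]."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return peak_gb_s * utilization


def decode_tokens_per_second(peak_gb_s: float, utilization: float,
                             weights_gb: float) -> float:
    """Memory-bound estimate: one full read of the weights per generated token."""
    return effective_bandwidth_gb_s(peak_gb_s, utilization) / weights_gb


PEAK_GB_S = 1000.0   # hypothetical peak memory bandwidth, same for both chips
WEIGHTS_GB = 100.0   # hypothetical model size in memory

# Utilization figures quoted in the interview: 40-50% versus over 90%.
gpu_like = decode_tokens_per_second(PEAK_GB_S, 0.45, WEIGHTS_GB)
high_util = decode_tokens_per_second(PEAK_GB_S, 0.90, WEIGHTS_GB)
print(f"throughput gain from utilization alone: {high_util / gpu_like:.1f}x")

# Capacity figures quoted in the interview: 2,048 GB versus 384 GB.
capacity_ratio = 2048 / 384
print(f"capacity ratio: {capacity_ratio:.2f}x")
```

At equal peak bandwidth, moving from 45% to 90% utilization doubles the estimated throughput in this model, and 2,048 GB against 384 GB is about 5.3 times the capacity, consistent with the "roughly five times" in the answer. Real workloads also depend on batching, caching and compute limits that this sketch ignores.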
Talking about fabrication and fab demand, so basically, the capacity for leading edge nodes is always constrained since the demand is always higher, big tech companies, they often book fabrication capacity in advance. So how does Positron hope to get its chips into the fab and out of them and ensure there are no bottlenecks when it comes to production?
The current Atlas is a completely US made chip so you know it's with Intel Foundry, it's, you know it's a completely US based supply chain ecosystem. So we have full production, like scalability there, especially with the Intel Foundry, you know they want more business and I think it's available for scale out right.
For Asimov, obviously, you know, that becomes your question, you know, both comes from the fact that proof is in the pudding. If we show big orders to the foundries, from our customers, then obviously we get prioritization in terms of it. The second thing is, you know, we are going to be using a process node that's going to be much more mature and much more widely available by the time we tape out, which is late 2026. And so, there should be a lot more availability, and that same process node is right now in Taiwan but it's also coming in Arizona. So from that perspective, we are not, I think, you know, what we've been told from our fab providers obviously is that they have no concerns in providing us scalability.
The big thing is, the greatest thing for us is, if we get to a point where we are like saying to the fab that hey, please give us more allocation, because that means that we are selling, you know, hundreds or thousands or millions of chips, because they can easily accommodate, you know, hundreds of thousands of chip production for us. So like we're not really concerned around that part of things.
And the other thing is, when you hear about scarcity today, or GPU production, it's actually not the chips or wafers. They are scarce or they are stuck on production because of CoWoS, which is the memory substrate. And our memory substrate is completely different from CoWoS. We are not in the same supply chain ecosystem as NVIDIA or AMD or TPU, so it really creates, again, we are not subject to the bottlenecks that NVIDIA and AMD are from the memory substrate part of it.
So, to summarize what you've just said, you don't expect to be constrained with regards to the production of the Positron Atlas, because, one, it is on a more mature node, and second, the fabs will accommodate you once you have the demand?
Yep, exactly. And with Atlas it's all US based Intel. So Intel has a lot of capacity. With our Asimov, which is the next generation, that's where, you know, as I said, the more demand there is, the fabs will accommodate it. And then we are not, again, fabs are not the real bottleneck here, the memory substrate is, and we are using a completely different memory substrate.
What trends do you see in the future with regard to custom AI ASIC demand, and what factors might drive this demand up?
I think, two things, one to your general question around custom silicon, I think you will see that grow. I think that's market level, markets usually drive for that. When NVIDIA is driving so much of the market, you know markets want more optionality, especially from a cost structure perspective, it will happen.
So, I think generally ASICs have a very positive and bright future. And you have to think of them like, history tells us that like you know, especially in silicon. . .we had x86, we had Intel, AMD, Arm and then each of the hyperscalers are doing their own CPUs in conjunction with Arm. You look at crypto mining, you had initially all sort of GPUs but then every ASIC came for crypto mining.
So I think inference will see the same thing. In terms of what will drive ASIC adoption, I think the key, key thing is if you can drive these economic incentives. You have to do better than NVIDIA. If you're an ASIC, and you're not better than NVIDIA from a performance-per-dollar part, the use case you're claiming to be an ASIC of, then, that's not a good ASIC.
You can be like 80% or 60% of performance of NVIDIA. But obviously if you're 20% of cost of NVIDIA, then that's a fair trade and you can really drive efficiencies. And that's what I'm saying, performance-per-dollar you have to be better than NVIDIA, for the application that you're claiming to be an ASIC on.
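The trade described here is simple division, and a minimal sketch makes it concrete. It treats performance per dollar as relative performance divided by relative cost, with the incumbent as the 100% baseline on both axes. That framing and the function name are our assumptions for illustration; the 60%, 80% and 20% figures are the ones quoted in the answer.

```python
def relative_perf_per_dollar(perf_pct_of_baseline: float,
                             cost_pct_of_baseline: float) -> float:
    """Performance per dollar relative to a baseline chip (baseline = 1.0).

    Both inputs are percentages of the baseline, e.g. 80 means 80%.
    """
    if perf_pct_of_baseline <= 0 or cost_pct_of_baseline <= 0:
        raise ValueError("percentages must be positive")
    return perf_pct_of_baseline / cost_pct_of_baseline


# Scenarios from the interview: 60% or 80% of the performance at 20% of the cost.
COST_PCT = 20
for perf_pct in (60, 80):
    advantage = relative_perf_per_dollar(perf_pct, COST_PCT)
    print(f"{perf_pct}% performance at {COST_PCT}% cost -> {advantage:.0f}x per dollar")
```

So a chip that is slower in absolute terms still comes out at 3x to 4x the performance per dollar in these scenarios, which is the bar the answer sets for an ASIC in its target application.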
And that's the critical thing. The market has so far not given any third party option to really drive that. Positron is the first one really that is claiming that very, very openly and we have proven that. But if you look at what Amazon is doing with Trainium and what Google is doing with TPUs, TPUs are definitely that example, for Google especially, right, because if you look at their cost structure for TPUs, it's a lot less than what they pay NVIDIA. Because they don't have to pay a margin stack to NVIDIA and TPUs have a really good performance compared to GPUs. So they actually get a lot better performance-per-dollar on TPUs than GPUs, which is why they use a lot of TPUs for in-house applications.
So that's kind of the world of ASICs: you have to drive performance-per-dollar, especially for the application you're claiming to be the ASIC of. And if you do that, then the use cases are so big that people will use you, because any dollar of efficiency that you can get is a dollar to your bottom line that can better your margins.




