After NVIDIA launched its Rubin AI GPUs last month, we decided to interview Larry Yang, the chief product officer at Phononic. We were wondering about the new chips' cooling requirements given that energy constraints are closely related to AI rollout.
Larry is an industry veteran with more than 30 years of experience under his belt. He has previously worked at Google, IBM, Microsoft and Cisco. Our conversation revolved around the cooling requirements for NVIDIA's and other AI chips. It also covered AI ASICs, commonly known as custom AI processors.
Larry outlined that high bandwidth memory (HBM) chips are one reason why AI chips require significant cooling. He also explained how data centers are cooled and how Phononic's unique thermoelectric coolers (TECs) can allow AI companies to rely on fewer GPUs by extracting more performance per GPU through greater efficiency.

So as a warmup question, can you explain the different cooling technologies in the industry and how Phononic’s products compare to them?
Yeah. I think traditionally, cooling has been air-based. Basically, you're just blowing air across a heat sink. A heat sink is usually a material, could be aluminum or copper, that is moving heat away from a hotspot, a processor, for example. You see them with multiple fins, so you're just trying to create a lot of surface area. And then the air travels across those fins, absorbs the heat, and then the air leaves the back of the server. As heat and heat density have increased, the move has been to a material that has higher heat capacity.
So air has a particular heat capacity. I don't remember what the exact number is. But liquid holds much more heat and can move it away much more efficiently. So there's been a big move to liquid cooling. Liquid cooling has been around for 40 years now, I think. It first made its debut in the early IBM mainframes and then it sort of stayed as a specialty supercomputing application.
And then as the heat densities of AI processors have taken off, people have started pulling that technology back off the shelf. And you're starting to see it becoming a dominant form of cooling now for AI processors.
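For readers who want a number for the air versus liquid comparison above, here is a minimal sketch of our own. It uses rounded textbook properties for air and water at roughly room temperature; real coolants (for example water-glycol mixes) and real operating conditions will differ, so treat the result as an order-of-magnitude illustration only.

```python
# Approximate room-temperature properties (rounded textbook values, our assumption).
AIR_DENSITY = 1.2        # kg/m^3
AIR_CP = 1005.0          # J/(kg*K)
WATER_DENSITY = 998.0    # kg/m^3
WATER_CP = 4186.0        # J/(kg*K)


def volumetric_heat_capacity(density, cp):
    """Heat absorbed per cubic metre of fluid per kelvin of temperature rise, in J/(m^3*K)."""
    return density * cp


air = volumetric_heat_capacity(AIR_DENSITY, AIR_CP)
water = volumetric_heat_capacity(WATER_DENSITY, WATER_CP)
ratio = water / air

print(f"air:   {air:,.0f} J/(m^3*K)")
print(f"water: {water:,.0f} J/(m^3*K)")
print(f"water holds roughly {ratio:,.0f}x more heat per unit volume")
```

Under these assumptions, a given volume of water carries a few thousand times more heat than the same volume of air for the same temperature rise, which is the gap Larry is pointing at.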
Awesome. So for my second question, since the AI wave started in late 2022, what have you observed in the cooling industry with respect to technology, the scale, and other requirements?
Yeah, I think the biggest thing I've seen is, I'll call it an explosion in innovation. So you'll see a lot of people talking about the problem and the trends. Just the other day, Victor Ma, an engineer at NVIDIA, posted on LinkedIn about how even liquid cooling has its limits. And he goes into some technical detail about why that is. You've also seen crazy innovation ideas. You've seen Microsoft try to sink data centers underwater. There are data centers in, I think it's either Finland or Sweden, that are buried underground, where the ground has a much more stable operating temperature. There are a couple of startups that are launching data centers into space to take advantage of the coldness of space. So clearly this is a big problem and there's a lot of innovation going on. And I think our insight into this is that mechanical cooling, blowing fans and pumping liquid, is very slow. As a result, data centers are overprovisioned. They will set the cooling threshold at the worst case and then just forget about it. Set it and forget it.
I was at the Yotta conference last week and one of the panel discussions had a gentleman, Peter Panfil, a VP at Vertiv. Vertiv is one of the leading providers of data center cooling. And he made the statement that data centers are just too cool. They're being overcooled. And that's because of this set-and-forget mentality. So our response to that is, don't use these mechanical mechanisms. Use solid-state.
So our technology is solid-state. There are no moving parts. It has millisecond response rates and it's small. So don't just overcool the whole data center just because you have a couple of hotspots. Use our technology in those specific hotspots and just turn it on when you need it. Now you can bring the overall cooling of the data center down and only use energy where you need it.

Okay. Awesome. So my third question is: after Hopper, NVIDIA launched the Blackwell GPUs. How did Phononic cater to any specific requirements of the Blackwell GPUs as compared to Hopper, and did the firm or the industry experience any shift in demand for the type of cooling solutions once Blackwell was released?
Yeah, I think there are two reactions. One is the industry's and one is ours. So definitely with Blackwell you see a lot more energy and enthusiasm around liquid cooling, because the heat densities have just gotten so [inaudible]. The Blackwell GB200 NVL72 rack is on the order of 100 to 120 kilowatts. That's the equivalent of 16 Weber grills packed into a small space, the space of a phone booth. And then the Rubin Ultra is going to be 600 kilowatts of power required. That's the equivalent of 80 Weber grills packed into that same space.
I don't even know if you can actually pack that many into that space, but it's a heck of a barbecue party that you can be throwing. And they talk about the power consumption, right? So every watt of electricity that goes into a rack becomes a watt of heat that has to get out. And so whenever they talk about these 600 kilowatts, many people talk about the challenge of getting the electricity in, which is fair, but then you also have to talk about getting the heat out.
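To put the "watt in, watt out" point in concrete terms, the sketch below (ours, not Phononic's) estimates how much water would have to flow through a rack to carry its heat away, using the steady-state relation heat = mass flow × specific heat × temperature rise. The 10 K coolant temperature rise is an assumption chosen only for illustration, and the fluid properties are rounded values for plain water.

```python
WATER_CP = 4186.0       # J/(kg*K), approximate value for plain water
WATER_DENSITY = 998.0   # kg/m^3, approximate value for plain water


def water_flow_lpm(heat_watts, delta_t_kelvin):
    """Litres per minute of water needed to absorb heat_watts with a delta_t_kelvin rise."""
    if delta_t_kelvin <= 0:
        raise ValueError("coolant temperature rise must be positive")
    mass_flow_kg_s = heat_watts / (WATER_CP * delta_t_kelvin)
    return mass_flow_kg_s / WATER_DENSITY * 1000.0 * 60.0


ASSUMED_DELTA_T = 10.0  # K, hypothetical inlet-to-outlet coolant temperature rise

for rack_kw in (120, 600):
    flow = water_flow_lpm(rack_kw * 1000.0, ASSUMED_DELTA_T)
    print(f"{rack_kw} kW rack: about {flow:.0f} L/min of water at a {ASSUMED_DELTA_T:.0f} K rise")
```

Because the relation is linear, a rack that draws five times the power needs five times the flow (or five times the temperature rise) to stay in balance.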
So, back to your question, I think with that launch of Blackwell, there's a lot of momentum around liquid cooling. Our reaction to that, and a few others in the industry have also observed this, is that the specific hotspot on a Blackwell is the high bandwidth memory chip that sits alongside the GPU. I don't know if you're familiar with high bandwidth memory. I can talk through a little bit of that, if that'll help.

Yeah. So basically the idea is, in these large AI workloads, data is the bottleneck. You need to move data into the processor core. It does its matrix multiply and its math, and then it writes the data back out and then it reads in the next tranche of data. So moving data in and out of the processor is a bottleneck. And so what NVIDIA and AMD and others [did], and it actually started maybe eight-plus years ago, is they put the DRAM memory chips literally right alongside the GPU. They actually have stacks of DRAM die, eight or 12 die each, and then eight of those stacks surround the GPU processing unit. And now they're in there and they have a very wide bus and they're very close, so they can move the data very quickly in and out without having to leave the chip package. But what's happened is now you've put eight very hot things inside a package that was already kind of hot.
And the high bandwidth memory chip is actually a stack of chips, and they have these electrical insulators in between. So you can imagine one die on the bottom and then seven to 11 of its cousins sitting on top, like a bunch of blankets. And so now that bottom die has to get its heat out the top. And that's actually the thermal constraint for a GPU: cooling the bottom die of an HBM stack. And so as a result, a lot of the high bandwidth memory chips are throttled. They actually have to run slower than they could because they otherwise would overheat the whole system. So our idea of cooling on demand, cooling where you need it, when you need it, is to actually put one of our TECs on top of each of these HBMs. And so the throttling is now eliminated and the GPU performance actually goes higher because, like I mentioned, most AI workloads are memory bound.
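The "blankets" picture above can be sketched as a simple series thermal-resistance model. This is our own toy illustration, not Phononic's or any vendor's model: it assumes every die dissipates the same power, that all heat exits through the top of the stack, and that each layer has the same thermal resistance. The function name and every number below are hypothetical.

```python
def stack_temperatures(n_dies, watts_per_die, r_layer_k_per_w, coldplate_temp_c):
    """Return die temperatures in degrees C, ordered from bottom die (index 0) to top die.

    Assumes heat only flows upward, so the layer above die i carries the heat
    of die i plus every die beneath it.
    """
    temps = [0.0] * n_dies
    temp_above = coldplate_temp_c
    for die in range(n_dies - 1, -1, -1):
        heat_through_layer = (die + 1) * watts_per_die
        temps[die] = temp_above + heat_through_layer * r_layer_k_per_w
        temp_above = temps[die]
    return temps


# Hypothetical values chosen only to show the trend.
eight_high = stack_temperatures(8, watts_per_die=1.0, r_layer_k_per_w=0.5, coldplate_temp_c=40.0)
twelve_high = stack_temperatures(12, watts_per_die=1.0, r_layer_k_per_w=0.5, coldplate_temp_c=40.0)

print(f"8-high stack:  bottom die {eight_high[0]:.1f} C, top die {eight_high[-1]:.1f} C")
print(f"12-high stack: bottom die {twelve_high[0]:.1f} C, top die {twelve_high[-1]:.1f} C")
```

In this toy model the bottom die always runs hottest, and the gap widens as the stack gets taller. Lowering the temperature at the top of the stack, which is what a cooler placed there is meant to do, shifts every die in the stack down by the same amount.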

Awesome. So now that NVIDIA has launched its Rubin AI GPUs, you just said that your technology is basically trying to cool the memory directly by putting the solutions on top of the memory. So are there any changes in the cooling requirements for the Rubin GPUs?
My understanding is that Rubin will continue to use the liquid cooling that we've seen today. And in fact, I believe, and we can follow up on this, Jensen has stated that Rubin will continue with traditional direct-to-chip liquid cooling. So I don't think that, in that regard, it will trigger a major disruption, if you will.
But I think where our technology will make a difference is in unlocking increased performance while using the existing liquid cooling infrastructure. We think that we can increase the GPU performance enough that there's a payback period on the order of single-digit months, by avoiding having to buy additional GPUs, because you're actually getting higher bandwidth and higher performance out of the GPUs.
So you can avoid buying additional GPUs. And for that investment in installing our thermoelectric cooling hardware and software, the payback, as I mentioned, from avoiding additional GPUs is on the order of months.
Okay, so as industry interest grows for custom ASIC AI chips, do you believe that cooling requirements are one factor along with cost performance that are driving the demand for ASICs?
Yeah, definitely. We're talking a lot about GPUs and HBMs, but we've had similar conversations about network switch ASICs. They have a very similar design. They have an ASIC in the middle and then they have other chiplets around it. It could be high-speed SerDes chips for doing the fast IO that networking requires. It could be optical chiplets for companies like Broadcom and others building co-packaged optics. And they have a very similar problem where these additional chips are the thermal limiter for their performance. And so our idea is to also introduce the point cooling that thermoelectrics provide to be able to manage those temperatures.
Great. So just to clarify, the ASICs that you discussed right now, these are AI ASICs, right? Or general-purpose ASICs that are not for AI. Which ASICs were you talking about again?
Oh, I'm talking about both. So if you look at NVIDIA, AMD, Google, Meta, Amazon, Microsoft, everyone's architecture for these AI accelerators, these AI ASICs, has a GPU in the middle and then high bandwidth memory around that. So our application applies to all of those. In addition to that, I was also talking about networking ASICs, which are also being driven by AI for doing scale-up and scale-out across the data center. They have a similar problem.

Okay. So as, I guess, my second-to-last question, I'd like you to describe Phononic's cooling solutions in detail and how they cater to, I guess, cooling NVIDIA's GPUs and other GPUs as well.
Sure. So our technology is based on the thermoelectric principle. The thermoelectric principle was discovered maybe a hundred years ago. With certain materials, when you apply an electric current, one side gets hot and the other side gets cold. Think of it as a solid-state heat pump. So we take this material and we cut it up into one millimeter cubes, and then we sandwich them between ceramic cold plates. So I'm going to try to hold one up. This is one of our bigger ones, largely because I want to try to show you what we're doing here. You can kind of see it's almost like a little waffle packed down in there. So we've shipped over 30 million of these chips. Some of them are much smaller than that. We've shipped over 30 million for cooling lasers and optical transceivers in data centers. We've been doing that for over 10 years now. All the leading optical transceiver companies are our customers. NVIDIA is pushing the envelope with 1.6T optical transceivers. We're the exclusive thermoelectric provider for NVIDIA's 1.6T optical transceivers.
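As a rough illustration of the "solid-state heat pump" idea, here is our own sketch of the standard textbook lumped model of a thermoelectric module. It is not Phononic's design data. The heat pumped from the cold side is the Peltier term minus half the Joule heating minus the heat that conducts back from the hot side. The function name and all module parameters below are hypothetical values picked for easy arithmetic.

```python
def tec_performance(seebeck_v_per_k, resistance_ohm, conductance_w_per_k,
                    current_a, t_cold_k, t_hot_k):
    """Return (heat pumped from the cold side in W, electrical input power in W)."""
    delta_t = t_hot_k - t_cold_k
    peltier = seebeck_v_per_k * current_a * t_cold_k   # heat carried by the current
    joule = current_a ** 2 * resistance_ohm            # resistive self-heating
    back_conduction = conductance_w_per_k * delta_t    # heat leaking back from the hot side
    q_cold = peltier - 0.5 * joule - back_conduction
    power_in = seebeck_v_per_k * current_a * delta_t + joule
    return q_cold, power_in


# Hypothetical module parameters, not taken from any datasheet.
q_cold, power_in = tec_performance(
    seebeck_v_per_k=0.05, resistance_ohm=2.0, conductance_w_per_k=0.5,
    current_a=3.0, t_cold_k=300.0, t_hot_k=310.0,
)
print(f"heat pumped: {q_cold:.1f} W, electrical input: {power_in:.1f} W, "
      f"COP: {q_cold / power_in:.2f}")
```

The model also shows why running a TEC only as hard as needed matters: Joule heating grows with the square of the current, so efficiency drops as the drive current rises.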
So the way our technology works is, you integrate it into your product. So in this case with the HBMs, it sits on top of the HBM. It's actually sandwiched between the HBM and the liquid cold plate that is traditionally used to cool the GPU. We install our TECs in between, and that provides an additional temperature gradient on top of what the liquid cold plate is providing to the HBM. And then we have firmware that senses the temperature of the HBM and controls the TEC, turns it on or off, runs it halfway or whatever, to ensure that we're not overcooling. We only turn our TECs on when needed. That ensures energy efficiency when using our TECs.
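The sense-and-control behavior described above (off, halfway, or fully on depending on temperature) can be sketched as a simple clamped proportional rule. This is a minimal illustration of our own and not Phononic's firmware. The function name, the 85 °C setpoint, and the 10 °C band are all hypothetical.

```python
def tec_duty(hbm_temp_c, setpoint_c=85.0, full_on_band_c=10.0):
    """Return a TEC drive level between 0.0 (off) and 1.0 (fully on).

    The TEC stays off at or below the setpoint and ramps linearly to full
    drive once the temperature is full_on_band_c above the setpoint.
    """
    error = hbm_temp_c - setpoint_c
    return min(1.0, max(0.0, error / full_on_band_c))


# Hypothetical temperature readings from an HBM sensor, in degrees C.
for reading in (70.0, 85.0, 90.0, 95.0, 110.0):
    print(f"HBM at {reading:5.1f} C -> TEC drive {tec_duty(reading):.0%}")
```

A production controller would likely add hysteresis or an integral term to avoid oscillating around the setpoint; the point here is only that no cooling energy is spent while the memory is below its threshold.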

So we have a very tight control loop with local electronics. And then we have a thermal fabric, as we call it, where each of our controllers has an API, built on a data center standard API called Redfish, that sends telemetry to our thermal fabric orchestration software. That software can also send controls down to our local firmware controller to set the TECs locally to a particular mode.
Say the data center orchestration software knows that a particular set of racks is about to do a very high-speed, performance-intensive inference run. You could put our TECs into a turbo mode, so it really optimizes for performance. But then another cluster might be doing a longer-term training run, where maybe it's okay for it to take a little longer and it could be a little more energy efficient. So it can put our chips into an econo mode, where you actually help manage higher power efficiency across the whole GPU. And so our whole thermal fabric allows a sort of software-defined cooling, like what orchestration software already does with compute, networking and storage.
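To make the turbo and econo modes more concrete, here is a small sketch of our own showing how an orchestration layer might map a workload type to a cooling mode and build a command for a rack. Everything in it is hypothetical: the mode parameters, workload labels, rack identifier, and command format are invented for illustration and are not taken from Phononic's software or from the Redfish schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoolingMode:
    name: str
    setpoint_c: float   # temperature above which the TECs start working
    max_duty: float     # ceiling on TEC drive level, 0.0 to 1.0


# Hypothetical mode definitions.
MODES = {
    "turbo": CoolingMode("turbo", setpoint_c=75.0, max_duty=1.0),
    "econo": CoolingMode("econo", setpoint_c=90.0, max_duty=0.5),
}

# Hypothetical workload labels an orchestrator might use.
WORKLOAD_TO_MODE = {
    "inference_burst": "turbo",   # latency-sensitive: favor performance
    "long_training": "econo",     # throughput-tolerant: favor energy efficiency
}


def mode_for_workload(workload):
    """Pick the cooling mode for a workload label, rejecting unknown labels."""
    if workload not in WORKLOAD_TO_MODE:
        raise ValueError(f"unknown workload type: {workload!r}")
    return MODES[WORKLOAD_TO_MODE[workload]]


def build_mode_command(rack_id, workload):
    """Build an illustrative command payload for one rack's local TEC controllers."""
    mode = mode_for_workload(workload)
    return {"rack": rack_id, "mode": mode.name,
            "setpoint_c": mode.setpoint_c, "max_duty": mode.max_duty}


command = build_mode_command("rack-17", "inference_burst")
print(command)
```

The design choice worth noting is the split of responsibilities: the fast temperature loop stays local to each controller, while the orchestration layer only changes the policy (setpoint and drive ceiling) that the local loop follows.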
Awesome. So as my final question, as AI demand grows and firms shift to newer chip designs, including custom chips and ASICs, do you believe that the data center cooling industry will have to evolve as well? If so, how.
I think it will certainly need to evolve. All of the ideas that are going on that I mentioned, including ours, are still relatively new. So I think that as these get adopted, demand will pull along the adoption of these different new technologies. It's hard to predict what's going to happen in five or ten years. I've heard some crazy ideas. HBM memory cooling, again, has become such a big problem that they're talking about putting microchannels of fluid inside the stack, or sandwiched in the stack, of HBM memory.
So you can imagine silicon and then some interface layer and then some liquid cooling thing in there. We've actually filed a patent on attaching our TEC, our thermoelectric material, directly to silicon. We eliminate one layer, so that we can actually be one of those sandwiches. That's a pretty crazy idea. But we know we can do it because you grow all kinds of stuff on top of silicon, so why not grow bismuth telluride? And so that's maybe one of the ideas that are a few years out that will get pulled along as the thermal challenges of AI continue to grow.*
*Regarding the patent to attach the TEC to the silicon, we reached out to Larry for a follow-up, and he commented that "Phononic is looking at a variety of ways to integrate thermoelectric technology to deliver precise cooling as data center components get hotter and hotter."