As the race to build out artificial intelligence accelerates, the infrastructure required to support it is undergoing a remarkable transformation. In February, Google announced a plan to spend $175 billion to $185 billion in capex for 2026 — a figure roughly equivalent to the GDP of Hungary.
In this special live episode, recorded at Transition-AI 2026 in San Francisco, Shayle sits down with Amin Vahdat, Google’s chief technologist for AI infrastructure. Amin pulls back the curtain on how the hyperscaler is rethinking everything from data center reliability and behind-the-meter power generation to real-time inference.
Shayle and Amin discuss:
- How Google’s shift from focusing on training to inference can enable more distributed, smaller-scale data center deployments
- Why Google is moving away from traditional “five nines” reliability for certain workloads in exchange for doubling compute capacity
- How on-site generation can serve as a “bridge” to manage interconnection latency
- Google’s milestone agreement with utilities for one gigawatt of demand response
- How software can co-optimize chip design, building cooling and power generation to create superefficient and flexible “AI factories”
Resources
- Catalyst: The rise of flexible data centers
- Catalyst: Will inference move to the edge?
- Catalyst: The mechanics of data center flexibility
- Open Circuit: The natural gas ‘bridge’ becomes a highway
- Open Circuit: Are investors losing faith in the AI infrastructure frenzy?
- Latitude Media: Energy Vault is expanding into infrastructure for AI
- Latitude Media: The rise of the AI infrastructure asset class
Credits: Hosted by Shayle Kann. Produced and edited by Max Savage Levenson. Original music and engineering by Sean Marquand. Stephen Lacey is our executive editor.
Catalyst is brought to you by FischTank PR, an award-winning climate and energy tech, renewables, and sustainability-focused PR firm dedicated to elevating the work of both early-stage and established companies. Learn more about their PR approach and how they can support your company’s messaging by visiting fischtankpr.com.
Catalyst is brought to you by EnergyHub. EnergyHub helps utilities build next-generation virtual power plants that unlock reliable flexibility at every level of the grid. See how EnergyHub helps unlock the power of flexibility at scale, and deliver more value through cross-DER dispatch with their leading Edge DERMS platform, by visiting energyhub.com.
Tune into Critical Capital, a brand new podcast from Crux and Latitude Studios. Hosted by Crux CEO Alfred Johnson, Critical Capital explores the interlocking forces powering clean and critical infrastructure. Join us every other Tuesday for in-depth conversations at the intersection of energy, government, finance, and global markets. Listen here, or wherever you get podcasts.
Transcript
Shayle Kann: I’m Shayle Kann. Welcome to Catalyst Live. Thank you so much. Okay. I am here with Amin Vahdat, who’s sitting next to me here. Amin is the chief technologist for AI infrastructure at Google. Amin, welcome.
Amin Vahdat: Thank you for having me. Excited to be here.
Shayle Kann: Okay. I want to provide a little bit of context for the conversation we’re about to have here. I know this is why everybody is here in this room at this conference, but there’s a lot going on in AI infrastructure at the moment, particularly as it pertains to energy. Amin leads the infrastructure team at Google. So in the Q4 2025 earnings report, Google announced its intent to spend somewhere between $175 billion and $185 billion in CapEx this year. It’s not all for AI infrastructure, but let’s assume a decent portion of it is just for this purpose right now. Let me offer you some context for that number. We had a big election in Hungary this week. That number is roughly the GDP of Hungary. Here are numbers that are probably more relevant to this audience: we spend about $25 billion to $35 billion a year in CapEx on transmission, electricity transmission infrastructure, in the United States.
So this is five to seven times that amount just from Google, just in one year. If you want to talk about big infrastructure projects, let’s talk about Vogtle. Vogtle is the notoriously, extremely expensive nuclear plant that’s the first nuclear project built in the United States in decades. Vogtle cost about $30 billion. So this is five or six Vogtles per year. If you want to move outside energy just for one fun one, I was in San Diego last week, which happened to be when the lunar mission dropped down. So I looked up NASA. NASA’s annual budget is $25 billion. So this is seven NASAs that Amin is responsible for spending each year, or at least this year, on infrastructure. So with a lot of infrastructure and with great CapEx comes a lot of great questions. I have many. Let’s dive into some of them.
The first one, I mean, I guess is one of the big ones that’s been on my mind and I want your perspective on it. We clearly have been living in a world where scale of individual data centers has been a driving force. We’ve gone from, you guys were probably building tens of megawatts per data center years ago to hundreds of megawatts to now gigawatts. And I think probably everybody here appreciates that for training purposes, for model training purposes, scale is really important. This is why we’re getting these huge data centers. But for inference, I’ve heard mixed things. As we shift more into inference world, it may or may not be true that you need that level of individual scale. So in your mind, how much does scale matter when it comes to inference compute? When I say scale, I mean scale of the individual data center.
Amin Vahdat: Yeah, it’s a great question. And I think you have it spot on. I remember when Google announced its first data center in The Dalles, Oregon. This was 23, 24 years ago, before I was at Google: 10 megawatts, and people were just stunned that a little company would go build a 10-megawatt data center. That was a big number. And actually, no one else was building data centers for their own compute infrastructure at the time. And it’s just grown from there: 100 megawatts, a gigawatt, et cetera. It’s a really good question in terms of the split between training and serving. And here’s where, to me, it gets perhaps most interesting. At the scale that we’re operating, we want the latest, greatest, most efficient, most capable training cluster essentially on an annual basis. If you look at our announcements for TPUs and Nvidia’s announcements for GPUs, the latest, greatest is coming out every year.
And every year, the latest, greatest is by definition better than last year’s. Let’s pick this gigawatt number. Let’s say you buy the latest, greatest and you put a gigawatt somewhere. And maybe you put a couple of these down. After a few years, one, two, probably not much more than that, whoever is doing the training is going to want the new latest, greatest. And then they’re going to want a gigawatt somewhere else.
Now you got a gigawatt of capacity that used to be used for training. What are you going to do with it? Probably going to serve on it. And so now the question is, could you get away with lower scale? Yes, absolutely. And in fact, we have lots of smaller deployments, lots of data centers with much less than a gigawatt of capacity, 10 megawatts in certain places.
Shayle Kann: And that serves equally well for inference?
Amin Vahdat: Inference in general. Now for our largest, most capable models, they are going to run on many chips. It’s not just one chip simultaneously, but you don’t strictly need a gigawatt of capacity to be able to do useful work. You probably don’t even need 100 megawatts of capacity. It gets a little bit more interesting than that because of, let’s say, co-located compute and storage and networking and everything else. In other words, it’s not just the accelerator. But no, strictly speaking, you could go to much smaller deployments and still be able to do inference. The life cycle aspect of it that I just described as people cycle workloads over the capacity is the more interesting one in terms of the footprint for serving.
Shayle Kann: So there’s two interesting pieces to that. One is, as you’re saying, just intrinsically for inference, you don’t need the same scale effect, but there is probably some minimum scale that’s viable, as you said, because you are co-locating it with other things. So you’re probably not doing 10 kilowatt deployments.
Amin Vahdat: No.
Shayle Kann: Okay. So we’re in the tens of megawatts or hundreds, but not gigawatts necessarily.
Amin Vahdat: And these racks today are trending toward hundreds of kilowatts, just this single rack.
Shayle Kann: Right, for the rack.
Amin Vahdat: For the rack with multiple chips in it, but I mean, absolutely you’re going to need some minimum scale.
Shayle Kann: Okay. And then the second interesting piece is what you said about sort of repurposing. And there, I guess it’s a question of demand, right? You put a few gigawatts toward training, then you move on to the next few gigawatts of training for whatever the next TPU or GPU is. But is that enough to serve the booming demand? I think the assumption has been: look, we’re training now, but that is going to result in inference demand shooting upward. And so that would imply it’s not nearly going to be enough.
Amin Vahdat: Exactly. And I think we’re at that transition point. I mean, we said last year that we’re entering the age of inference. I think with agents exploding today, that’s well underway. The analogy I would use is from Google’s early days with web search: it used to be that most of the compute at Google was dedicated to building the search index. Pretty quickly, you hoped, and it fortunately turned out to be true, most of the capacity needed to be used to serve that index. Same thing here. Most of our capacity maybe earlier on was used for building the model, but you would hope that it transitions to serving the model pretty quickly. And you’re absolutely right that we’re there. So I do think that over time, as the efficiency and latency of these models improve, more disparate deployments are going to be valuable.
So what I mean by that is, today, each individual token that is generated by the model takes a reasonable amount of latency, so much so that you might not be able to tell the difference here in, let’s say, San Francisco, whether you’re accessing content on the East Coast, maybe even Europe sometimes, relative to San Francisco. In general, for, let’s say, maps or search or ads, that’s not true. The computing is efficient and the latency is sufficiently low that you will notice the speed-of-light propagation delay through the network if you’re going to a faraway site. So as these services become more interactive, as they become more efficient, and that is still going to be a journey, we’re not there today, you’re going to want to have geographic locality. That’s also going to impact reliability, because again, you can think of it as a highway system: the less distance you have to go, the more likely it is that you’re going to find the capacity you need for your request.
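The propagation-delay point here can be made concrete with a quick back-of-the-envelope calculation. Note this is purely illustrative: the fiber speed and city distances below are rough assumptions, not Google network figures.

```python
# Back-of-the-envelope speed-of-light latency estimate. The ~200 km/ms
# fiber figure and the distances are illustrative assumptions only.

C_FIBER_KM_PER_MS = 200  # signal speed in optical fiber, roughly 2/3 of c

def round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip propagation delay over fiber, ignoring routing and queuing."""
    return 2 * distance_km / C_FIBER_KM_PER_MS

print(f"SF to East Coast (~4,100 km): {round_trip_ms(4100):.0f} ms round trip")
print(f"SF to Europe (~9,000 km): {round_trip_ms(9000):.0f} ms round trip")
```

Tens of milliseconds of unavoidable round-trip delay is invisible behind today’s per-token latency, but becomes noticeable once serving gets fast and interactive, which is the case for geographic locality.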
Shayle Kann: So I guess, wrapping up this piece of it, the core question that I’ve been trying to think about, and that I think a lot of folks in this world where energy intersects AI have been thinking about as well, is this: as we shift more and more into inference, where you could make an argument for smaller pixel sizes making sense for data centers, does it end up being easier in three years, five years, something like that, to go build a new gigawatt data center and find a site on the grid where you can interconnect it? Or does it become easier and/or faster to build fifty 20-megawatt data centers, or something like that?
Amin Vahdat: That’s a good question. In general, we’ve found over the years that it’s easier to build a smaller number of larger sites. There are still asterisks there. You don’t want to be too concentrated, again, from a fault tolerance and geographic locality perspective. In other words, the argument of build-as-big-a-site-as-you-can-in-one-place breaks down rather quickly, but having a thousand sites, each with 0.1% of your capacity, has other overheads associated with it in terms of management. So I think it’ll really come down to geographic locality, and probably a medium number of medium-sized data centers, sorry for the lack of precision there, but a medium number of medium-sized data centers augmented with a small number of large data centers.
Shayle Kann: Right, which makes sense. Okay. So then the next question that’s been on my mind about the future of this infrastructure, one with a lot of direct relevance to the energy side of the equation, is about reliability. It’s just been gospel, I would say, that data centers require the highest reliability, three nines or whatever the number is. To the extent that the standard footprint of a normal data center in the cloud world, pre-AI, but even the early AI data centers as well, has a UPS system and backup generators and all this kind of stuff, just to make sure that reliability is that high. Two questions for you. One, why? Why is the reliability requirement so high? And two, is there any argument for that changing in the future? Because that reliability requirement causes so much challenge and CapEx, right? Why is it such a problem that we have lead times on gas generators, all this kind of stuff?
Shayle Kann: It is because of the reliability requirement. So is it intrinsic to something about what you’re doing or is it just a function of how the business has evolved?
Amin Vahdat: Yeah, a fantastic question. And if I were to send one message here, it is: no, it is not intrinsic, and we should be thinking about lower-reliability power delivery overall. I’ll tell you why it has been, but I’ll also get to why it has changed substantially. So for most modern software services, the compute is actually a relatively small fraction of your cost. So it makes sense to overprovision it. You want to have 99.999%, five nines, reliability for your software services. You don’t need quite that, but many of our data centers aim for four nines, minutes of downtime a year maximum, which, as you said, has a large amount of cost associated with it. Now, if you think about it though, as of now, given how constrained resources are and how costly they are, a much larger fraction of your overall service cost is in the compute.
So if I were to go to my internal customers and say, “Would you rather have four nines of availability and half the capacity, or two nines of availability and twice the capacity? Which do you pick?” Very often, not always, but very often, they’ll say, “Oh my gosh, give me the 2x capacity.” And if I only need 99%, and 99% sounds good, you all know the math: that’s 3.65 days of downtime a year. That’s a lot. We’re saying three and a half days every year, you’re down. You don’t have the capacity. But if for the other 51 and a half weeks I get twice the capacity, many people would say, “Sign me up.”
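The availability arithmetic in this exchange (99% availability allowing roughly 3.65 days of downtime a year) is just the standard “nines” math, which a short sketch makes explicit:

```python
# Worked version of the availability math: the downtime each "nines"
# level permits per year. Standard arithmetic, not Google targets.

HOURS_PER_YEAR = 365 * 24  # 8,760

def downtime_hours_per_year(availability: float) -> float:
    """Allowed downtime, in hours per year, at a given availability level."""
    return (1 - availability) * HOURS_PER_YEAR

for label, a in [("two nines (99%)", 0.99),
                 ("three nines (99.9%)", 0.999),
                 ("four nines (99.99%)", 0.9999),
                 ("five nines (99.999%)", 0.99999)]:
    hours = downtime_hours_per_year(a)
    print(f"{label}: {hours:.2f} hours/year (~{hours / 24:.2f} days)")
```

Two nines works out to 87.6 hours (3.65 days) a year, while four nines is under an hour, which is why each extra nine of power reliability carries so much provisioning cost.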
Shayle Kann: And yet I don’t see that happening. Is it happening and I’m not seeing it?
Amin Vahdat: Without saying too much, it’s happening. I would say that’s actually the co-design there with our customers at Google has been one of our sources of significant efficiency.
Shayle Kann: Okay. So that’s a good segue then into my next question, which is behind-the-meter power, generation, storage, whatever it might be. There are multiple reasons that one might put something behind the meter, right? It can be for reliability purposes, that is one, but oftentimes now people are talking about bridge power and things like that. What is your view on this? There’s an enormous amount of planned behind-the-meter power. Is that the direction of travel? Will it be the direction of travel for an extended period of time?
Amin Vahdat: It’s a very important opportunity for us and it is one of latency, again, a different kind of latency. In other words, what is the time to delivery of capacity? What I’ll say though before going down that path is that we would actually at Google prefer grid connected capacity. Why?
Shayle Kann: I was going to say, why? Is it reliability?
Amin Vahdat: It is, in the end, provisioning for a given level of reliability. If you’re behind the meter, you’re going to have to do all that provisioning yourself. Now, there’s an aspect of this that’s actually quite powerful for us. To give an example, going back to the reliability question: in March, we actually hit a significant milestone in agreements with utilities for a gigawatt of demand response across our fleet. What does demand response mean? It means that for the utility, for the one week of the year where they have maximum demand, we’re willing to brown down. And that also goes to the availability commitment that we make to our customers. Why? Because that allows them to provision not for their worst, coldest, hottest, whatever it is, week of the year, but to provision for the 98th percentile, whatever it is. And we’ll give up that capacity in exchange for, well, in the end, more availability of power and less cost, both for us and for the ratepayers in the region.
So now, do we have to do all that reliability work ourselves, rather than being able to shift capacity back and forth when we’re not using it? Let’s say that we actually have behind-the-meter power generation, “behind the meter” in quotes: what if we can, when we’re not using it, give it back to the utility? In general, the way we look at it is, we like behind the meter if it means that we get the capacity up most quickly, but we’re always going to look to invest with the utilities to bring the transmission. Maybe it’s a year after, maybe it’s two years after, right? But the point is, this gets us the capacity we need, and maybe we need some bridge power in the interim, but that bridge power, in the limit, could actually be mobile.
Shayle Kann: Tying these two things together, one thing I haven’t fully wrapped my head around with bridge power is the reliability question. If you’re still in this world where you’re demanding, let’s say it’s not four nines, let’s say it’s two nines of reliability, but you need two nines of reliability with just onsite generation for some period of time, however long that bridge is, you’ve got to build a lot of onsite stuff, right? You end up overprovisioning really heavily, and then eventually you get the grid connection, and now what do you do with all this stuff? So during that bridge power period, are you offering a different level of service somehow, or are you actually provisioning for your two nines, whatever your ultimate reliability requirement is going to be, from day one with onsite resources?
Amin Vahdat: It’s both. One way to look at it is that most people have trouble, unless they’ve operated at scale, thinking in terms of these numbers: what’s the difference between 99.9 and 99.5 and 99.99 in a given year? And in a given year, they might actually be identical. So some people are just going to say, “I’m going to roll the dice. I hope I get lucky.” And sometimes they will, and they actually won’t experience any issues. What I would say, though, is that we also look to see, okay, beyond some of this bridge power that we’re going to need, what are the more permanent sources? Would we use solar, wind, nuclear, other sources that will be permanent, but might not be able to get us all the way to the power capacity that we might need? And then we have to augment with whatever it might be, turbines, gas, or something else.
Shayle Kann: Which could be mobile, as you said.
Amin Vahdat: Which could be mobile.
Shayle Kann: Yeah. I guess the question for me then is, do you feel that we’re going to end up with all this stranded onsite generation as a result of this? Are we going to end up with … Is there any world where we build excess generating capacity or are we just so far underwater now that it doesn’t matter?
Amin Vahdat: I’d love to have that problem. I’d love to have that problem. I think that one of the things that we aim for at Google, and I think you all as well, is a world of energy abundance. And I think that the world would be a better place if energy were abundant. It’s not. I’m not just talking about AI or data centers or anything. Energy is a limiter. I think we’re so far away from that world that I’d love to have the conversation. I don’t think it’s the next few years where we have too much.
Shayle Kann: Let’s talk about the different resources that you might put behind the meter, right? You mentioned you can build onsite solar or wind or whatever. You can do nuclear, you can get your generation with gas as well, and then you can build batteries to buffer. Do the data centers you’re going to build that do have onsite infrastructure beyond just the UPS and the backup generator end up looking like little microgrids, where you’re co-optimizing across a bunch of resources? Or is it generally going to be … Some of the data centers, like, I don’t know, the xAI data center that got built, are just a bunch of gas generators, basically.
Amin Vahdat: The microgrid and the software control here are going to be absolutely key. And this is a place where I think we as a community are underinvested today. So think about that demand response scenario I talked about: if we need to do a browndown, it’s not going to be that the whole site goes away. It’s, okay, maybe we need to give up 20%, 30%, 40% of our capacity. Okay, which 20%, 30%, or 40%? What’s the signal to the software? What do we drain from where? What SLOs do we shift? Do we say, “You know what? For the next week, we’re going to need to fail over 20% of requests from this location to somewhere else”? Maybe a whole building gets powered down for a week, maybe, or most of the building. The data center is going to look exactly like a microgrid, and now can you distribute the power dynamically?
Amin Vahdat: Also, by the way, in response to the workload: we talked about training versus serving. The power footprints of the two are very different.
Shayle Kann: And I assume the latency sensitivity is super different as well. Even within, as you said, even within inference, there are some things that are going to be super latency sensitive and some that very much will not.
Amin Vahdat: If you’ve got your overnight agent running, then it might be all serving, but it might be batch serving that’s not sensitive from a human loop perspective, but then others or your chat interactions or whatever, that might be very latency sensitive.
Shayle Kann: Is there an extent to which you are sort of uniquely capable of executing on this, in the sense that Google is certainly the most vertically integrated player, from the TPUs through the cloud service, you have Gemini, you’re running your own workloads, and so on? If part of what is required to reach this future, where data centers are flexible and can operate at slightly lower reliability and all those kinds of things, is that you have to differentiate amongst the workloads, such that some can operate as necessary at really low latency and others at higher latency, Google can kind of do all that in-house. I mean, you have customers for Gemini, so you have to serve those customers, but you have more capability than most. How do you think it disseminates out beyond Google?
Amin Vahdat: So I think it’s a good question. It’s something that we think a lot about. In other words, what we want to do is design end-to-end systems that, taken together, create capabilities. This word, capability, is actually central to what we discuss internally a lot, so I appreciate the question: create capabilities that otherwise wouldn’t be possible. And I do think that it comes down to this vertical integration. In other words, for, let’s say, our TPUs, we co-designed them with the building. We co-designed them with the power generation source. We co-designed them with the DeepMind team that builds the Gemini models. So it’s the software above, the models above that, the chip design, which we do in my team as well. That’s integrated with the rack, that’s integrated with the data center, that’s integrated with the power source. And if between each of these boundaries you have a custom-optimized interface, that gets you a few percent. Those few percents up and down start adding up, multiplying out, in fact, to something meaningful. And that is exactly what we go after.
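The compounding effect described here can be sketched numerically. This is a hypothetical illustration: the 3% per-interface gain and the list of boundaries are assumed figures, not Google numbers.

```python
# Illustrative sketch of compounding co-design gains: a few percent at
# each interface of a vertically integrated stack multiplies out.
# The 3% figure and the boundary list are assumptions for illustration.

interfaces = ["model/software", "software/chip", "chip/rack",
              "rack/data center", "data center/power source"]
GAIN_PER_INTERFACE = 1.03  # assume a 3% efficiency gain at each boundary

total = 1.0
for boundary in interfaces:
    total *= GAIN_PER_INTERFACE

print(f"{len(interfaces)} boundaries at 3% each -> {(total - 1) * 100:.1f}% overall")
```

Five assumed 3% gains multiply out to roughly 16% overall, which is why per-boundary optimizations that look marginal in isolation can become meaningful across the whole stack.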
Shayle Kann: Okay. So I’m going to ask you to rank some things. There’s been a bit of a public debate that I have found interesting about what the rate limiter is on the growth of AI. Let’s assume for the moment it’s not demand: that relative to supply today, there’s essentially infinite demand, and maybe that changes at some point in the future. I’d be interested in your perspective on if and when that might happen, but it’s certainly not the case today. So it’s going to be something else. There has been an argument that it is chips and the chip supply chain, particularly some of the things upstream in the chip supply chain, like EUV tools for lithography and so on. This is a room full of power-oriented people, so there’s certainly also an argument that it is power. I think there’s a third argument, maybe, that it could be labor at some point.
You can add a fourth if you want to that, but if you had to rank order, what is the biggest rate limiter to growth between power, chips, and labor? How would you rank them?
Amin Vahdat: Yeah. And I would add data center construction and delivery, so EPC as a broad category, not just labor. Labor is one component of it, but even the supply chain associated with it, electricals, mechanicals, cooling, et cetera, is another aspect beyond the chip supply chain. I would say that when delivering the end-to-end system, we unfortunately don’t have the luxury of focusing on a single limiter. I would say, very sincerely and honestly: at 10:00 AM it’s labor, at noon it’s power, and at 2:00 PM it’s chips, every single day.
Shayle Kann: All right. I’m going to force you to answer the question a different way (audience laughs). You’re supposed to spend whatever it is, $175 billion to $185 billion, this year building out new infrastructure. If you woke up tomorrow and Sundar said, “You’ve got to spend 300 now,” what would you go try to solve?
Amin Vahdat: I’m not trying to dodge the question, but I very sincerely feel that we’d have to go scale all of them, and that every single one of those is at the limit of what we can do for the envelope that we have. Is one of them inherently easier to scale than the others? Honestly, no. All three of those are major, major issues for us. I’m sure that there is an answer, but I’m not relaxed about any of them. This is a real thing. I couldn’t pick one. I would say, “Sundar, wow, 300. Okay, I’ll get back to you as to what the exact issues are going to be.”
Shayle Kann: On the labor and EPC one, I’m curious about your perspective, not just related to data center construction but in general, on the rise of physical AI as a category, the rise of robotics, and who knows what form factor it ends up taking. It has been a sort of second wave. There was an LLM wave of excitement in the public. I’m sure in your world it’s been going on longer, but I would say we had this wave of sort of digital AI excitement, and now a physical AI wave as well. Do you have a heuristic in your head for how demand shapes up between those two, or how infrastructure will get built relative to those two?
Amin Vahdat: Yeah, it’s a good question. I mean, I think that in terms of the digital side rather than the physical side, the demand obviously today is much, much, much larger. The architecture for the physical side is still in development. I would say the best examples of it right now are self-driving cars. In other words, if you think about it, these self-driving cars really are robots on four wheels. And for this use case in particular, and you can imagine this is actually one of the hardest use cases, safety is paramount. Safety is absolutely paramount. And what that means is that you actually give up some capability, some scale, for certainty and reliability. And to my knowledge, without speaking about any of the specifics, this means more of the edge use cases are relevant there, because the multiplexing associated with cloud is probably less desirable. In other words, it matters if you have a blip when you’re really counting on some computation. If you’re doing a chat and whatever your chat app is down for five seconds, fine, do something else for five seconds and you come back.
If the robot can’t get its answer in five seconds, depending on the use case, that could be catastrophic.
Shayle Kann: How much of that happens on device or in the case of Waymo in car? How much of the compute that occurs in a Waymo or in a humanoid robot in the future is going to happen inside that instantiation of the physical AI device versus getting pulled from the cloud?
Amin Vahdat: Without talking about any specific use case, I believe that a lot of it is going to have to be on device and dedicated to that use case. Not all. Again, there’s going to be different kinds of use cases. If it’s, what kind of music do I want to play for my passenger? I don’t know. Maybe that’s okay if that blips for a few seconds, but if it’s, which turn do I take now on an evasive maneuver? It seems like you want that on device.
Shayle Kann: Right. Which then makes the argument for the edge infrastructure stuff a little bit weaker. People have made the argument, back to the kind of edge versus medium-number-of-medium-size versus hyperscale thing, that the strongest case for edge, for small, localized compute, was things like Waymo, right? But if the really safety-sensitive stuff or the really latency-sensitive stuff is all going to happen on device, then maybe when you pull from the cloud, you can handle the latency of going to the East Coast.
Amin Vahdat: It’s a very good question. And so I think, and I would need to think through it more, but if you think about some other related use cases, like factory automation: in that case, would you have an edge deployment, something that looks more like an edge deployment, that is provisioned for handling a hundred or a thousand or whatever it is robots for that particular use case at the edge? Again, good question.
Shayle Kann: You might do that for cost saving reasons, right? Putting all that compute into every individual robot might be expensive.
Amin Vahdat: Putting that much compute into every one of these robot arms may be prohibitive–
Shayle Kann: For the infrastructure inside the robot.
Amin Vahdat: Yes.
Shayle Kann: Okay. I want to finish up by asking you something that I feel like I don’t hear as much talk about as you’d expect, given how much it should matter in the long term, which is CapEx and cost savings in data center infrastructure. Right now, we’re just in a world where we need to build as much as we possibly can, and it seems like speed is the only thing that matters. But in the long arc of history, one presumes the cost of that CapEx is ultimately going to be important. Where do you see the biggest opportunities? If you think out into the future, how do you turn … If you were to build the same amount of capacity in megawatts in five years as you are today, is there a world where you turn that $175 billion to $185 billion into $100 billion? And what are the things that could get you there?
Amin Vahdat: Well, we’re looking at this all the time. This is probably one of the biggest focus areas on my team. I won’t say the biggest, but it’s top three for sure; it might be the biggest. So when we say we’re spending X dollars, we’re saying that if we had had to do this work last year, we would have had to spend 1.2X. I’m making up the number, so don’t take it as exact. In other words, every year we’re looking to deliver substantial efficiencies, such that if we had to do it again, it would be way more efficient. This starts with software, and there’s a lot of opportunity on the software side, but there’s lots of opportunity on the hardware side too. Let me give a very simple example. What is the ratio of power to space in your data center? In other words, if you have, let me pick a number, 100 megawatts, how big a building do you build?
And how big a building do you build for the 25-year lifetime of that building? Not just for one generation of TPU or GPU or whatever, but maybe for five or six generations of them. Now, you could be conservative and build an enormous building and say, “Okay, whatever comes next, I’m going to be set.” But if you think about the watts per linear foot of a disk rack versus a GPU rack, they’re radically different, like, I don’t know, 100X between disk and GPU. So what are you going to assume? Now, if we could actually co-design and optimize and say, “You know what? This building is going to be a GPU building. That building is going to be a TPU building, and that building is going to be a disk building.” Huge opportunity. But now I’ve limited my fungibility.
If I change my mind in five years’ time and I have a disk building and I want to put some TPUs there, there’s going to be a lot of empty space. So I think we figure these things out, not perfectly, but every year, every generation, we’re looking to drive that co-design and optimization, managing the flexibility while optimizing the cost.
Shayle Kann: It’s interesting. From the outside, I think I would have assumed that you had already basically optimized to a T for linear area density. Everybody talks so much about density inside these data centers, for a variety of reasons. Some of that is because, for training, it’s actually a performance thing. But for cost reasons as well, I would have assumed you’re already at the maximum possible density given today’s technology. It sounds like you’re saying that hasn’t always been the case, in part because we’ve been designing data centers to be more, I don’t know, multipurpose tools.
Amin Vahdat: Exactly. And I would say five or 10 years ago, it didn’t pay to hyper-optimize like that, because in a world where compute wasn’t the dominant portion of your cost, you actually wanted flexibility and fungibility. When compute becomes a more dominant portion of your cost, you start thinking, okay, what am I going to do this year, next year, and the year after to make sure I optimize it super well? And the difference between storage and compute used to be at most 10X. The difference between storage and accelerators is approaching 100X. So the gap just got wider, and it’s still getting wider. The disks aren’t consuming any more power; every generation of accelerators is. And even within an accelerator, if you look at the power footprints of serving versus training, and this is where the microgrids also come in, they’re radically different.
If you just look at how much power we draw from the utility or from our batteries or from whatever for one workload versus another, could be a factor of two.
Shayle Kann: Well, and the profiles of those workloads are very different, right? Training is sort of notoriously on-off, spiky, and you solve for that. I don’t know if this is still true within Google’s data centers, but you solve for that by basically filling in workloads to try to make it smooth.
Amin Vahdat: We don’t do this, but yes, some others do.
Shayle Kann: Yeah. The profile of those workloads ultimately impacts what other infrastructure you need on site: your buffering systems, all the power infrastructure, all those kinds of things.
Amin Vahdat: Yes.
Shayle Kann: All right. Amin, this was very fun, very informative, as I expected. Thank you so much for being here.
Amin Vahdat: Thanks for having me. This is great.
Shayle Kann: Amin Vahdat is the chief technologist for AI Infrastructure at Google. This show is a production of Latitude Media. You can head over to latitudemedia.com for links to today’s topics. This episode is produced by Max Savage Levenson, mixing and theme song by Sean Marquand. Stephen Lacey is our executive editor. I’m Shayle Kann, and this is Catalyst.


