Adding flexibility to data center loads could ease strain on the grid and reduce the need for costly new generation. According to one study, if data centers curtailed just 0.5% of their annual load during peak hours, they could unlock as much as 98 gigawatts of unused capacity in the U.S.
The problem: data centers promise near-perfect reliability, often “five nines” (99.999% uptime) in service-level agreements with customers. That leaves little room to adjust something as critical to reliability as power.
But times are changing. The data center market is reckoning with the constraints of the power grid and growing concern about pushing up electricity prices to pay for new generation. In July, the Electric Power Research Institute's DCFlex demonstration at an Oracle data center in Phoenix, Arizona, reduced load by 25% during peak demand. And this month Google expanded its demand response through two new agreements with Indiana Michigan Power and the Tennessee Valley Authority.
So what are the actual mechanics of data center flexibility?
In this episode, Shayle talks to Varun Sivaram, founder and CEO of Emerald AI. The startup’s data center flexibility platform powered EPRI’s DCFlex demonstration. Shayle and Varun cover topics like:
- What people often misunderstand about how much of their nameplate capacity data centers actually use
- The distinct load profiles of training, inference, and other workloads
- How data centers can pause, slow, or shift workloads in time or space to reduce demand
- What it will take for flexibility solutions like Emerald AI to earn operator trust
- How much flexibility data centers can realistically achieve
- Varun’s long-term vision for evolving from occasional demand response to weekly or even daily load shifting
Resources
- Latitude Media: Nvidia and Oracle tapped this startup to flex a Phoenix data center
- Latitude Media: Google expands demand response to target machine learning workloads
- Latitude Media: Can a new coalition turn data centers into grid assets?
- Catalyst: The potential for flexible data centers
Credits: Hosted by Shayle Kann. Produced and edited by Daniel Woldorff. Original music and engineering by Sean Marquand. Stephen Lacey is our executive editor.
Catalyst is brought to you by Anza, a solar and energy storage development and procurement platform helping clients make optimal decisions, saving significant time and money, and reducing risk. Subscribers instantly access pricing, product, and supplier data. Learn more at go.anzarenewables.com/latitude.
Catalyst is supported by EnergyHub. EnergyHub helps utilities build next-generation virtual power plants that unlock reliable flexibility at every level of the grid. See how EnergyHub helps unlock the power of flexibility at scale, and deliver more value through cross-DER dispatch with their leading Edge DERMS platform by visiting energyhub.com.
Catalyst is brought to you by Antenna Group, the public relations and strategic marketing agency of choice for climate and energy leaders. If you’re a startup, investor, or global corporation that’s looking to tell your climate story, demonstrate your impact, or accelerate your growth, Antenna Group’s team of industry insiders is ready to help. Learn more at antennagroup.com.
Transcript
Tag: Latitude Media: covering the new frontiers of the energy transition.
Shayle Kann: I’m Shayle Kann and this is Catalyst.
Varun Sivaram: You might slow down a job, you might change the resource allocation of how many chips, for example, are instantaneously being used for a job. You might also go all the way down to the underlying silicon and you might change what we call the clock frequency of the chip to change the rate at which computations happen.
Shayle Kann: Coming up: What does it actually look like to make a data center flexible?
I'm Shayle Kann. I invest in early stage companies at Energy Impact Partners. Welcome. So the conventional wisdom about data centers is that from an electricity perspective, they look like totally flat load, i.e. operating 24/7/365, and without much willingness to change that. But as power increasingly becomes the choke point for more data center infrastructure development, the world is waking up to a bunch of ways in which that's not entirely or necessarily true. First, you can put generation or batteries on site to shave peak load. That's the physical solution. But there are also digital solutions. Those exist, first, because data centers aren't actually operating at nameplate peak most of the time anyway, but also, second, because you might actually be able to make the workloads themselves a little bit flexible. Google actually made a big announcement about doing this at their data centers just a few weeks ago.
They've announced that they've partnered with two utilities, Indiana Michigan Power and TVA, to introduce demand response via workload flexibility in their data centers. But our guest today is my old friend, Varun Sivaram, who's also working on this problem. His company, Emerald AI, is building a software platform intended to make data centers flexible. As with many things in electricity, the devil is in the details, and in this case the details involve: what do we mean by flexibility? How do we actually get it? What are the SLAs between the data center operators and their customers? How are the grid operators going to think about it? There are a lot of nuances to this, so let's get into it. Here's Varun.
Varun, welcome back.
Varun Sivaram: Shayle, thanks for having me back.
Shayle Kann: All right. New topic for us to talk about here, which is what you are spending your time on these days, data center flexibility. I want to start by having you walk me through what you understand to be the way that compute translates to electricity load in AI data centers today. I think this is something that is actually commonly misunderstood. So what does the electricity load profile look like of an actual AI data center today?
Varun Sivaram: Yeah, great question. First of all, from a planning perspective, the grid has absolutely no idea what your load profile is going to look like, and that's the way they study you as a new AI data center load. But let's just back up here. AI data centers nowadays, or AI factories as Nvidia CEO Jensen Huang calls them, are fundamentally in the business of transforming electricity into what we call tokens, which are the fundamental input and output units of AI, and they're doing it increasingly well. So a data center will try very efficiently to take electricity and turn it into compute outputs, and you'll have losses along the way. You'll have losses because of the load of cooling, for example, and all the other non-computational loads in a data center. Historically, a data center might use 33% of its power for non-IT, or information technology, uses.
And the remaining 66 or 67% goes into actual computations. Nowadays, with the increasingly customized design of these AI factories and some of the amazing efforts of the hyperscalers such as Google, these numbers are falling, and therefore you can get 80 or 90% of the power being turned directly into AI computations. What does that look like to the grid? Well, if you're running a large language model training run, you might see the power use of that AI data center spike as the training run commences, with brief dips as the training run undergoes what are called synchronized checkpoints. So there's this very difficult to predict transient behavior that's wildly swinging. And then after the training run concludes, hours or days later, you might have a large reduction in demand. If you have an AI data center that's fully committed to doing what's called inference, or using these AI models, you might see smoother but still relatively unpredictable usage patterns from the grid's perspective. So that's one of the reasons that AI data centers appear so scary to grids today. You can't really plan for what you expect to see, and these loads look fundamentally different from anything they've ever seen. They're extraordinarily energy dense.
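To make that shape concrete, here's a minimal sketch in Python of what a single training run's power trace might look like to the grid. All the numbers and the function itself are illustrative assumptions, not measurements from any real facility:

```python
import numpy as np

def synthetic_training_trace(hours=72, idle_mw=80, train_mw=400,
                             checkpoint_every=4, dip_fraction=0.5):
    """Toy hourly power trace for one LLM training run: ramp to full
    power, brief dips at each synchronized checkpoint, then a large
    drop in demand when the run concludes."""
    trace = np.full(hours, idle_mw, dtype=float)
    start, end = 6, 60                                     # training window, in hours
    trace[start:end] = train_mw
    for t in range(start, end, checkpoint_every):
        trace[t] -= dip_fraction * (train_mw - idle_mw)    # checkpoint dip
    return trace

trace = synthetic_training_trace()
print(f"peak {trace.max():.0f} MW, min {trace.min():.0f} MW, mean {trace.mean():.0f} MW")
```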
Shayle Kann: And it's not dissimilar from kind of everything else in electricity, which is that the result is you have to plan for the peak, right? So the data center says, I need, let's invent a number, 400 megawatts of capacity. I think from a grid operator perspective, you basically have to plan for 8,760 hours of 400 megawatts. That is essentially what you're planning for, right?
Varun Sivaram: You're actually planning for even worse than that, Shayle. You're right that it's over 8,760 hours, which is one single year: you want to predict or plan for a worst case scenario where, let's say, as you suggested, it's a 400 megawatt data center, and that 400 megawatts shows up at the absolute worst time of the year. But you're actually planning over even more years than that when you're running this interconnection study to determine, can this data center connect to my system? You're asking: in the next seven or 10 years, in an absolute worst case scenario, so not just 8,760 hours but 8,760 times 10, or 87,600 hours, when a transmission line goes down somewhere and it's a record hot day and air conditioning demand is super high, will my 400 megawatt data center request its full 400 megawatts and overload a circuit? And if so, I can't connect it today; I have to upgrade the system before we do that. So that's how data centers are studied today.
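A crude way to picture that screening logic in code (a hypothetical sketch; actual interconnection studies are full power-flow models, not a three-line check, and all numbers below are invented):

```python
def worst_case_screen(circuit_limit_mw, existing_peak_mw, new_load_mw,
                      contingency_factor=1.3):
    """Toy worst-case check: assume the new load draws its full
    nameplate coincident with the historical system peak, inflated
    by a factor for a contingency like a transmission line outage."""
    worst_case_mw = (existing_peak_mw + new_load_mw) * contingency_factor
    return worst_case_mw <= circuit_limit_mw

# Hypothetical numbers: a 400 MW data center on an already-loaded circuit.
if worst_case_screen(circuit_limit_mw=1200, existing_peak_mw=700, new_load_mw=400):
    print("can connect today")
else:
    print("upgrade the system first")
```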
Shayle Kann: Okay. But that is a different question. You said it, right? That is how data centers are studied today. There is a separate question of how they are operated, generally speaking, which does not align perfectly with how they're studied. In other words, it is not always true that they're operating at the full 400 megawatts if it's a 400 megawatt rated data center. So what do we know about the actual operational profile from an electricity perspective, assuming you're doing nothing clever, like the things we're about to start talking about?
Varun Sivaram: And let me say, Shayle, before you do anything clever, I actually don't think it's irresponsible or analytically incorrect for the grid to study these data centers in the extremely risk averse way I just described. Because you're right, Shayle, data centers do sometimes take years to ramp up their capacity. They'll proceed in phases as you build out the buildings, fill the data halls with equipment, and begin to actually run the workloads you'd like to run. And there may also be quite a bit of buffer left on top: even if you're running an intensive training run, you may only be utilizing a data hall at 75%, let's say. So it may very well be the case that that 400 megawatt data center does not hit 400 megawatts in the foreseeable future, and yet I don't think it's incorrect for system operators to plan for a hyperscaler who comes to town and says, I want a 400 megawatt data center, to actually use that full entitlement once it's granted. And there are certainly examples of large data centers running absolutely full tilt, to the point where, unless, as you mentioned, Shayle, you do one of these clever things to intelligently control the consumption when the grid needs you to, the grid is correct and justified to assume that you may use your full consumption at the absolute worst time.
Shayle Kann: Yeah, I mean, my understanding of the basic state of affairs is: the grid says, okay, I'm going to plan for the worst case scenario, because I need to deliver reliable service, and so I'm going to assume you need 400 megawatts all the time for 10 years. Meanwhile, the data center actually operates differently from that, and data center load profiles, AI load profiles as I understand it, particularly for training but for inference as well, at least in the current iteration of inference, are surprisingly spiky. So loads can go up and down quite a lot. Maybe you're pulling 400 megawatts some of the time, maybe you're pulling 200 megawatts some of the time; it's kind of a weird load profile. But to the grid operator, it's unpredictable, which is I guess the key point here: if you don't know when that load is going to spike or not spike, then again, all you can do is operate as if it is 400 megawatts of load for all 8,760 hours. And so that's what people are starting to wake up to: wait a second, there is a mismatch here. Clearly there is headroom, because the data center does not need to operate all the time at full capacity, but taking advantage of that requires doing some things differently, because otherwise the grid operators can't do anything different. Their hands are tied, basically.
Varun Sivaram: Yeah, precisely. I think that's really well said. And if I can just take one more moment to set the table here: Shayle, earlier you said, hey look, this isn't dissimilar to what we see from other loads, and I probably don't disagree with you fundamentally, but I do think there are some very peculiar things about AI that are truly dissimilar. One is the extraordinary rate of growth. The power demand from data centers has more than doubled every year for the last several years, and that trend shows no sign of abating. A lot of people talk about data center efficiency and the increasing efficiency of the new generations of GPUs, these graphics processing units. Nvidia's Blackwell is much more efficient than Hopper, which is much more efficient than the previous generation, the A100s, et cetera. But that efficiency gain is currently being eaten up by the tremendous growth in computing demand.
So even as power demand is more than doubling every year, the reason it's more than doubling is that compute demand is more than quadrupling every year, a 4x increase every year. And the second thing that's truly dissimilar is what I mentioned earlier, the power density. AI's power density is increasing by orders of magnitude, which I don't think any other electricity application has seen in this short span of time. We went from five kilowatt racks (a rack is a set of servers stacked in a single cabinet) just a few years ago. Today I was just in a data center in Silicon Valley seeing a brand new deployment of Nvidia GB200s, the Blackwell generation; the rack is 132 kilowatts, and it's liquid cooled. And we're headed toward one megawatt racks. So think of that: that's two orders of magnitude of increase in density. These massive data centers occupy a tiny footprint yet draw power like small cities. So both of these trends, the exponential increase in power demand and the shrinking footprint of massive power demand, are stressing grids out in ways we haven't seen before.
Shayle Kann: Okay. So, last question on the current state of affairs, before we talk about the clever stuff. I mentioned this, but I'm curious whether you have visibility into what it actually looks like: is there a meaningful distinction in the current operating profile of a training data center versus an inference data center? Do they look different from a load profile perspective?
Varun Sivaram: Oh, absolutely. These loads do look different, right? Training loads have a very characteristic profile, and inference workloads have a different one. We talked a little bit about this earlier. A training run looks like this: you ramp it up, and it can ramp up by tens or hundreds of megawatts; you'll have seemingly random dips in power as checkpoints happen; and it'll ramp all the way back down when the synchronized GPUs stop at the end of the training run. Inference, depending on the set of use cases and the diversity of the applications, can look much more smoothed out. It might in some cases look more like traditional cloud computing. For example, a Meta data center might have a load profile that looks like people opening their phones in the morning and going to Instagram, and so you see a spike.
Similarly, today people open their phones and go to ChatGPT, and so that's a more familiar load profile. But nevertheless, you can certainly impute the kind of workload from the power signature today. It's one of the things, by the way, that we at Emerald AI have been training an AI model to do. However, an important distinction here is that a data center will not do a single thing for its lifetime. A massive data center, for example, may initially be configured and specified to train a large language model; then you'll finish training that model and do other things with those GPUs. Those same Nvidia GPUs can then be used for smaller research training workloads, for inference, and for fine-tuning large models for specific applications. A single data center may be used for one model and then separated out into multiple different types of workloads. So I wouldn't count on any given data center having the same load profile for its lifetime, or even for more than a year.
Shayle Kann: Which presumably complicates things even a little bit further from the electricity perspective. Alright, so let's talk about the clever stuff then, or at least start to. The key concept here is: can we make data centers look to the grid like flexible assets? That means introducing some measure of predictability and planning into when the load from the data center is below peak, basically. And there are various ways you could do that, from basic demand response that says we will tone down demand a few hours a year just at peak, to daily flexibility where you're shifting intraday all the time. So there are lots of different versions of it. But from a simple mechanical perspective, just to start: say you want to introduce some measure of load flexibility into an AI data center. What are you actually doing?
Varun Sivaram: So you can achieve flexibility through multiple routes. You can of course achieve flexibility through what I'll call the physical infrastructure route: if you have a lot of backup generation, you might fire up the backup generation. Often you're not allowed to, because your diesel generator will violate its air permit if you use it regularly. What we do at Emerald AI, the company I founded to solve this problem of data center flexibility, is computational workload orchestration. We want to attack the beating heart of AI's energy demand, which, as I mentioned, is increasingly just the computers, as AI factories become much more efficient and honed at converting electricity into tokens. And to achieve that on-demand flexibility, you take advantage of some of the inherent or latent flexibility that different AI workloads have. You might, for example, orchestrate a workload that is flexible in time, one that can be slowed down or paused for a certain amount of time.
Something, for example, that looks like a fine-tuning operation that doesn't need to terminate immediately on time. If what you're doing is taking a large language model and tuning it to a particular enterprise application, that enterprise might not mind if the job is paused for a minute or an hour. And in other cases, you may be taking a model or an AI use case that has flexibility spatially: you might move it from one location to another, saving power at one particular data center while keeping the application running in a different location. So there are a lot of different ways within this broad framework of achieving spatiotemporal flexibility, and what Emerald AI takes advantage of is that there is inherent workload flexibility in the use cases of AI today.
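One way to picture that orchestration is as a classification step plus a planning step. Here's a minimal sketch; the categories, names, and savings fractions are invented for illustration and are not Emerald AI's actual system:

```python
from dataclasses import dataclass
from enum import Enum

class Flex(Enum):
    PAUSE = "pause"      # can stop and resume later (e.g., fine-tuning)
    MIGRATE = "migrate"  # can move to another site (spatial flexibility)
    SLOW = "slow"        # can run at reduced speed (temporal flexibility)
    NONE = "none"        # latency-critical, must be left alone

@dataclass
class Workload:
    name: str
    power_mw: float
    flex: Flex

# Assumed fraction of a workload's power that each action frees up on-site.
SAVINGS = {Flex.PAUSE: 1.0, Flex.MIGRATE: 1.0, Flex.SLOW: 0.4}

def plan_curtailment(workloads, target_mw):
    """Greedy toy planner: pause, then migrate, then slow workloads
    until the site-level reduction target is met."""
    actions, saved = [], 0.0
    for kind in (Flex.PAUSE, Flex.MIGRATE, Flex.SLOW):
        for w in workloads:
            if saved >= target_mw:
                return actions, saved
            if w.flex is kind:
                actions.append((kind.value, w.name))
                saved += w.power_mw * SAVINGS[kind]
    return actions, saved

jobs = [Workload("finetune-a", 50, Flex.PAUSE),
        Workload("batch-inference", 80, Flex.MIGRATE),
        Workload("pretrain-b", 200, Flex.SLOW),
        Workload("chat-serving", 70, Flex.NONE)]
print(plan_curtailment(jobs, target_mw=100))
# ([('pause', 'finetune-a'), ('migrate', 'batch-inference')], 130.0)
```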
Shayle Kann: Maybe let's walk through that in a little more detail, the temporal component versus the spatial component. For the spatial one: if you have multiple data centers, you shift load from one place to another. Google's actually been talking about doing that for years for the purpose of lowering carbon. They've been saying one of the ways we're going to reduce the carbon intensity of our computation is by shifting it from location to location. That feels to me like it's more readily available to the hyperscalers, who have lots and lots of data centers, probably within one region, than it is to others. The temporal one is in theory available to anybody. So what does it look like? You have some workload that the data center is supposed to undertake. Is it as simple as saying, we delay this workload by a few hours? Or, presumably, there's more to it than that?
Varun Sivaram: It absolutely can be that simple. And let me first give credit where credit's due. You mentioned Google; Google has also exploited its temporal flexibility. There was a paper, a post they put out a couple of years ago (a friend of mine, Varun Mehra, wrote it) about moving video indexing operations to nighttime in order to reduce load during periods when, as you mentioned, Shayle, that computation would be carbon intensive rather than renewables intensive. So exactly as you said, one simple thing to do would be to simply pause a workload. However, that's not going to work for all workloads. And the reason this is tricky and sophisticated is that there are many things you could do and many different requirements that users are going to have for you, and you want to precisely meet a grid target. You want to make sure that your performance is not approximate, but that you can guarantee to the grid that if they need you to achieve a particular demand reduction, you can certainly do that while respecting the constraints that the users of the AI compute put on you.
That dual optimization problem is what makes this complicated. So in addition to pausing a job that can tolerate a delay and resuming it later, you might slow down a job, or you might change the resource allocation: how many chips, for example, are instantaneously being used for a job. Some instances of this are known as auto-scaling, where you scale the resource allocation up and down for particular kinds of queries. You might also go all the way down to the underlying silicon, the Nvidia chips for example, and change what we call the clock frequency of the chip, to change the rate at which computations happen. And depending on the workload type, a customer may be comfortable with that workload being slowed a little bit or slowed a lot. There are some other technical limitations as well; I'll stop talking about the complexities in a moment, because it's fractally complex, but I'll mention, for example, that different workload types can tolerate different amounts of clock frequency changes or power caps. So you need to know something about these workloads in order to determine: what's the best set of operations I can do to preserve what the user wants, which is great performance for their AI workload, whether it's training a model, fine-tuning a model, et cetera, and deliver precisely what the grid needs, which is not a megawatt more than the limit we promised to achieve for them? And that is a non-trivial problem, far harder than just pausing a bunch of jobs.
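A stripped-down version of that dual optimization might look like the following (hypothetical numbers and a greedy heuristic; a real orchestrator would model per-workload performance curves and guarantees rather than simple floors):

```python
def allocate_power_caps(jobs, site_cap_mw):
    """Toy allocator. Each job has a full power draw and a floor it
    cannot go below, set by how much slowdown its owner tolerates
    (via clock frequency changes, power caps, or fewer chips).
    Sheds power from the most flexible jobs first."""
    caps = {name: full for name, full, _ in jobs}
    excess = sum(caps.values()) - site_cap_mw
    # Most flexible first: largest gap between full draw and floor.
    for name, full, floor in sorted(jobs, key=lambda j: j[1] - j[2], reverse=True):
        if excess <= 0:
            break
        cut = min(full - floor, excess)
        caps[name] -= cut
        excess -= cut
    if excess > 0:
        raise ValueError("target infeasible without pausing protected jobs")
    return caps

caps = allocate_power_caps(
    [("pretrain", 200, 120), ("finetune", 100, 40), ("inference", 100, 95)],
    site_cap_mw=300)
print(caps)  # {'pretrain': 120, 'finetune': 80, 'inference': 100}
```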
Shayle Kann: Yeah, that differentiation among types of workloads is sort of important here, because if you think back historically, pre AI wave, there was already the same problem: lots of data centers (way, way fewer, but still lots) that needed what looked to the grid like 24/7 load, et cetera. And the explanation you would always get as to why those loads couldn't or wouldn't be flexible was, well, these are mostly hyperscaler data centers, and the hyperscalers are making a commitment to their customers, the ones on whose behalf they're doing this work, that they will deliver with low latency or whatever it is. And so it's just not worth it to them to try to shift this stuff around; they just want to deliver as quickly as they possibly can. So I can imagine there being cases here where that's going to be true too. For certain inference workloads in particular, I can imagine there isn't really flexibility, but others, maybe training a model, are certainly not as time sensitive. So how do you think about the workloads and types of compute for which this is especially well suited?
Varun Sivaram: Well, first of all, necessity is the mother of invention, or of changing your business model. And this is one of those cases, Shayle, where, look, we've got 50 to a hundred gigawatts of latent AI demand in the pipeline. It's just not going to get built unless you have this capability of flexibility. Tyler Norris's viral paper (he's an advisor to Emerald AI, I should note) said, hey, there's around a hundred gigawatts of spare capacity lying around on grids if we can just make data centers modestly flexible: up to 200 hours a year, able to reduce consumption by around 25%, for around two hours on average per event. And so if it weren't the case that there was this extraordinary demand for energy, severe limitations, and this golden ticket to get it, I don't think we would be changing business as usual, which, for the last two decades of SLAs, or service level agreements, is, Shayle, as you said, that you simply get 24/7 uptime agreements on your power.
Given the necessity, now I think there's a range of AI customers, and we've talked to hundreds, who are willing to tolerate small levels of changed power availability today. There are different kinds of ways that you can reserve compute capacity. You can have a guaranteed instance, where you get that 99.999% uptime guarantee. You can also have a spot instance, where you can basically get kicked out, or preempted, at any time. What Emerald AI's spatiotemporal flexibility technology offers is an almost firm guarantee. It's a guarantee that, look, 99% of the time you're going to be left alone, but every so often, up to that hundred or 200 hours, there might be a mild power cap, during which Emerald Conductor is going to gracefully orchestrate your workloads. You might have to face a power cap, and based on what kind of workloads you're running, we're going to make sure to protect the performance and tolerate delays only where you're willing to tolerate them.
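Back-of-the-envelope, that kind of SLA touches only a sliver of annual energy, which is consistent with the roughly 0.5% curtailment figure cited from the Norris paper:

```python
hours_per_year = 8760
flex_hours = 200       # worst-case curtailment hours per year
reduction = 0.25       # 25% demand reduction during those hours

curtailed_share = flex_hours / hours_per_year * reduction
print(f"{curtailed_share:.2%} of annual energy")  # ~0.57%
```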
Shayle Kann: So that then sort of answers one of my implicit questions from earlier: you're focused on those 100 to 200 hours a year. So this is a demand response type application. It's not a daily load shifting thing; this is, in periods of extreme grid stress, we will dial down your power consumption a little bit.
Varun Sivaram: To be clear, I think that's where we enter; it's the most pressing need of the hour, no pun intended, today. But I think the same toolkit that harnesses spatiotemporal flexibility, the one that allows you to provide this demand response for those hundred or 200 hours, is also the capability set that would one day allow data centers to flex on a weekly or even daily basis. Again, if the prices are right, if the incentives are well calibrated. And I think, Shayle, you and I both believe in a grid that is fundamentally abundant, cheap, and affordable, and that's going to require a lot of both dispatchable and intermittent, non-dispatchable energy. I personally view data centers as a potential holy grail, if not the silver bullet, to enable a generation mix like that: one that's far cleaner and far more intermittent.
So down the road, you can imagine that data centers, which today are about 4% of American electricity consumption (AI data centers are about five gigawatts of that load), grow to 12% by the end of the decade, and AI data centers could be anywhere up to 50 or even more gigawatts, up to 25% of American load, by 2035 and beyond. They suddenly become by far the biggest user of electricity in this country. And if they have this flexibility toolkit, they can be doing all of these operations: the up to a hundred hours of demand response, potentially daily shifting. That's what a truly co-optimized AI infrastructure and electricity grid infrastructure would look like at massive scale. And I think step one is solving this hundred to 200 hour problem, getting data centers onto the grid, and getting grids comfortable that data centers can perform when called upon.
Shayle Kann: So I think the big question then is how much flexibility you can actually offer. And it's going to vary, I understand, but I don't think anybody's proposing that the 400 megawatt nominal data center turns to zero megawatts 200 hours out of the year, right? Because you still have HVAC load and all that kind of stuff. And my presumption is you also don't want to. I mean, you mentioned this, right? Some of the techniques you want to employ are things like slowing down the clock speed of a GPU, which doesn't dial the load down to zero, it just dials it down some. So what do we know about how much flexibility, how much demand response capacity, is realistically latent within, say, a 400 megawatt data center?
Varun Sivaram: We set out to demonstrate one example of this in Phoenix, Arizona earlier this summer, and we published the results along with Nvidia, the Electric Power Research Institute, and our partner Salt River Project, at an Oracle data center. And we said, look, let's take a large cluster of GPUs and let's see what we can get. Can we achieve a 25% demand reduction, which Tyler Norris's Duke paper suggested would be a minimum threshold to unlock this massive amount of headroom? Can we sustain it for what the Arizona grid needed, which was a three hour demand reduction, and do so with representative AI workloads? And so we worked with our partner Jonathan Frankle, the chief AI scientist of Databricks, who specified for us what a representative set of workloads could look like. It was surprising to me, by the way, to hear that he anticipated that just 10% of the workloads on a representative Databricks cluster were non-preemptible.
In other words, they absolutely could not be paused or delayed in any way. That gives us a lot of flexibility to work with. So we worked with them to develop four representative ensembles of workloads with varying levels of flexibility: some that could be delayed or slowed down just a little, and some that could be delayed a little more. Using those representative workloads, we published a preprint of our academic paper on arXiv showing that a 25% reduction is definitely feasible. We even had one run that showed a 40% reduction and still met all of the performance requirements for this representative set of users and AI workloads. So there is, I think, a lot of inherent flexibility in the system. And then, Shayle, you can think about layering on other interventions. You can combine computational load flexibility with, let's say, some limited deployment of batteries, and together you can take much of the data center's consumption offline for a small amount of time.
Shayle Kann: When you say you still met the performance requirements, is that something in the SLA? Did they give you a representative SLA and you're saying, okay, I still need to meet this? Who defines what? Because isn't that the key thing? Obviously you can get as much flexibility as you want, presuming the performance requirements allow for it. And so a lot of this, to me, seems to come down to: what is the SLA between the data center operator and the customer?
Varun Sivaram: You're nailing it. This is the central question going forward: can we define a new kind of SLA that looks almost like the previous kind, but where, less than 1% of the time, there's a chance your workloads might get power capped in the most graceful way possible? And in talking with hundreds of AI companies, our conclusion is that this is definitely doable. It is definitely possible for us to find a large set of customers who are willing to tolerate this kind of disruption, especially because, first, AI customers today struggle to get access to compute. You hear OpenAI's Sam Altman talking often about how GPU capacity is a limiting constraint on the expansion of OpenAI's GPT-5 model, for example. And others say, hey, the cost of compute, because of the scarcity of compute, is really the limiting factor for popularizing and democratizing AI.
And even for applications that are extremely time or latency sensitive, this can work. I recently talked to the CEO of a company that makes a very real-time, interactive world model. You can step into this world, and the data center needs to be quite close to you in order for you to have a good experience at 30 frames per second. Even they can tolerate geoshifting some of their workloads, less than 1% of the year, within a 500 mile radius, because it's only going to incur less than a 50 millisecond latency penalty. That's acceptable if what it leads to is a much larger set of GPU deployments, and therefore better and maybe even cheaper access to compute. So I think, yes, Shayle, the central question is whether there can be a new power-flex SLA that's slightly different from today's SLAs, and I think the answer is probably yes.
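The sub-50-millisecond claim is plausible on simple physics. A rough calculation (light in fiber travels at about two-thirds of c; real networks add routing and switching overhead on top):

```python
distance_km = 500 * 1.609        # 500 miles to the alternate site
fiber_speed_km_s = 200_000       # light in fiber, roughly 2/3 of c
rtt_ms = 2 * distance_km / fiber_speed_km_s * 1000
print(f"propagation round trip: {rtt_ms:.1f} ms")  # ~8 ms; even a few
# times that for routing overhead stays well under 50 ms
```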
Shayle Kann: Alright, so final question for you then. The holy grail here is if you and others can convince the grid operators, as you mentioned before, that they can rely upon this type of flexibility, perhaps in combination with physical flexibility assets, such that they know there's a data center with a nominal 400 megawatt capacity but they can actually interconnect it at 300 megawatts, or whatever it is. What do you think it's going to take to get that level of comfort from the grid operators? It's been a long road to get traditional demand response there, and this is a whole other level of complexity. Now, as you said, necessity is the mother of invention, but what's your sense of what you're going to have to prove to get grid operators to trust it?
Varun Sivaram: That's a really great question. To answer it: I recently was invited to speak at the Electric Power Research Institute's summer seminar. There were a hundred utility and grid operator CEOs in the audience, and I asked all of them for the same thing. I said, please participate alongside the AI companies in an escalating series of demonstrations approaching commercial scale. We at Emerald AI plan to hit commercial scale early next year; we're very excited to have whole data centers be power flexible in partnership with our collaborators such as Nvidia, which is our biggest investor. Because that data, that ground-truth reliability information, is what's needed for grid operators and utilities to believe that this is actually a thing: that AI, far from being the scariest liability getting added to grids, could actually be the most promising asset we can add to grids. They've got to see it to believe it.
So we're working with a range of partners. I mentioned the collaboration with EPRI, Oracle, Nvidia, and SRP in Phoenix, but now we have upcoming demonstrations all over the United States and increasingly around the world, which I'm very excited about, to showcase that data centers can be flexible and to get grid operators very comfortable. One last thing I'll mention: in order for a grid operator or utility to bank on the fact that, hey, when I call this resource, it's actually going to perform the way I need it to, Emerald has developed something called the Emerald Simulator. It's a digital twin that imagines what would happen if we did certain orchestration operations, if we moved some workloads around, paused or slowed workloads. And as we've submitted in our academic paper, it's extremely accurate. That accuracy, built out over many more demonstrations, is going to be critical to prove to utilities and grid operators that the system is in fact going to work the exact way you expect it to.
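One simple way a grid operator might score such a digital twin is to compare its predicted power trace against metered reality during an event, for example with a mean absolute percentage error (a generic metric chosen for illustration; the numbers below are hypothetical, and the paper's actual accuracy methodology may differ):

```python
import numpy as np

def mape(predicted_mw, measured_mw):
    """Mean absolute percentage error between a simulated and a
    measured power trace."""
    p, m = np.asarray(predicted_mw, float), np.asarray(measured_mw, float)
    return float(np.mean(np.abs(p - m) / m))

# Hypothetical 3-hour curtailment event, sampled every 30 minutes:
simulated = [300, 298, 301, 299, 300, 302]
measured = [305, 296, 303, 297, 301, 300]
print(f"MAPE: {mape(simulated, measured):.1%}")
```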
And if it doesn't, in that absolute worst case, there will be some fail-safe mechanism to make sure that it does work. So there's a lot of convincing work to do, but I sometimes feel we're pushing on an open door. When I talk to the chairman of a regulatory commission (you pick your large East Coast state), that chairman says, I've got the governor knocking on my door every month saying, what have you done for me to bring data centers to my state? Because I want to economically compete with all the other states. Regulators, utilities, and system operators are all balancing this trade-off between providing reliable and affordable electricity and bringing economic development, this extraordinary new source of demand, the greatest economic opportunity humanity's ever seen, to their states. Data center flexibility is a way to end the trade-off between those two halves. You can have it all at the same time. It's the reason I've left everything I've been doing in my career and founded this company: to do just this for the next decade of my life. So I'm really excited about it.
Shayle Kann: Varun, this was fun. Thank you again for coming back.
Varun Sivaram: Really appreciate the time, Shayle. Thank you so much for having me.
Shayle Kann: Varun Sivaram is the founder and CEO of Emerald AI. The show is a production of Latitude Media. You can head over to latitudemedia.com for links to today’s topics. Latitude is supported by Prelude Ventures. This episode was produced by Daniel Woldorff. Mixing and theme song by Sean Marquand. Stephen Lacey is our executive editor. I’m Shayle Kann, and this is Catalyst.


