00:04
And welcome, folks, to another episode of our video cast series Beyond the IT Headlines. In this series, we always go beyond the press releases and explain what's really happening in the world of IT and data, and more specifically, what it means for your business. We publish these biweekly and always focus on pushing the narrative beyond the headlines with different guests and provocative points of view.
00:27
I'm Sean Rosemarin, and it's a pleasure to be with you this week. Today's topic: AI's power draw isn't just a data center problem. It's actually becoming a societal tension point. To scale sustainably, leaders have to rethink efficiency, power allocation, and the AI pipeline itself.
00:46
To help us break it down today, I'm absolutely ecstatic to introduce Don Kerouac. Don, why don't you introduce yourself to our audience? Hey, Sean, thanks for inviting me. I'm Don Kerouac. I'm the technical lead for product sustainability at Pure Storage.
01:03
All right. So, Don, you've got a lot of experience in the space. We're gonna jump in. But look, at the end of the day, I think there's a lot of talk globally about how AI is gonna save the world, how it's gonna replace humans, how it's gonna give us the next element of productivity. And I think there's truth to all of that when
01:21
we take a long-term lens. But what we're gonna focus on today is: is this actually gonna break our power grid first? Is that really what we should be concerned about, and what are some of the considerations? So, just to bring our audience up to speed, I don't think it'll be any surprise to anyone that US power consumption this year is expected to reach record highs.
01:42
But more specifically for the average consumer and the average enterprise, what comes with it is electricity price hikes, which have seen an increase of over 267% over five years. This is specific to Virginia, where a lot of these data centers are getting built out in North America, but it also relates to other places worldwide.
02:01
More concerning, though, is that power capacity is expected to grow by 30X by the time we reach 2035. And from a utilization standpoint, we're looking at 326 terawatt-hours of electricity demand.
02:19
And with it, 100 million metric tons of carbon dioxide, which is equivalent to the emissions of roughly 30 million homes, or about 22% of US households today. Now, to combat this, US electric companies are going to spend over $208 billion just this year, and more than $1 trillion over the next five years, to drive increased capacity into the power grid. So, look, Big Tech's AI boom is pushing hyperscale energy demand to historic highs.
02:49
And that's really what we're going to dig into today, but I just want to provide a little bit of context. All of us are using these chatbots every day, whether it be Google, whether it be Anthropic, whether it be ChatGPT. And the reality is, Google actually just announced that a typical text query in its Gemini app, so anything from a recipe, to helping rewrite a document, to helping
03:11
produce a Nano Banana image, consumes about 0.24 watt-hours of electricity. A typical query: that's the same as running a microwave for about one second. Then think about how many of these queries there are. Well, there's roughly 2.5 billion queries to ChatGPT every day.
03:31
So if you take those numbers, and I'm relating Google's figure to ChatGPT here, the power consumption of processing 2.5 billion text queries in a day is roughly equivalent to powering 20,000 typical American homes. So we're all kind of going gaga over this new technology, and we're using it for all sorts of stuff. But, Don, what are your high-level thoughts?
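A quick back-of-the-envelope sketch of that math, assuming Google's published ~0.24 Wh per median text prompt carries over to ChatGPT's volume, and that a typical US home draws about 30 kWh per day:

```python
# Back-of-the-envelope: daily chatbot energy vs. US households.
# The per-query figure is Google's published Gemini estimate; applying
# it to ChatGPT volume is a rough proxy, not a measured number.

WH_PER_QUERY = 0.24          # watt-hours per typical text query
QUERIES_PER_DAY = 2.5e9      # reported daily ChatGPT queries
HOME_KWH_PER_DAY = 30        # typical US home (~10,800 kWh/year)

daily_kwh = WH_PER_QUERY * QUERIES_PER_DAY / 1000   # Wh -> kWh
homes = daily_kwh / HOME_KWH_PER_DAY

print(f"{daily_kwh / 1000:,.0f} MWh per day ~= {homes:,.0f} homes")
# -> 600 MWh per day ~= 20,000 homes
```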
03:52
Like, ultimately, what is happening here in terms of us putting pressure that we didn't anticipate on the power grid? Yeah, Sean, there's really a lot going on in this space. You cited the chat queries and so forth, but I think the problem actually is
04:10
going to get even more complicated as we move forward with agentic AI from Anthropic or, excuse me, OpenAI: agents that eliminate us from having to sit there and type queries, because they'll just go off and do their own thing. You'll give one a complex task, and it'll go about its business
04:30
over the course of a day and may run thousands or millions of queries to get an answer for you. And I believe one study from the MIT Technology Review said that those types of queries could take up to 43 times the energy of the simple ones we're doing today. So we really have, in the very near term,
04:54
some increasing demands just from AI. And another part of the challenge, Sean, without going into too much detail, is that the type of model being used, how many parameters it has, how complex it is, will dictate how much energy is used when inferences are made using
05:14
the model. So, as users, as consumers, we don't have any visibility into how many parameters are in the Google Gemini model, Llama, GPT. We just don't have any idea, so we have no way to judge: am I using too much energy, or am I using only a little bit?
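To put rough numbers on why parameter counts matter, here's a minimal order-of-magnitude sketch; the model size, token count, and hardware efficiency are all illustrative assumptions, since production models don't publish them:

```python
# Rough inference-energy estimate from model size.
# Rule of thumb: ~2 FLOPs per parameter per token processed.
# Every input below is an illustrative assumption.

params = 70e9              # hypothetical 70B-parameter model
tokens = 500               # tokens processed for one response
flops = 2 * params * tokens

# Assumed effective accelerator efficiency (FLOPs per joule),
# well below datasheet peak once utilization losses are included.
flops_per_joule = 5e11

joules = flops / flops_per_joule
print(f"~{joules:.0f} J ~= {joules / 3600:.2f} Wh per response")
# -> ~140 J ~= 0.04 Wh; a 10x-larger model, all else equal,
# costs ~10x the energy, which is why transparency matters.
```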
05:34
None of that's made obvious to us, and there are some challenges there that I think we as a society need to solve. We need more transparency in that space to answer some of these energy questions. So you're hitting on all the right points here. It's
05:54
a huge challenge from an information availability perspective, and from a power grid perspective, where our grid isn't set up to handle the types of energy demands that will be heading our way in the next two to three years. I love what you said there, and I want to get to the energy grid in a moment, but I just want to hit on a few
06:14
points. So you talked about agentic AI, about AI agents coming in and replacing people, or augmenting people, or making people more productive, however we look at it. And it's funny, because humans actually consume energy as well. We call it food. We need food, and every human we add to the organization or every
06:30
human we add to the workforce theoretically consumes more energy, requires more food, and we've built a whole agricultural system over the years to deal with that. Relating it a little more to the consumer: when you go into an electronics store to buy a particular appliance, there's actually a sticker on it that says, if you're gonna plug this in in your house,
06:49
here's what you can expect it to consume on a daily basis once it's plugged into your grid. But when we go to the chatbots today, there are no such stickers. Nobody's telling us: are you aware that this query you're about to run, one you maybe already know the answer to and maybe you don't,
07:06
is actually going to cost a certain amount of money or power globally? And maybe that blindness, that lack of transparency, is driving adoption, which is good for these companies, but it's actually making the strain much more severe over time. So you've given me a ton to think about in terms of, before we go and type that query,
07:31
are we actually thinking about the fact that this query is taking electricity? And even the time of day that you run the query matters. In California during the day, the carbon intensity of their grid is actually very low; I think the numbers showed it gets
07:55
near zero. So during the day, when solar is abundant, things are great. But if you're at home doing a project at night, the grid mix actually changes at that time of day. So even when you run your query matters, and where it gets run:
08:10
if you're asking your phone to run a query for you, you have no idea where it's run. All those things factor into this lack of transparency we have today, and that again presents a significant challenge around energy use in AI.
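The carbon-aware deferral Don describes is easy to sketch. The `grid_carbon_intensity` stub below is a hypothetical stand-in for a real intensity feed (services like WattTime or Electricity Maps publish this data), not any actual API:

```python
import time

def grid_carbon_intensity(region: str) -> float:
    """Grams of CO2 per kWh on the local grid right now.
    Hypothetical stub; a real system would query an intensity feed."""
    return 120.0   # fixed illustrative value

def run_when_clean(job, region="CAISO", threshold=150.0, poll_s=900):
    """Defer a flexible job (training, batch inference) until the grid
    is clean: for example, midday in California when solar is abundant."""
    while grid_carbon_intensity(region) > threshold:
        time.sleep(poll_s)        # re-check every 15 minutes
    return job()

# Usage (with a hypothetical training job):
# run_when_clean(lambda: train_one_epoch())
```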
08:25
Yeah, so let's shift our focus to the grid. The AI scaling path we're building on today assumes that we have infinite access to compute energy. And by the way, on a long-term view, we will find new energy sources and we will find new ways to generate power.
08:44
But the fact is, today the grid can't keep up. Electricity companies were struggling before, right? We've had rolling brownouts and even rolling blackouts in countries for years, long before AI. But this problem's gonna get worse.
08:57
And the impact actually extends down to consumers. We're talking about basic heating systems, basic cooling systems. Remember, we all share the same grid, right? We're all sharing the same available access to electricity. So ultimately, what's happening here? Well, first off, utility companies are reporting major capacity constraints due to
09:17
hyperscaler growth, right? Data centers today make up more than 90% of new power demand, and PJM, one of the largest distributors of this power, expects that by 2030 it's gonna make up even more. And ultimately, ratepayers are projected to pay an additional $9.3 billion in annual capacity market costs because of data center demand.
09:41
So as the data centers consume more and more of this power, it leaves less available power for consumers, which means ultimately the additional power that needs to be generated is going to cost more to access, and that cost gets passed down to consumers. We've already started to see it: residential customers in Washington saw bills increase
09:57
about $21 a month starting in June 2025, half of which is attributed to capacity price spikes driven by these data centers. And, hey, Sean, just to add one clarification: with new data centers coming online, the demands aren't necessarily going to directly impact consumers for some of the very
10:23
largest data centers, because they're deploying what's called behind-the-meter power, meaning they're providing their own power on site, not even connecting to the grid in some cases. So I think we should temper it: not all new energy required by data centers will come from the existing grid. There are solutions that will allow them to
10:45
come online without obstructing the grid, at least in the short term, to bridge to that 2030-2035 timeframe, when maybe more abundant nuclear, geothermal, and other sources are available. I just think it's important to call that out so we don't mislead people into thinking that all new data centers are going to be pulling
11:08
from the grids that feed homes. They don't have to, and they don't have to be built that way. So, I mean, we've seen there are some options there; I guess that's fair. Look, we've already seen Microsoft and Three Mile Island: they intend to use that plant to power some of their data centers.
11:23
We've also seen the hyperscalers show some degree of empathy to this problem by saying, look, if the grid comes under undue strain, we'll take our capacity utilization down. We'll actually roll down our consumption so that this doesn't impact the grid. We've also seen things like gas plants opening up near a lot of these data centers.
11:43
In fact, I wouldn't be surprised to see a lot of industrial complexes that have perhaps been mothballed actually purchased for data center expansion, because they have close access to large pools of power. Think old refineries, sugar factories, grain elevators: a lot of facilities that may no longer be required being brought into the mix. And also small modular
12:08
reactors, also referred to as SMRs, where ultimately you build these up to help handle some of that peak. So it's a bit of a give and take. But remember, Don, to my earlier point, it's not just hyperscalers. The enterprises are just getting into this business.
12:24
Enterprises are just starting to turn these POCs into production-ready processes, whether they host that on-prem, in a sovereign cloud, or in the public cloud. I'm not sure that all of those are going to have auxiliary access to power. True, very true. And you're right, I'm talking about the very largest. For the kind of traditional,
12:51
you know, enterprise-scale data centers, they're very much going to be relying on what's available from the local power company or the local grid. So for those businesses, it will be exactly as you said: they'll be competing with local households and businesses for what's available in the area.
13:12
Yeah, and by the way, it's compelling, right? I mean, if a particular large hyperscaler comes in and says, we'd like to buy this block of power, and is willing to prepay for the next 10 or 15 years and make your financials look very attractive, it's compelling for government organizations. But at the end of the day, you've also got to model: what are my consumers gonna need?
13:30
What are my enterprises gonna need? What are my sovereign clouds gonna need? What are my governments gonna need? So, ultimately, let's dive in, right? Let's talk about this energy curve. The good news is, Don, we're early in this.
13:43
And what we've learned from previous generations is that when we get into these kinds of eras early, whether it's e-commerce, the PC revolution, smartphones, the cloud, and now AI, we're not really efficient. We tend to use brute force to solve a lot of these problems.
14:00
And so, whether that's the GPU farms, the TPU farms, whether that's the billions of queries we're doing and the way in which we're processing these queries, training and retraining, looking at inference and then refinement of the models we have, right? I mean, look, ChatGPT launched, I believe, in 2022.
14:21
We're now in 2025. We're already on the fifth model, and if I look at submodels, we're probably on the 25th. So at the end of the day, is efficiency gonna get us there? Are you optimistic that there's enough juice in this lemon that as we
14:40
squeeze it, we can actually solve this power problem through efficiencies? Yeah, I think efficiency is going to be part of a set of solutions that are going to need to be brought to bear to solve the problem ultimately. Another thing I would call out is the quantization being done, which again can help a little. And edge analytics:
15:04
moving the AI to where the data is collected, moving out to the edge, can give us breathing room. But I think we also need to put more thought into the software that runs the AI. That's where I think the biggest opportunity lies.
15:22
It's not gonna simply be creating faster hardware. It's gonna be software that's power-aware, that's looking at things like: what time of day am I running this? Maybe I can only run training during the day, when there's little or no emissions impact.
15:44
Or maybe inferences can only be run at a certain time, or have certain budgets for how long they can run before the service needs to be turned off for a period of time. That's kind of extreme, but those sorts of things need to be brought in.
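A minimal sketch of the energy-budgeted service Don is gesturing at; the budget and per-query cost here are illustrative assumptions, and the wrapped `infer_fn` stands in for whatever real model call you have:

```python
class BudgetedInference:
    """Wrap an inference call with a daily energy budget; once the
    budget is spent, refuse (or defer) further queries."""

    def __init__(self, infer_fn, daily_budget_wh=50_000.0,
                 wh_per_query=0.24):          # assumed average cost
        self.infer_fn = infer_fn
        self.budget_wh = daily_budget_wh      # reset each day
        self.spent_wh = 0.0
        self.wh_per_query = wh_per_query

    def query(self, prompt: str):
        if self.spent_wh + self.wh_per_query > self.budget_wh:
            raise RuntimeError("energy budget exhausted; try tomorrow")
        self.spent_wh += self.wh_per_query
        return self.infer_fn(prompt)

# svc = BudgetedInference(model.generate)   # hypothetical model call
# svc.query("Summarize this report.")
```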
16:02
I also think, and a lot of folks have suggested this, that perhaps AI can solve its own problem. If AI can be brought to bear on some of these challenges, it can perhaps come up with a solution we haven't thought of yet. So I think it's going to be a combination of different approaches that ultimately solve the problem.
16:26
AI solving for the challenges of AI. Yes, yes, I know it's kind of a circular argument, but it has been proposed; that's one of the ideas floating around out there. Because theoretically, and we're already starting to see this with things like key-value
16:45
store caching, you can cache, or even predict, where queries are going to come in. I think there are opportunities, especially with some of the repeatable industry-based tools. If you see the same query come up over and over again, first of all, you can cache it. That's great. But even if it hasn't been run yet, you can actually predict that it will be run.
17:03
You could run it at a time when it will consume less energy. You could then cache the response, and when the actual question is posed, rather than running the entire inference, you pull that inference from cache. So, yeah, it's like query deduplication. It's basically de-dupe, but for
17:22
the same query that's been run previously, or something similar. It'd be kind of a cool technology; I don't know if we've just invented some new form of deduplication, but it's interesting. The good news is, having been in technology for 25 years, you always see parallels between what's happened in the past and now.
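A toy sketch of the query-deduplication idea the two are circling; a real system would match queries semantically (for example, with embeddings) rather than by normalized string, so treat this as the skeleton only:

```python
# Answer repeat questions from cache instead of re-running inference.
cache: dict[str, str] = {}

def normalize(query: str) -> str:
    return " ".join(query.lower().split())   # crude stand-in for
                                             # semantic matching

def cached_answer(query: str, infer_fn) -> str:
    key = normalize(query)
    if key not in cache:
        cache[key] = infer_fn(query)   # pay the inference cost once,
                                       # ideally at a low-carbon hour
    return cache[key]                  # repeats cost ~zero energy
```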
17:40
And if we look at search, not every search is done holistically. Searches are cached, right? Searches are predicted. I think we're gonna see the same thing with questions. We already know that ChatGPT sees similar questions; probably 30 to 40% of all its queries are around the same topic,
17:56
and they're very sensitive to a news article, or something that's going on, or something that's topical. So I think you're on to something there. I like this AI inference caching idea. Well, I don't think it's unique or something I've just come up with, but yes,
18:18
that's another approach that I think, based on history, is maybe another way to introduce more efficiency into this AI query process. Well, let's round out the episode by talking about really engineering the fix.
18:36
So the good news is we've got lots of smart people, many people, working on this problem. But let's talk about where this innovation is gonna come from, because I do think, joking aside, there are hundred-billion, even trillion-dollar businesses that will emerge and solve some of these problems.
18:54
You talked about some of them, but I wanna really press on model optimization, because I think there's a lot of meat here, right? Distilling the models, de-duping some of what's going through the models, using systems like KV cache, edge analytics. By the way, edge analytics reminds me a lot of, you know,
19:12
and I'm gonna age myself here, but back when I was a high school student, we had this thing called SETI@home, from the Search for Extraterrestrial Intelligence. And essentially, long before we worried about edge device security and all those things, we would give up the idle cycles of our PCs to help look at pieces or parcels of signal coming out of
19:34
space, to see whether or not they could be interpreted. We'd take a work package, our computer would work it, and then it would send the results back to SETI. I'm not sure if we ever found any alien activity, but I think I came in a little bit later, when it was Bitcoin mining that worked that way,
19:52
where, in your spare cycles, you could mine coins, back when it didn't take a crazy supercomputer to find one. Because if I take these smaller work packages and I use them on a more efficient device, be it an iPhone, a PC, or any smart device, I can actually solve the same problem without having to use the biggest and most brute-force engine to do it.
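The SETI@home-style pattern Sean describes, reduced to a skeleton; the work units and the `process` function are placeholders for whatever task is actually being distributed:

```python
from queue import Queue

def process(unit):
    """Placeholder for the real per-unit computation an idle
    device would run on its spare cycles."""
    return sum(unit)

# Coordinator splits a big job into small, self-contained units...
work_units = Queue()
for chunk in ([1, 2], [3, 4], [5, 6]):   # e.g., slices of a dataset
    work_units.put(chunk)

# ...and devices pull units, compute, and send results back.
results = []
while not work_units.empty():
    results.append(process(work_units.get()))

print(results)   # coordinator reassembles the answer: [3, 7, 11]
```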
20:17
And I think you're absolutely on to it. We've seen it in Bitcoin with proof of work. I think we'll now see it with AI. You could potentially even get a check every month for the amount of work you did for someone else, providing we can solve for the security and everything else. Or a company could use all of its assets globally: 6,000 employees,
20:35
25,000 employees, 100,000 employees. When they're sleeping, the company could use the spare capacity of the assets it owns to do work relevant to that enterprise. Now, let's talk about hardware design. Today we're architecting for brute force.
20:51
We're architecting for the fastest, baddest GPUs we can make. But the question I would ask is: what if we architected those chips for power efficiency, kind of like what AMD did in the device space, rather than architecting just for pure brute force? Do you think that's realistic? I do. One thing I want to call out: I heard recently,
21:13
when I was in Texas at the Texas Advanced Computing Center, they were talking about limits to AI, and the GPU design and fabrication process seems like it's going to hit a dead end at 3 nanometers. So again, that would limit how efficient we can get, or how small we can shrink things. That's kind of been
21:34
the free lunch, so to speak, of GPU and CPU design: we keep going to smaller and smaller processes. But I think we're going to hit a limit there. An interesting side path I've heard about, though, is something called magnetic transistors, which actually use the magnetic properties of different materials to, I guess, transport signals at a very small scale in a
22:00
very energy-efficient way. So again, following on to what you're saying, I think we can certainly make GPUs far more energy efficient. There are competitors to Nvidia out there that actually market on having GPUs that can provide the same level of performance at 30 to 50% less energy consumption.
22:23
Obviously, these competing GPUs don't have all the functionality, can't do everything that the Nvidia GPUs do, but they can match their performance in the vast majority of activities, and they can do it at a reduced energy footprint. Again, what I saw was anywhere between 30 and 50% less energy for equivalent queries.
22:48
So hardware improvement, I guess, is definitely an area where there's plenty of runway to make GPUs more efficient. I love that example you used: magnetic energy as opposed to electric energy. Yeah, magnetic transistors, right. So you're able to go almost down to the atomic level, where the electrons are
23:10
flowing across things, and do it in a very energy-efficient way. Because right now, the way we're doing it with silicon, there's lots of lost energy, and there are limits to how small we can make things. So again, that's gonna run out, and we have to come up with a way to get smaller, faster, and
23:31
more energy efficient. And again, I'm referencing MIT's tech news articles for the magnetic transistors they've recently started talking about. They really look promising, but I'm sure it's, again, quite a ways out.
23:50
Yeah, you know, I think we've talked about a whole bunch of really interesting things here. I don't doubt, by the way, that eventually we'll solve for 3 nanometers. But it kind of reminds me of when you bought a new PC every year and you got more and more gigahertz, and then at some point you realized, hey, I don't actually need any more gigahertz.
24:05
Now, of course, we build more sophisticated software over time, but then it really became: OK, how power-efficient can I make these things? When you looked at something like an iPhone, it was: how long can the battery last without the phone getting too hot, or being exposed to direct sunlight?
24:21
I think this energy piece is going to be the next phase of evolution, and I do think Nvidia will play there. There'll probably be some sort of power-efficient GPU, and some sort of performance-focused GPU. But look, I want to point our audience to a
24:36
group out there called Sustainable Grid through Distributed Data Centers, which has a pretty interesting proposal around shifting some of these jobs in ways that will stabilize the grid. So we're talking about arbitrage, moving workloads around to where there is spare energy. And, you know, there are all sorts of data sovereignty and other concerns in here,
24:56
but I think that's really where we're going to see some of this move, as well as accelerating R&D. You talked about quantization, you talked about magnetic technologies; there's also battery chemistry, there's carbon capture. But ultimately,
25:19
I guess what I want to end with here, Don, is: how should customers think about this? What should they stop and what should they start as they're thinking about this problem today, and how do they make sure that electricity is a sustainable asset for them moving forward? Well, I guess what customers really need to focus on,
25:44
number one, is getting better transparency into how AI is actually consuming energy for their particular business application or use. They need a metric that measures, when they run a query or run their AI model, how much it's costing per query, and then they can optimize how they're using it around that.
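A minimal sketch of such a per-query metric; the power draw and electricity price are illustrative assumptions, and in practice you'd read real draw from rack PDUs or GPU telemetry:

```python
import time

def measure_query(infer_fn, prompt, avg_power_watts=700.0,
                  usd_per_kwh=0.15):
    """Estimate energy and cost for one query from wall-clock time
    and an assumed average accelerator draw. Crude, but enough to
    start comparing models and prompts against each other."""
    start = time.monotonic()
    result = infer_fn(prompt)
    elapsed = time.monotonic() - start

    wh = avg_power_watts * elapsed / 3600    # watt-hours consumed
    usd = wh / 1000 * usd_per_kwh            # electricity cost
    print(f"{elapsed:.2f}s -> {wh:.3f} Wh, ${usd:.6f} per query")
    return result
```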
26:07
I think that would be an important aspect to look at. They also need to look at the other infrastructure that surrounds these GPUs. GPUs also require a tremendous number of CPUs. They require storage, they require network.
26:25
Those are all areas that can be optimized to use less energy. And again, not to hit on Pure too much, but moving away from hard disks to an all-flash solution can dramatically reduce the energy used
26:44
for storage, energy that can then be redeployed, or repurposed if you will, to run the GPUs and CPUs in a data center. So there are other knobs that business or IT operators need to be aware of, beyond just looking for the latest and greatest hardware that
27:03
runs the fastest queries or biggest models. Yeah, I like that. So let me bring us to our closing thought. I think today we had a really good discussion about where we're at. Let's just level set: we're early. And the good news is we talked about a lot of ways that we can overcome some of these
27:19
pressures. But the fact is, what we've learned from the last tech eras is how to manage latency, maximize performance, and maximize throughput. We've even learned to minimize cost through competition. But this power frontier ahead of us is going to demand that we learn, understand, and focus on the optimization of watts,
27:42
the optimization of power, real power efficiency. If we fail, we will collectively fail, and the grid will not be silent. It'll push back, and we'll feel very real human consequences. So hopefully this is top of mind for everyone out there. And Don, anything you want to leave the audience with in terms of
28:02
where they can find you, if they want to learn more, and we can close out the show from there? Sure. For anyone wanting to reach out, I can be reached at Don at PureStorage.com, or via LinkedIn. I'm always glad to connect with somebody there and
28:23
have conversations or answer questions about Pure's product technology and the sustainability benefits we can bring to a data center. All right. Well, that takes us to the end of the episode. So, for Don Kerouac, I'm Sean Rosemarin.
28:37
Thank you all for joining Beyond the IT Headlines, and we'll see you all shortly.