00:00
Good morning, everyone. We'll get started. Welcome to the session. So let me share some good news: there are no slides in this session, so we're not trying to get through 100 slides in 45 minutes. It's gonna be an open conversation.
00:17
Well, you got a break this time. They got a break too. So it's gonna be an open conversation, but we hope to make it very interactive, focused on database storage. Any burning questions that you have, please bring them up.
00:34
And what I'm gonna do next: I'm here with a very distinguished panel of database experts, so I'm gonna ask them to introduce themselves, what they do at Pure, and the areas they focus on. Anthony, you go first. Sure. Hi, everyone. I'm Anthony Nocentino.
00:56
I'm a principal field solution architect at Pure. I specialize in relational database systems; kind of a deep-dive specialty is SQL Server. I do a bunch of work in Azure and also in the Kubernetes space. Thanks, Anthony. Ryan Arsenal. I'm a principal field solution architect here at Pure as well; I specialize in SAP.
01:15
My name is Andrew Silliphant. Most people call me Silly nowadays because there are too many Andrews; case in point, you'll figure out his name in a minute. What I do at Pure is I run Pure's database and analytics practice, so I work with these fellows a lot. But more importantly, I cover open source databases as well. SAP is one of them, even though it's not an
01:35
open source database, but it's one of my disciplines. So for the sake of this panel, think of me as Mr. Oracle and open source databases. Hi, everyone. My name is Andrew Pruski. I'm a principal field solution architect as well. I'm here at Pure Storage focusing on SQL Server.
01:50
His name is Andrew. I think they figured it out. Awesome. Welcome, guys. So I'll kick things off and ask a question to all of you. This session is about data growth, right? And one of the things that we all see and experience is this kind of exponential data growth that's happening in every
02:14
organization, and with that comes the challenge of data gravity, right? The fact that it's getting harder and harder to move data. So what are you guys seeing in your customer environments, what are the challenges, and how are they resolving these issues? I'll start with you,
02:35
Anthony. That's fine. Thank you. Never stand next to the host, right? That's the thing. So I get to work with customers a lot, which is super awesome. I think the biggest data gravity challenges I see today are generally gonna be around backups and performance. And you think:
02:52
performance, is that a data gravity problem? Well, we have customers with databases that are in the hundreds of terabytes, and inefficient storage or inefficient queries can impact business outcomes, right? I want to do a thing, I have a query whose performance has changed, maybe a performance regression, and all of a
03:08
sudden it's pulling gigabytes of data out of the platform. And so the thing we talk to customers a lot about is how to solve those problems on our platforms, right? In terms of efficiencies, being able to identify when those problems occur, and then obviously attacking availability problems and backup problems; those are the things I spend the big majority of my time
03:26
working on. How about y'all? Yeah, so I also talk to customers a lot, and I feel like the role of the storage admin and the role of the application admin are kind of starting to blend a little bit, right? Back when I was in IT, if I needed storage, I'd call my storage admin and 7 to 12 days later I'd get some storage. But I've really seen the lines start
03:50
to blend, and it's great from a Pure perspective because that's what we're looking for, right? We want to be simple to the point where they can blend, where your DBA and your storage guy maybe get along for once, but also start to have overlaps in their roles. And I think that's really starting to help with the data gravity problem, because the storage guy now understands what the DBA is going for
04:11
and going through, and vice versa. And they're starting to use things like data mobility to overcome those problems and move data where it needs to go. That was good, I know. I like this question because it's something I've wanted to talk about for a while, but I haven't actually spoken about it.
04:29
So I'm gonna have fun with this one. All three of these fellows on stage are field-facing individuals. I am in corporate: I look at market trends, I understand what customers potentially need, and then we theoretically go and build it. For the last two years, all I've done is stare at market data, and as everybody knows,
04:47
you've got interesting trends that have two letters in them; AI might be one of them. But more importantly, what's happening is you've got companies who are like, oh, that looks interesting, that's something we wanna do. And if you look at the timeline of how you get from where you currently are to actually taking
05:06
advantage of inference and generative capabilities, there's light years of difference. And one of the things in the equation is data governance. I had the privilege of talking to Accenture and Deloitte data governance leads. And they were like, do you know where the business has really been growing in the last few
05:25
months? I'm like, obviously I don't, I don't work for you. And they were like, data governance. And I was like, great, this is obviously why you're talking to me; there might be some bias in that statement. But the reason they gave for this made infinitely good sense to me:
05:37
they've got a lot of enterprises who have so many data silos, and so much data, that the first thing they've got to do is classify it. Most companies actually don't have extraordinarily mature data governance practices. They have tactical execution on data governance, but they don't actually have it in terms of "we know exactly where this object is." To bring this
05:56
back to the database question: you've got a collapse of the relational database space and the NoSQL database space into the analytics space, and into the structured and unstructured data space, like object and file et cetera, plus lovely trends like data virtualization. What this means is you've got millions of ways to use data, but no one knows what data they have.
06:15
And so the trend I keep seeing with everybody I'm looking at and talking to, and all the market data I'm seeing, is that you need to collapse your operational data stores into bigger use cases. But there's a huge gap in the ability to do that. And so how one governs data and how one classifies data is one of the first questions
06:37
I'm seeing being asked. Mr. Pruski? Thanks. Just to echo what Anthony said, I'm basically his mirror counterpart, so we talk to customers running SQL Server. When I started in technology, longer ago than I'd like to admit, terabyte databases weren't uncommon, but they were sort of the outlier. Now we're seeing 100 terabytes, 200 terabytes; had a customer asking about it.
07:01
Was it a 256 terabyte database? I mean, they wanted to put it in Kubernetes, so they had multiple issues. But one of the main problems they're having, and what we're seeing out there, is... I should clarify that I spend a lot of time talking about disaster recovery, chaos engineering, DR testing, things like that. And we're seeing with customers that the size of the database is now dictating the recovery time
07:23
objective and the recovery point objective, and not business needs. And so that's where we come in, where we can talk about, say, the new application-consistent snapshot backups you can take in SQL Server to bring down that recovery period when, well, say some Welsh DBA comes along and wipes out data in a table and they have to get that data back very quickly.
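For reference, a minimal T-SQL sketch of the SQL Server 2022 snapshot backup flow being described here; the database name SalesDb and the metadata file path are hypothetical:

```sql
-- Freeze write I/O so the array-level snapshot is application-consistent
ALTER DATABASE SalesDb SET SUSPEND_FOR_SNAPSHOT_BACKUP = ON;

-- ... take the storage snapshot here (for example, a FlashArray volume snapshot) ...

-- Record the backup metadata; this also releases the I/O suspension
BACKUP DATABASE SalesDb
TO DISK = 'C:\backup\SalesDb.bkm'
WITH METADATA_ONLY, FORMAT;
```

The idea is that recovery becomes mounting the snapshot and restoring the small .bkm metadata file (plus log backups), so the restore time stops scaling with the size of the database.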
07:46
Awesome, thanks for the insight, guys. Really good stuff. So I should mention that we have a microphone that we can pass around the room. If you guys have any questions, stop us at any time. Again, this is gonna be an interactive session. Please ask questions. Which is a nice
08:05
way of putting it: pertinent questions. All right, so we've got one. Go ahead, the mic's coming. No, that's Carlo, not Mike. In terms of data growth, are you seeing segmentation through the lens of recovery? Are you seeing segmentation in the data in
08:24
terms of, strictly, I need to recover this data first, this data second? Before, it was let's just recover the whole database. Now, are you seeing any segmentation around, I only want to recover a portion of the database and then do the rest in the background after the business is up and running? Are you seeing any trends around that, or is
08:41
recovery still looking at the database as a whole? Sure. So in the SQL Server space, we've had the ability to do what's called a partial database availability restore, where you can bring back a section of the database and then fold in the rest of the database over time. But what we're encouraging customers to do is look at things like object,
09:03
right? So SQL Server, Postgres, and Oracle introduced this concept of data virtualization over the last year or two, and you have the ability to take subsets of a database and park them on object, right? So how many of y'all have databases with transactions in them from, like, 1997? Right?
09:20
And when you have that, and you have that database that we talked about earlier, someone's gonna come along and do SELECT STAR whatever, right? Rip across the whole table, and next thing you know your buffer pool is blown out. SELECT STAR FROM world. Yeah, exactly. Hello, world.
09:33
And that can be challenging, right? Because all that stuff has to be on tier zero block. I have to back it up, I have to replicate it, it's part of my recovery objectives. What we can do now with data virtualization is take data on some function that I define, usually time, and say, you know what, just take that stuff and stick it over here on object,
09:52
right? And I don't have to change code in the application to do this; I can present that back to the application in the same way. So when someone comes along and says, I need all the transactions from 2024: boom, hot, on block. If I need all the transactions for all time ever, it'll just read across the block and then go out to the object and get the rest.
10:10
Right? And so what happens now is what's on block is the stuff that's hot: fast, lean and mean. And I have kind of an archiving strategy now in that platform. And so we're able to help attack that problem you're talking about, so they can focus on what's really important to the business, bring that back, and then fold in the rest over time.
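As a rough illustration of the pattern Anthony describes, here is a hedged T-SQL sketch using SQL Server 2022's external tables over S3-compatible object storage; every name here (the data source, bucket URL, table, and date column) is hypothetical, and PolyBase must be installed first:

```sql
-- One-time setup: enable PolyBase and CETAS exports
EXEC sp_configure 'polybase enabled', 1;
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- Credential and data source for an S3-compatible endpoint (e.g., FlashBlade)
CREATE DATABASE SCOPED CREDENTIAL s3_cred
WITH IDENTITY = 'S3 Access Key',
     SECRET = '<access_key_id>:<secret_key>';

CREATE EXTERNAL DATA SOURCE s3_archive
WITH (LOCATION = 's3://object.example.com/archive', CREDENTIAL = s3_cred);

CREATE EXTERNAL FILE FORMAT parquet_ff WITH (FORMAT_TYPE = PARQUET);

-- Offload the cold rows once, as Parquet on object (CETAS)
CREATE EXTERNAL TABLE dbo.Transactions_Archive
WITH (LOCATION = '/transactions/', DATA_SOURCE = s3_archive, FILE_FORMAT = parquet_ff)
AS SELECT * FROM dbo.Transactions WHERE TxnDate < '2024-01-01';

-- Present hot block data and cold object data to the app as one thing
CREATE VIEW dbo.Transactions_All AS
SELECT * FROM dbo.Transactions WHERE TxnDate >= '2024-01-01'
UNION ALL
SELECT * FROM dbo.Transactions_Archive;
```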
10:29
Yeah. And I think that data tiering approach is becoming more and more common, right? Because when I was in IT, the number one project to ever get shut down was an archiving project, right? Number one. And the one that never finished was the data warehousing project.
10:42
Yeah, that's a good point. We even went and didn't call it archiving; we called it data relocation. That one didn't fly either. The business wants it; they're like millennials, right? They need it now. They want their data, even if it's from 1997; they want to know that they can access it
10:57
anytime they want, right now. So that's where data tiering becomes extremely important. And in a HANA world, where everything's in memory and you pay for every byte that gets into memory, data tiering becomes even more important, right? But now you really need to have that data analyst team that really
11:15
understands the data and, to your point earlier, what data is most critical to the business, where is it growing, and what do they actually need sub-millisecond latency to access, and be able to tier between hot, warm, and cold. But to the user, they don't know that, right? They just see their data. They don't
11:33
know it's coming from object, they don't know it's coming from a cold tier. And that's really how you get around that archiving problem, because they still have access to it even though it's not there. Awesome, great question, and thanks for the insight, guys. Any other questions?
11:51
Oh, yeah, containers. I've been talking about SQL Server in Docker containers since about 2016, when I first presented at a conference in Britain called SQLBits. I was the only guy talking about SQL Server in containers. And the first question I had at the end of my session was,
12:16
are you mad? But containerization technology has benefits for us as data professionals that we really shouldn't ignore, things like developer agility. I worked on a project where every month our QA and dev departments would blow away their VMs and rebuild everything, reinstalling SQL Server, restoring all the databases, and it took about 45 to 60 minutes purely for
12:46
the fact that these servers we were running on weren't exactly the highest spec; it just used to churn, install SQL Server, pull everything down. We decided to get rid of that, blow all those VMs away, and instead install everything on a remote host running Docker. And it would just make a call out and spin up a container from a custom image depending on
13:04
which branch they were working on at the time. And it would spin up their instance for them, all fully configured, databases schema-only but ready to go, and then they could pump their data in. The longest it took, we timed this, was around two minutes, down from 45 to 60, per instance. It worked out that we saved a month's worth of
13:22
developer time a year. So this stuff, just purely from that perspective, is highly worth looking at. And then you can move progressively into: OK, well, I came in one morning and we were just running on a remote host, that host went down, everything's down, no one can do any work.
13:38
And I was the most popular person in the office that day. So I had to come in and sort it all out. But that's where we start looking at HA technologies, talking about orchestrators: Docker Swarm, and obviously Kubernetes, because it's quite clearly won the orchestrator wars, as it were.
13:52
So this stuff, I think it started off as a dev technology, but now we're starting to see more mature technologies come out, and it's starting to look like: can we run our containers in production? And the answer is absolutely, 100 percent, with things like Portworx, the data services there for migrating and backing up data, persisting our data, data
14:12
services as well, one-click deployments for SQL Server, things like that. It's a really interesting space at the moment, and I'm going to stop talking now because I can go on all day. I want to add something interesting to that. So what I said, it wasn't interesting? You have to add something to what I said.
14:29
It's an interesting thing to add. Maybe, I don't know, I was listening to it. So as I said, I stare at market data. The database market is currently midstream in a transformation from a license-based model to a database-as-a-service-based model. In fact, in four years or so, like 5% of all databases worldwide will be license-based and everything else will be
14:49
something-as-a-service. To enable that service piece, guess what: it's about data service provisioning and optimizing all your workflows; Portworx does that. If we start to take a look at what the cloud providers have done, they did this first, which is they wrapped a commercial model around massive amounts of automation. They're just light years ahead of us and a
15:07
little more expensive, depending on the scenario. What this means is, as you're using the word data gravity, as these landscapes are growing and your headcounts are staying the same, you need to optimize how they're used, blah, blah, blah. Databases in Kubernetes, or in a container environment, are probably the future because it slipstreams where people are now, with their license-based databases
15:30
deployed in VMs, and gets you closer to that cloud-native experience, maybe without the commercial model in between. Did you say Broadcom? I did. I like the fact it fundamentally changes how we work with databases as well. You made a really good point in our session
15:44
yesterday, Anthony, sorry, that we don't really care about the compute. We don't really care about SQL Server, do we? We just care about how we access that data. So as a DBA, I'm used to, you know, patching, stroking, whispering sweet nothings to my physical machines and my virtual machines. Containers change that. There's something wrong with the container? I'm not going to fix it.
16:06
I'm going to... goodbye. It separates out the compute from the storage, which I think is a really good way of looking at it, and it's a really good way of doing things like patch testing. I can bring in a new version of SQL Server, access the data; if something's gone wrong, blow that away, roll back instantly.
16:21
Anyone here, have we got any DBAs in? Ever tried to uninstall a CU on Windows? Fun times. Yeah, exactly. Something like that happened with SQL Server 2014, where a CU broke the
16:36
agent's scheduled jobs. So we had to roll all that back; it was a big hassle. With containers: blow that container away, roll it back to the previous version, and you're good to go. Awesome. Thanks, Andrew. Thanks, Silly. Yeah, there was another question I forgot.
16:54
Must have been... we already answered it. So, I work for an energy company in my city; it's a small city in Texas, but we're barely rolling out smart meters, right, in 2024. And we're bringing in all that AMI data from the smart meters, and it's 15-minute interval reads. So we want to show our customers their usage,
17:14
and we learned that it's filling up quickly. Do you guys have any experience with smart meter data and how you're handling it? Right now we're getting about 300 million rows a day into SQL, so it's growing, but it's compressing really well. So do you have any experience with that, or just good luck for me?
17:35
I think... that's a lot of data. Like, not even kidding. I'm unpacking this in my head right as we go. What's the business case to have that much data in a relational database all the time? See, I haven't gotten a straight answer from those people. They said keep it all. And that's the default answer, usually.
17:57
Right. It's like, forecasting. Let me... hold on. So, right. I think, what's the business cycle that you have to support? And I think you're gonna wind up having to make the business case that: I have this data, it's on tier one storage, and it costs this much money, right?
18:13
We have a similar pattern. The software back end for most of the tollways in America runs on Pure, and they take eight images of a car that passes through a toll plaza, right? And they store that in a relational database: the world's most expensive card catalog. All right. And so we took that exact pattern, and we took
18:42
the binary data and we put it on object, right? And so that's kind of where, not knowing your architecture, it's those kinds of decisions that you're gonna have to make. Like, how do I take the data, still support the business use case, but use the storage infrastructure optimally so that you're not lighting money on fire, literally. And that's gonna be the business case that
18:58
you're gonna have to go back to your org to quantify, what that really looks like, right? What's the back end? The back end is SQL Server. So then you're looking at things like partitioning strategies. So yeah, we're starting to do that with one of our data scientists, I guess you can call them. It does get good efficiency, though.
19:18
I think we wrote a 20 terabyte database and Pure saw it as five or four. That's not atypical; what we see is about a 4-to-1 reduction. I mean, I've worked with systems where, for some reason, this was before my time when I joined, they decided that the OLTP was going to be as lean as possible. And we had this massive database, and the method for getting the data from the OLTP
19:41
database into the warehouse was Service Broker. They'd actually send off messages, pushing it that way, which did work until it broke, and then trying to troubleshoot Service Broker in SQL Server is something else. But it was trying to keep the actual OLTP side as lean as possible and then push it over to this gigantic warehouse for the reporting and all that,
19:59
because we had the same thing: what's the data retention on this? Forever. Yeah, I think one of the other things to think about, though: if you think about the imaging use case, I also did a bunch of work in medical imaging over the years, your data is never gonna change. Like, that sample data is never gonna change. So why put that in a relational database at all is the question, right?
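On the partitioning strategies Anthony mentioned a moment ago, here is a minimal T-SQL sketch of a monthly sliding window for interval-read data like this; the table, columns, and boundary dates are all hypothetical:

```sql
-- Monthly RANGE RIGHT boundaries (each value starts a new partition)
CREATE PARTITION FUNCTION pf_ReadingMonth (datetime2(0))
AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-02-01', '2024-03-01');

CREATE PARTITION SCHEME ps_ReadingMonth
AS PARTITION pf_ReadingMonth ALL TO ([PRIMARY]);

CREATE TABLE dbo.MeterReadings (
    MeterId     int           NOT NULL,
    ReadingTime datetime2(0)  NOT NULL,
    KwhUsed     decimal(9, 3) NOT NULL,
    CONSTRAINT PK_MeterReadings
        PRIMARY KEY CLUSTERED (ReadingTime, MeterId)
) ON ps_ReadingMonth (ReadingTime);

-- Add next month's partition ahead of time...
ALTER PARTITION SCHEME ps_ReadingMonth NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_ReadingMonth() SPLIT RANGE ('2024-04-01');

-- ...and ageing out the oldest month is a metadata operation,
-- not a 300-million-rows-a-day DELETE (SQL Server 2016+)
TRUNCATE TABLE dbo.MeterReadings WITH (PARTITIONS (1));
```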
20:18
So, yeah, I think about that more architecturally, from a software standpoint. Nocentino's here. All right, let's keep going. One of the big aspects is around performance optimization,
20:42
right? It always comes up. So I'll start with you, Andrew. In terms of performance optimization, what advice do you have, or what kind of challenges are you seeing in your customer environments? This is a really great one. The queries always need to be faster, whether it's
21:03
reporting or OLTP. But SQL Server has some really great built-in tools for monitoring. It's now switched on by default in 2022: Query Store. Query Store basically will give you an analysis of every single thing that hits your database. And it'll give you trends as well, because there's always that tipping point of finding
21:22
we've piled a whole bunch of new data into our database and all of a sudden our queries grind to a halt, and you can see the trend going in there. So: proper analysis of the situation before you actually go in and do anything. I had a DBA once tell me the first thing he does in a new job, for a month, is nothing. He doesn't do anything; he just sets up the proper monitoring,
21:42
alerting, and all the reporting tools around it, and just watches what is happening in his environment before he starts going in and splashing indexes everywhere, potentially making the problem worse. Not a consultant, not a consultant, not a consultant. This is actually working as a DBA. I'll take this next, but these three are gonna give you really good answers.
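A minimal sketch of the Query Store workflow Andrew describes; the database name is hypothetical, and the feature is on by default in SQL Server 2022 but needs enabling on older versions:

```sql
-- Enable Query Store (on by default in SQL Server 2022)
ALTER DATABASE SalesDb SET QUERY_STORE = ON (OPERATION_MODE = READ_WRITE);

-- Ten slowest captured intervals by average duration, with their query text
SELECT TOP (10)
       q.query_id,
       qt.query_sql_text,
       rs.avg_duration / 1000.0 AS avg_duration_ms,
       rs.count_executions
FROM sys.query_store_runtime_stats AS rs
JOIN sys.query_store_plan       AS p  ON p.plan_id = rs.plan_id
JOIN sys.query_store_query      AS q  ON q.query_id = p.query_id
JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
ORDER BY rs.avg_duration DESC;
```

Watching how those numbers trend over time is what surfaces the "data grew and the query tipped over" pattern he mentions.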
22:03
But I'll tell you about an experience I had last year. I was looking into a way to combine Pure Storage metrics with database metrics. Now, data and storage are very interlinked, and the metrics tell you similar trends, but different people are gonna look at them. So it's a really difficult problem: how do you get the storage admin and the DBA to look at the same problem through the same lens?
22:28
And something interesting came up. There's a lot of open source tooling, and there's a great tool for the open source database community called Percona PMM, or Percona Monitoring and Management. But the bit that was really good about this was how it did full-stack monitoring in containerized, virtualized, or physical environments.
22:46
This blew me away, because regardless of what you did, it told you the same thing. And the best insight I can take away from that is Percona has seen the collapse of the personas: fewer people need to monitor more, but be more proactive and get better insight out of more consolidated dashboards.
23:04
Spoiler: it's just very well-wrapped Prometheus with agents, but it's very, very good. And so the best thing I take away from that is: look at the persona who's actually taking a look at the data and needs to understand what's happening, so that they can take appropriate action. You know, I'll kind of go back to what I said earlier about blending the lines between
23:23
the storage team and the DBA team, right? In the past, performance was always the hardest thing to troubleshoot, right? It's usually you. And it's usually you. It's usually networking, let's be honest. But at the end of the day, it's working together, kind of what you were saying, right?
23:41
Being able to understand the storage aspect, the database aspect, the developer who's probably writing the bad query in the first place. And it's really a combination of those three people getting together and understanding how to make the most optimized query, how to make the most optimized report, how to optimize your performance from a storage standpoint,
24:03
from a code standpoint, and from an "adding indexes everywhere" standpoint. One thing I say about Pure, and flash storage in general: it's great because it's really, really fast, and it's awful because it's really, really fast. You can write some absolutely nonsense SQL queries and it will pull back the data for you, and you won't have a problem until, say, a month down the line. And then you have a problem, and
24:26
something that used to bite me quite a lot is: oh, this has been in production for at least a month, but now the data has grown and all of a sudden performance has nosedived off a cliff, and you have to go in and dig in and find out exactly what it is. Because you're looking at what's changed, like, what's changed? And the only thing that's changed is the data,
24:40
right? You stole all my thunder. Next question. So, Anthony, I'll stay with you, and I'm gonna change gears a little bit to talk about cost savings, right? My understanding is that until recently, cloud providers were charging for data egress.
25:01
Now they're doing away with it, with some caveats, right? So in the context of that, what are you seeing, and what advice do you have in terms of planning for data egress? Keep it all on prem. Yeah, there's that. So, yeah, that's a tough question.
25:22
Anyone familiar with Cloud Block Store? Right, it's our cloud offering in both Azure and AWS. And one of the things that the Pure platform does great, and I'll say this as Anthony the technologist, not as a person that sells FlashArrays for a living: the idea of data reduction, snapshots, baselining, and the ability to kind of decouple
25:45
the size of the data from the things that we need to do with the data as DBAs is really the power of the platform. And it attacks this data egress challenge, because if I can, in a storage-efficient way, get data between two sites, whether it be between two data centers on prem or between on prem and the cloud, and literally break the laws of physics and move a 10 terabyte database or 100 terabyte
26:08
database to and from the cloud with baselining, data reduction, all the goodness that we do, we're starting to solve some serious business problems now, right? So if we're ingesting 300 million rows of data and I need to have a DR solution to the cloud based off traditional technologies, that's a very hard problem to solve. But with the way that we reduce data, and
26:26
baselining, which is the technique where we can tell if a piece of data is already at a target site and not have to move that data again, is absolutely invaluable, because any application-level approach is gonna move all that data again, which impacts our ability to recover a system. So if you think about failover and failback: that's nearly a solved problem on our
26:49
platform, and I love being able to talk to customers about that all day, because we can break space and time, right? 10 terabytes, 100 terabytes, we're gonna help solve those challenges. I have nothing to add; that was great. All right, I'll take the next bit. So Anthony took an interesting technical
27:12
approach to the answer. As I keep saying, I stare at market data; my brain is going a little... You stare at market data all day, every day. He used to be a nerd. I wear a suit. It's great. So I was looking at market data, and a few months ago,
27:27
I wanted to justify a program around Pure to do stuff with private and sovereign clouds. And one of the first questions everybody asked me is, what's the market revenue opportunity? I'm like, I don't know. So you have to do a whole bunch of research, and in that research: you have these cloud services on which the world has been built for the
27:44
past 10 years. The nice thing about it being 10 years is we have trends we can look at within that. And McKinsey, I believe it was, said 75% of the world's data is gonna live in a public cloud provider, whether they're small or big, it's gonna live there, and 25% will be on prem. That was very misleading.
28:01
Instead, it was 75% of workloads will live in the cloud and 25% will be on prem. The important bit that the analysis didn't really bring out was the repatriation trend. So let's think about it like this: five years ago, you're a customer, you're like, I wanna go cloud-only. How many times has everybody heard that?
28:16
And three years down the line, it's like: crap, this is costing too much money, but we wanna be cloud-only. And so what actually emerged was repatriation, and repatriation and FinOps have emerged as: how do you optimize these landscapes? And so Neal asked about cost management. The interesting thing is that at the point
28:35
when people with C-level titles or VP titles say we wanna go cloud-only, no one asks: what's the strategy over a number of years when cost gets out of control? And so the question that needs to be asked going forward is, when we put a workload in the cloud, what is its lifecycle? If there's 75 terabytes of data in there, are we just going to put it off to AWS Glacier? Shocker: that's actually more expensive in some
28:57
scenarios, because people still like using data. Guess what: this two-letter trend that's coming out means people want to be able to train on data from the beginning of time. So you kinda gotta move that out. So what emerged was that the 25% of workloads that are on prem is a FinOps conversation around
29:15
what workloads make sense at scale to put on prem and what workloads make sense to keep in the cloud. But what you're not going to do is tier one workload between the two; instead, you're gonna put it in the best place for it to live. That was... I think I'm done with my answer. I think that cloud-only trend is starting to disappear,
29:33
right? I always shook my head where you go into a customer, the CIO is like, oh, I wanna go cloud-only, and then you talk to the DBA and he's like, yeah, he wants to go cloud-only; I don't know how we're gonna do this, right? The pendulum has kind of swung back, right? And it has kind of landed in the middle, where
29:48
yes, you now have consulting services that literally come in, look at your application, look at your data, and help you figure out where it should land, whether it's cloud, whether it's on prem, whether it's hybrid. But yeah, I think the pendulum has stopped swinging a little bit.
30:05
I always like these questions because I don't think it's a new thing. I remember in SQL Server 2016, they introduced a feature called Stretch Database. Remember that? On premises would be the live data, and in the background it would copy your data up into the cloud, so the least-accessed data would only live up in the cloud and then be pulled back when you needed it.
30:24
The problem with Stretch Database was it was hideously expensive, and everyone was just like, absolutely not, are we doing that? So Microsoft have tried these hybrid solutions before, and we're seeing the next iteration of it now, I think, coming through. Awesome, guys. Let me see if there are any questions here.
30:44
Well, I do have one question for you, Silly. Just switching gears. We were talking about AI a little. Not supposed to use that. He doesn't want to use the word. No, no, can't drink. So anyway, on AI, I'm trying to formulate my
31:06
question now. Did I derail you? All good, all good. So in terms of preparing the infrastructure for AI, and focusing on model training and inference, what are the strategies that companies are adopting, or what are the things that you're seeing in your discussions with customers? I had this conversation with an executive two
31:32
weeks ago. It was fun. I still have a job. It's great. So the conversation went like this: you've got pre-existing ecosystems, you've got Oracle databases and SQL Server databases, and we've now got this new AI trend that's probably gonna solve all of these problems; where are we gonna put investment?
31:51
I was kinda like, that's a very flawed question. Because instead, what's happened is AI is a feature; it's an improvement on these things, but it needs to live within data ecosystems. I'm sure someone's done an analysis somewhere that says the majority of the world's data lives in SQL Server or Oracle or something akin to that.
32:09
MySQL and open source databases are also another very important ecosystem. What is not going to happen is new data ecosystems emerging just like that. We're not going to create synthetic data to train synthetic machines; you're going to use real-world data. For example, there was a good question around: you've got transactions since 1997.
32:25
That's real data; it actually tells you what humans are doing. And so from this AI trend, the first thing that's happened is you've got this emergence of lots of vector databases, new open source startups, or just startups, let's not use the word open source. They are very quick to the market. That's great.
32:42
I think some of them will succeed. But what's really going to happen is you're going to see those established data ecosystems thrive. So let's look at Oracle 23ai. Su is gonna be happy to hear this; we're talking about Oracle now. Guess what they did. One of the key innovations within that
32:59
engine is vector modeling and vector capabilities within the engine. But the way Oracle has done this, I love this, by the way, I tell everybody this: Oracle's not like, oh, go and buy a whole bunch of NVIDIA hardware. Instead, they've said, we've got a cloud, we've got a database, we've got an existing data ecosystem; what we really want you doing is staying in our
33:15
data ecosystem and improving it: convert your data to vectors, or create vectors that point to your data, whatever, and then tie it up to OCI to do all the LLM stuff. That makes a lot of sense from the perspective of: you don't need to go and buy GPUs to sit on your premises, which for a lot of people is very expensive. The second thing is it's beneficial for Oracle because it drives you towards the cloud,
33:37
but it feels like a very efficient way to solve the first question Neal asked, which is: what are the cost optimization stories? And that's an interesting story. The second aspect of it is you have the emergence of a lot of vector capabilities. So in Postgres you've got pgvector; the open source community has always been absolutely fantastic at just being ahead of the ball and getting those innovations rolling.
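For a flavour of what that looks like, here is a minimal pgvector sketch in Postgres; the table, the 1536-dimension size, and the index assume a hypothetical embedding model and pgvector 0.5 or later for HNSW support:

```sql
-- Requires the pgvector extension to be installed on the server
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE docs (
    id        bigserial PRIMARY KEY,
    body      text NOT NULL,
    embedding vector(1536) NOT NULL  -- must match the embedding model's dimension
);

-- Approximate nearest-neighbour index using cosine distance
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);

-- Five most similar rows to a query embedding supplied as a parameter
SELECT id, body
FROM docs
ORDER BY embedding <=> $1   -- <=> is pgvector's cosine distance operator
LIMIT 5;
```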
34:00
And my theory is that established data ecosystems that include that are going to significantly outlast everything else, or we are going to see a very interesting integration pipeline between transactional stores that don't currently support vector database capabilities. SQL Server does not currently support it. It might, maybe one day. HANA does. HANA does?
34:24
I didn't say HANA. And that's in the cloud? It does. OK, in the cloud it does, so OK, let's take it out of the on-prem business for Pure: SQL Server in the cloud, it does. And what you will potentially see is tighter integration between non-vector stores
34:38
and vector stores, very similar to the way we did ETL transformations between transactional engines and OLAP engines. What does that mean for Pure, et cetera? Pure is really good regardless of what data you store. And the biggest problem with vector capabilities is it balloons up the storage
34:55
requirements. So you've got two things. We released a 150 terabyte DFM; like, just store your stuff on us, it's great. The second bit is dedupe: we can theoretically dedupe a lot of that data down. We've done an investigation on Milvus, and over time we do see that capacity footprint come down.
35:11
But more importantly, you are just going to need a lot of storage to handle these capabilities, to take the best advantage of generative AI training. Awesome. That was good. I thought Oracle just slapped an AI sticker on 23c and kicked it out the door. Oh no, this is funny: Oracle is always about the trend.
35:28
The "i" was for internet, the "g" was for grid computing, and now it's AI. Oh yeah, sorry, I just think that's funny. Anything anyone else wants to add? All right. So the other question for you Andrews was, can we just leave? I have a seat.
35:48
This never happens. Oh man, it's always fun with you guys. You guys are gonna have to ask questions, otherwise they're just gonna keep picking on Landry. So governance, that's another big topic, right? And especially in regulated industries,
36:10
what are you seeing? What are the trends? What are you hearing from customers around governance? What's the significance in terms of database environments? Specifically for database environments, governance has always really been handled in the software layer.
36:27
But from a Pure perspective, there is something interesting. SafeMode snapshots actually have a governance component to them. The first is permission to delete, and the second one is guarding against eradication of data. It's a very small subset of governance, but that's one way to do it.
36:46
That's an interesting angle to take on the direct database piece. But let's take a look at your data landscape, because that's what's really happening. We're talking about databases here, but what everybody is really worried about is: I've got my boss telling me he wants to implement Elastic or something. And so let's take a look at your data landscape, because that's where governance
37:03
really comes in. We had a customer last year; it was a government who wanted to buy stuff. We were competing against competitors, obviously, but we won for the weirdest reason no one ever gets picked for: we have something called object lock. They were collecting loads of object data for auditing and whatnot, to make
37:23
sure people aren't doing things. The problem is the customer didn't want anybody to be able to get into the data and alter it. So they had policies for each layer, especially the storage, and we won because we stop people changing the objects. It wasn't fully by accident, but we just didn't see that one coming. And so, from the storage perspective, your governance policies are effectively
37:42
lots of little pieces that build up into a better cohesive whole. Awesome. Thanks. So we have, I guess, about five minutes; wanted to see if there are any questions in the room. Data analytics, right? Oh,
38:04
hello. Best practice for years, right, is: run your analytics, your queries, on your non-live production database; get a copy, right? We all know a bad query can destroy performance. But how do you deal with the spread of those replicas?
38:22
I think what's happened with our team is that every team who wants to run a query just gets their own copy, and now I have more copies than I know what to do with. How do you guys deal with managing that spread, or how do you deal with a new kind of model of best practices for data analytics against the production database?
38:40
So, be the grumpy DBA, and when they ask for a new copy, you say no. Unless you've got some sort of automation tooling in there where they can click a button and just get a copy of this. Drives me nuts. I've worked with a company where, when the developers wanted to test against live data, they had access to grab a backup of the production database,
38:59
copy it down to their laptop, and restore it on a local instance of SQL Server. And I remember walking around going, what? And the way we got rid of that was telling them: by the way, you've installed Enterprise Edition of SQL Server on all those developers' laptops. Have you paid for those licenses? How much is it?
39:16
Oh, right. But it's basically working in controlled environments. The production environment, even as a DBA, I don't want access to production; I don't want to touch it unless I really have to. I've worked in companies where I've taken away my own write access via Windows accounts,
39:33
mainly because I kept writing UPDATE statements that broke stuff, but for other reasons as well. We'd have to go through SQL authentication to get into production, just to put an extra step in front. But it's all about control, having control of that environment, and making sure that, yes, when someone requests a new replica, you're going back to the business on why they need
39:50
it. And is there existing infrastructure out there already that can service that need, instead of just going, yes, of course, boom? Because then you end up in your situation, where you have all these different replicas out there that you have to manage, look after, patch, maintain, and things can possibly go wrong with them, HA for them as
40:06
well, maybe, things like that. So, does Kubernetes solve these problems? No. I will say, 99 times out of 100, the answer to the question "do we need Kubernetes" is no. Does database-as-a-service solve that problem? It depends, because the operational burden is the challenge. We can solve that with DBaaS, but then you're still left with a load of replicas.
40:35
I mean, FlashArray dedupes, have a nice day. But it brings up a good point, right? Sometimes our data services are so cool, so easy to use, that things like that happen, right? I always talk to application teams and say, hey, talk to your storage guys, get access to create snapshots, right? Create as many as you want, because it is that
40:55
easy and it does dedupe. But I can see how that does lead to operational burden. A lot of people are now forming teams around this in the cloud, right, where the cloud gets expensive because it's so easy to spin up an environment, right? And now you're not worried about, can you access that environment?
41:14
It's who can create that environment? Because now, all of a sudden, you've got 10 people creating environments and you get the bill at the end of the month, right? That gets pretty crazy. So, I love snapshots. We talk about it every day; every day over bourbon, we'll talk about snapshots.
41:29
But you're right, you now have to start putting those policies in place to be able to control who can create a snapshot, and how often. My particular favorite of that is: who here has ever installed an application that needs an instance of SQL Express running underneath it? Oh, yeah. And then all of a sudden you have all these little things that pop up that are completely
41:47
unmanageable; they're not backed up because there's no agent by default. So I actually did a project where we migrated all of our SQL Express instances (drink) into a Kubernetes cluster, which gave us HA with the desired state versus running state concept within Kubernetes, and we could monitor them, manage them, back them up, all in one central location, instead of just having these random
42:13
little databases popping up everywhere, because the developers could go out and install their software: oh yes, I need a SQL Express instance, thank you very much. And boom, you've got another copy of a database somewhere that you need to manage and maintain. That sounds like database-as-a-service. We should stop saying that.
42:32
Awesome, guys. Any other questions? Right. Well, we hope to continue the conversation with you guys. We have more sessions coming up; Anthony and I have a session on APS at 3:15, same room, so look forward to welcoming you back. Now, we have one at one.
42:53
So 11:30, 1:15, it's a snapshot-based session, but it's on the power of VMware vVols. 11 o'clock, it's a flash talk. Highly recommend; two smart cookies. Oh, and three o'clock in the Expo hall,
43:12
I'm talking about SQL Server and Portworx Data Services. And I think a bunch of us are going to be at beer o'clock at 5:15 as well. I want to know if there's actually beer. Yes. I would be... if there is not, you can't call it beer o'clock and then not have beer there. Although we're in Vegas, it's always beer o'clock. Awesome, guys. Well, thanks for the insights.
43:33
Thank you, everyone, for coming. Let's keep the conversation going. And if you need more information, check out purestorage.com/applications; we put a lot of great stuff there. So let's keep the conversation going. Thank you so much. Thank you. Bye now.