00:00
Well, thank you all for coming. I see some people are still coming in. My name is Jason Sarrige. I'm a technical product manager in our hyperscale line of business, and I'm here to introduce you to the newest member of our FlashArray family, the FlashArray//ST.
00:20
Before I get into what it is, I want to tell a little history of how it came about. To do that, I'm going to take you back to 2017, around the time Intel announced Optane, using 3D XPoint technology, with the promise of a new storage class that could bridge the gap between the speeds of DRAM and NAND, along with the additional promise of being a far,
00:47
far more resilient storage medium with respect to the write cycles it supported. At the time, Coz was thinking about how we could take advantage of this technology, and by simply replacing the SATA- and SAS-based NAND we were using at the time with Optane, he realised that we wouldn't get the transformational performance
01:13
improvements you would expect, mainly because at the time, and even today, the cycles we spend ensuring incredibly efficient storage would overshadow and outweigh the benefits of the underlying storage medium. And so, this is kind of a crazy story, but over the course of a week, while he was on holiday watching his son's baseball
01:41
tournament, he decided in his free time that it would be a good thought experiment to rewrite the software IO stack for Purity. His goal was to eliminate as much latency from the stack as possible, so as to flip the latency balance: the latency coming from the drives would far outweigh the latency from our software, letting the underlying storage really shine.
02:10
When he came back, he handed it off as a skunkworks project to a small tiger team of engineers to run with, to build out the idea on a FlashArray and see if it really held water. They did that, and they proved it
02:37
out: we could actually get significant performance out of a FlashArray using really fast storage media. But what we really needed was a use case: how were we going to make this work? Coincidentally, a highly collaborative customer approached us with a high performance challenge, one that could not be
03:00
addressed by the highest performing FlashArrays we had at the time. The engineers who were building this in-development race car of storage thought it would actually be a really good platform to solve that need, and that customer happens to be with me today. So I'd like to introduce Dan Gentry from ServiceNow
03:22
to tell us a little bit about ServiceNow, the scale of the challenges he deals with on a daily basis, and the challenge he presented to us that we thought the FlashArray//ST would be the perfect fit for. Thanks. Hello. As Jason said, I work for ServiceNow. I literally have the best job in the world, and I'm going to tell you why that is.
03:43
Let's go forward a little bit. We're going to talk about database storage at ServiceNow. I'm a principal hardware architect in the cloud capacity engineering group. People talk about the cloud; well, the cloud is the computers that my team builds, runs, and organises. Standard safe harbour:
04:02
you've seen a million, now you've seen a million and one. So at least that's some context for what we're going to talk about here. You probably know that ServiceNow is a workflow company. We provide products that our customers use a lot. They use the same basic tool sets in different ways, and they all have a different use case for
04:21
them, so they have different challenges. Some need more storage, some need more compute, that sort of thing. But, like everyone these days, we have AI offerings. They're really cool. If you haven't seen them, you should go check out the
04:31
recent Knowledge 25 replays on YouTube. They're really excellent demos. The AI agents take advantage of the workflows that everybody knows, and those take advantage of the rest of the base platform features that each of our products has. But where I live, where I hang out and what I'm responsible for, is the data layer: where the databases live, where the app servers live,
04:52
where all of the engines that make the rest of this wedding cake possible live. And we have pretty good scale. Tooting our own horn a little bit: 68,000 servers, 30 data centres around the world.
05:10
We tend to deploy in pairs. For example, we've got a data centre in Austin and a data centre in Chicago, and we deploy resources to both at the same time, for failover and safety and that sort of thing. The sun doesn't set on our empire.
05:25
We have to deal with data sovereignty issues, as you all do, so we have a fairly wide reach, and a lot happens in our cloud in an hour. One of the interesting numbers for this talk: 560 terabytes of backups happen in an hour, and those backups go to FlashBlades.
05:47
They have a very specific use case and a very specific value proposition for us, and they work really well; we like them a lot. Other numbers: 7.2 million workflow activities, so all of our customers doing all of their wonderful work, running on all of the machines we get to build; 500 alerts generated.
06:08
A lot of stuff happens in our cloud, and all of our customers use it in different ways, so we have to have a fairly general purpose platform that can serve their needs. Normally, we deploy database servers much like you're used to; it's standard bare metal stuff. You've got a server,
06:26
you've got some NVMe storage in that box, you've got CPUs and RAM, and we of course have our ways of provisioning different sizes of instances. We've been all-flash internally since, I think, 2012, so we were on that train early, and it's paid off well for us.
06:45
We know how to do these. We know how to deploy them; we're good at it. We do it regularly, we update the tech, everything you'd expect. But some of our customers run into sizes of scale where they get large enough that they no longer fit
07:02
on a certain type of server, because they need more storage, or they need more memory than a particular server generation has. So we have to figure out how to support our customers such that they don't even know we're solving the problem for them, right? So we worked with Pure to develop a platform
07:20
that allowed us to take customers whose long-running backups were getting into the double-digit hours and move them so they could use snapshot backups and restores instead, or clone their database for development purposes, or get a read replica, things like that. So we developed this
07:43
platform. The original iteration of it was named Galaxy. It allowed us to deal with dedicated customers, to solve particular problems, and to serve our customers better. There were some performance implications, though. In order to fit on it, a customer had to be a certain size;
08:03
they had to be able to handle a certain amount of latency in their application, because when you deal with RDMA or network-based storage, every feature you add between the database and the storage adds a little bit of latency, and some customers couldn't handle that. So we went back to our friends at Pure and we gave them a challenge:
08:23
we needed bigger, better, faster, more, and we needed it really soon, because we wanted to put more customers on this platform that allowed us to do snapshot backups and restores. We needed to take more advantage of it and help more customers as they run into this problem.
08:40
So that's where I'm going to hand it back to Jason, to tell you how they fixed it. Thanks, Dan. Now, we all know what happened to Optane technology: Intel killed it. It was fast, but it was really expensive and low capacity, and ultimately they couldn't find a
08:59
market to sell it in. But that didn't mean storage innovation from a performance standpoint ended there. We came from a time of SATA- and SAS-based SSDs, and now we have NVMe and caching capabilities at much lower latency. That meant our idea for this race car still held weight, because what we did with
09:24
software still made whatever the underlying storage was shine. So, looking at the enterprise application landscape, you've probably seen this slide before. Pure has built amazing products with just the right balance to satisfy high performance and high storage efficiency needs, and it covers, up until the FlashArray
09:47
//ST, about 95% of the market. But it still left a 5% performance segment that we weren't really addressing with FlashArray: the ultra high performance workloads. There were, and I say "were" because it's kind of speaking ill of the dead, companies out there who built products for extreme performance.
10:11
Think of companies like Violin Memory, Apeiron, and most recently DSSD, which was a Dell EMC company. Where they faltered is that they were bespoke solutions. They created silos of storage that didn't play well with the greater portfolio, and they typically required a completely different management
10:32
approach for the storage. We believed we could improve on that by providing a solution that delivers the ultra high performance you need for these ultra high performance workloads while also offering compatibility with the FlashArray family, so you get data mobility,
10:59
data protection, and data management, with a very flat learning curve for adopting it into your environment. The first ingredient is performance. The FlashArray//ST is the high performance, enterprise-class storage system you'd turn to when performance is paramount: more than 10 million IOPS, up to 88 gigabytes per second of
11:27
sustained throughput, and latencies that start in the mid double-digit microseconds. You're talking about a system that offers an incredible amount of flexibility to consolidate extremely high performance workloads that would traditionally require array sprawl. Transitioning into how we did
11:57
it: the first thing that makes the solution palatable and accessible to a lot of our customers is that we're leveraging proven FlashArray hardware and software. We start with the FlashArray//XL chassis and controller foundation, which is a hardened platform, and the Purity software, tried and true;
12:20
this is the same software you've been running for years and years. Then, with a singular focus on performance, we layer in what started as Coz's thought experiment, which turned into a really great high performance software IO stack, where we've reduced software latency down to
12:43
just 5 microseconds. So when you ask for an IO, 5 microseconds is taken up by our software, and the rest essentially comes from the underlying storage serving that data. We've eliminated latency-inducing features like deduplication, compression, and synchronous replication, but we knew that in any enterprise-class solution we couldn't compromise on things like
13:11
data protection, data mobility, and data agility. So we still have incredibly space-efficient snapshots, we still have the capability of asynchronously and bidirectionally replicating between any of our arrays, and we still provide instantaneous, writable, zero-copy clone capabilities. Continuing the streamlining
13:38
idea, when we thought about the fastest protocol we could support, we chose to leverage the fastest one you can find today: NVMe over Fabrics, specifically RDMA over Converged Ethernet, or RoCE v2, as our protocol of choice. That allows customers to leverage industry-standard 100 gig Ethernet NICs and
14:03
switches to provide the high performance, low latency access to storage we would require in a solution like this. Looking at the hardware, I mentioned we start with the foundation of a FlashArray//XL, so it's a 5U system. What you're seeing here is the front and back of one system, not two systems.
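As a back-of-envelope sketch of the latency split described a moment ago: the 5 microsecond software figure comes from the talk, but the media latency below is an illustrative assumption for a low-latency enterprise NVMe SSD, not a published spec.

```python
# Back-of-envelope latency budget for a single read IO.
SOFTWARE_US = 5.0   # Purity software path, per the talk
MEDIA_US = 45.0     # assumed NVMe SSD read latency (illustrative only)

total_us = SOFTWARE_US + MEDIA_US
software_share = SOFTWARE_US / total_us

print(f"total latency:  {total_us:.0f} us")      # lands in the mid double digits
print(f"software share: {software_share:.0%}")   # software is a small slice
```

Under these assumptions, the software path is roughly a tenth of the total, which is the inversion described earlier: the drives, not the software, dominate the latency you observe.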
14:27
Just one of those systems is 5U, and it brings the familiarity of stateless controllers, so you get non-disruptive operations. It brings no single point of failure from a hardened hardware platform. All data is encrypted at rest, all the time. The only real difference is that we aren't leveraging the DFMs you find in the rest of the FlashArray family;
14:55
we're leveraging the fastest enterprise-class NVMe SSDs we could find, the ones that offer the lowest latency, because again, performance was paramount in this design. Now, if you think about the competitive landscape, for any of our competitors to provide
15:15
sub-100-microsecond latency and 10 million IOPS of performance, you would need 6 to 8 engines. You're talking rack-scale infrastructure for storage. And those engines would have to be loaded to the gills with DRAM, to have as much data come out of cache as possible, to reach the latency levels that we're able to deliver even
15:39
directly out of the SSD. When you compare that to a FlashArray//ST serving it up in a consolidated 5U, it's in a class all its own with respect to performance versus space. On data mobility: again, one of the places where
16:08
the competitors of old, who aren't with us anymore, failed is that if they had replication capabilities, they required you to buy two of their systems of the exact same type, so you'd have two high performance systems just to ensure a second copy, and that's not really financially palatable to most people. So we offer bidirectional, asynchronous replication capabilities to any of the FlashArrays in
16:34
our fleet. What that gives customers is the flexibility to place data copies on the storage media that provides the performance and cost profile that best suits each copy. And one of the bonuses is that even though we're not doing data reduction on the source,
16:56
data reduction still runs on the destination, so every copy you have will be efficient. It's not a one-to-one copy; it's a one-to-less-than-one copy, because the arrays on the other side will deduplicate and compress that data down, making it a financially palatable way to protect your data. From a manageability standpoint:
17:18
One of the things we wanted to do is keep the learning curve flat. We don't want any increase in the learning curve, and we've achieved that by letting you manage the system the exact same way you're used to managing any FlashArray. We have the same CLI, same REST APIs, same GUI; everything feels the same.
17:37
There's nothing new to learn. Everything you know about creating volumes, creating snapshots, and setting up replication works the exact same way. And Pure1: we integrate seamlessly with Pure1, so FlashArray//ST serves as just another FlashArray in your fleet, and you still get the predictive analytics Pure1 is able to provide.
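To illustrate that "nothing new to learn" point, a volume, a snapshot, and a writable clone on a FlashArray//ST would use the same Purity CLI verbs as any other FlashArray. This is a hedged sketch: the volume and clone names are made up, and exact flags can vary by Purity release.

```shell
# Create a 1 TB volume (name is illustrative)
purevol create --size 1T oltp-db-vol01

# Take a space-efficient snapshot, tagged with a suffix
purevol snap --suffix nightly oltp-db-vol01

# Present that snapshot as an instantly writable clone for dev/test
purevol copy oltp-db-vol01.nightly oltp-db-dev01
```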
18:02
As we transition to use cases, the first that comes to mind, with 10 million IOPS, is high performance databases: online transaction processing. This solution is ideally suited for that. But we also provide incredibly high sustained throughput, so when you think about data warehouses and OLAP workloads, we can handle those with ease. And then
18:31
the combination of both of those performance metrics really benefits in-memory databases. You might think in-memory databases run entirely in DRAM, but, using SAP HANA as an example, all in-memory databases require persistent storage to maintain the data, because the moment you turn off the power or shut down the
18:55
database, everything gets wiped out of DRAM, and you need your persistent storage to get that data back online. And while it's running, a little known fact is that every few minutes these databases have to persist savepoints down to the persistent storage. During times of incredibly high activity, think
19:19
month-end and quarter-end activities for large enterprises, or for retail the easiest one is Black Friday, where they're processing millions of transactions a minute, probably even per second, those savepoints
19:40
going down to a system that's already overloaded, or on the verge of being overloaded, can end up taking down websites and forcing payment processing to stop, so you're losing revenue during every downtime. The FlashArray//ST's consistent, ultra low latency and performance gives you the headroom to support those workloads. The second part is that every time you have a planned or unplanned
20:05
downtime, all the data needs to be fed back into DRAM, so to keep your SLAs as tight as possible, you need a solution with high sustained read throughput, to read the data from persistent storage and load it into memory. We're talking to a customer right now who has databases upwards of 48 terabytes
20:25
in size, and they're telling us they want to load that data much faster than it would take them to fail over to another system. The sustained throughput we provide allows them to load terabytes and terabytes of data in a matter of minutes. SAP HANA also offers a special capability for customers with large databases that may be getting close to outrunning
20:55
the capacity footprint of DRAM. When you start reaching that limit, you have a couple of options. You can split the database in two or three, sharding it out to different systems. Or you can leverage what they call Native Storage Extension, where you carve
21:14
out the portion of the in-memory data that would be considered warm, or infrequently accessed, and persist it down to your persistent storage, keeping your hot data running in a much smaller DRAM footprint. Because of our ultra low latency, we bridge the gap between DRAM and traditional NAND storage, so the impact of paging out to persistent disk isn't
21:40
as painful. Next up, in the healthcare environment, we're finding that AI is being leveraged more and more in electronic health record management systems, provided by companies like Epic. Those AI capabilities are putting additional strain on the underlying
22:07
infrastructure, specifically on the storage. So having a system with the performance headroom of a FlashArray//ST allows hospitals and healthcare providers to adopt and take advantage of generative AI capabilities that help improve patient care, without impacting how their systems run without AI.
22:42
Now, virtual desktops: because I told you we don't do any data reduction, virtual desktops are not typically something you'd think of first as a use case. But we were approached by a very large gaming company in Japan, you can guess who, whose high performance developers each have their own dedicated hardware, dedicated workstations,
23:06
running local NVMe, and we're being told that their busiest developers push upwards of 100,000 IOPS per developer. But one of the challenges is that those local desktops are not protected. There's no RAID, there are no snapshots, there's no replication.
23:27
And that's their IP; gaming is their IP. So they've asked if there's a way for us to help them by moving their developers' data onto a shared storage platform with the performance to consolidate a large sea of developers onto a single platform, one that gives them RAID, snapshots, and replication to protect their IP.
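To put rough numbers on that consolidation, here is a sketch using only the figures quoted in the talk. It computes the worst-case ceiling, where every one of the busiest developers pushes peak IOPS simultaneously; real workloads rarely peak together, so practical consolidation would likely be higher.

```python
# Worst-case consolidation ceiling from the quoted figures.
ARRAY_IOPS = 10_000_000   # "more than 10 million IOPS" for the array
DEV_PEAK_IOPS = 100_000   # "upwards of 100,000 IOPS" per busiest developer

worst_case_devs = ARRAY_IOPS // DEV_PEAK_IOPS
print(f"developers per array (all at peak): {worst_case_devs}")  # 100
```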
23:52
And the FlashArray//ST's unbelievable performance allows them to do a great amount of consolidation while avoiding array sprawl. For my last use case, I think it's best to hear directly from the customer who came up with it, so I'm going to hand it back over to Dan. Thank you, sir. So, as I hinted at earlier,
24:18
a majority of our work is done on bare metal servers with NVMe storage. A customer comes in; let's say it's a telco company. They want to load all of their network inventory into the database and run reports against it.
24:34
So they have a fairly large instance, say 30 or 40 terabytes, and they want backups. We do a backup a day: we pick up the data, we stream it, we encrypt it, we send it out. And that backup time can get long, really long, especially if they want a copy
24:51
for running development code. Now we have to spend the time to freight those bits out and freight them back in, and that takes time. It makes them grumpy, and it makes us grumpy, because every large instance needs another server with another instance of directly attached storage, NVMe devices, and so on. So you have wasted space,
25:12
wasted resources, and grumpy customers. Like I said earlier, with the original Galaxy approach we had some higher latencies, and some customers couldn't stand that. Now, with the //ST, we can put the primary, gotta-have-best-performance databases on the //ST.
25:35
We can put development and test on, say, an //XL or other FlashArrays. We can fit the database and the need of the customer to the storage. And, as Jason said, you already know how this stuff works. If you have a FlashArray, you're running it now, you're supporting it now. You know how to do snapshots, you know how to do restores,
25:54
you know what the performance profile is going to be. Pure Storage made me eat my words. Back in 2019, when we came up with the first iteration of Galaxy and were talking about how it works, I said, well, network attached storage is just never going to be as performant as in-box
26:15
storage. It just can't happen. If it's on the PCIe bus, it's just going to work better. I was wrong. With the //ST, over network devices running RoCE v2, the latency is essentially equal in our tests, and I hate eating my words, but I have to admit I was wrong.
26:34
They did a wonderful job. And then, we're going to take snapshots of this, probably on a more frequent basis, but it's got fault tolerance and RAID, and that's good. So when we send a snapshot off to, say, a local replication target like a //C,
26:51
we get our data reduction there, so we're still going to see savings, and we're still going to put a customer workload where it needs to be to get the best performance. The customer will just work better and be happier, and who doesn't want that for their customers? So, yeah, thank you, Pure gentlemen;
27:11
you did an awesome job, and I look forward to doing more. We look forward to helping you with more. Thank you. So the last thing I want to leave you with: I get to do my Steve Jobs impression and leave you with just one more thing. We've been talking to customers like ServiceNow
27:36
about this product, but it's been very much directed availability, directed conversations, and some of the feedback we've received is that there were concerns about capacity. As I mentioned before, the first iteration has 100 terabytes of raw capacity in the system, and they asked for more capacity.
27:58
I'm happy to announce that in Q4 of this year we're going to be releasing an R5 variant of the FlashArray//ST, and with it we're looking to quadruple the available capacity to 400 terabytes. The R5 also means we're going to take advantage of R5 controllers, and with those controllers comes even more performance than what I shared.
28:24
There will also be a transition from 100 gig Ethernet to 200 gig Ethernet. And while it's a bit premature to share the specific performance we're seeing on the R5, I can tell you we're targeting at least a 50% improvement in IOPS and throughput, and some early testing is coming in significantly higher than that.
28:48
So, thank you all for coming.