Video (25:49)

Optimize Ingest and Search for Splunk at High Volumes with S3

Learn how Pure FlashBlade brings fast object storage to SmartStore for an architecture that optimizes Splunk for search and ingest.
Transcript
00:08
Vaughn Stewart: Welcome to Pure Storage Accelerate. I'm your host for this breakout session, Vaughn Stewart, and today we're going to talk about optimizing Splunk, particularly ingest and search. Given our time, this is going to be a quasi-deep dive to help you understand the key attributes around the
00:27
infrastructure supporting the next-generation Splunk architecture called SmartStore. You can see our agenda on the board; I'm not going to spend our precious time on it. Instead, we're going to jump right in. When we talk to customers, we consistently hear the same four or five challenges with their
00:48
classic Splunk architecture. Now, I want to pause on what I mean by Splunk architecture: from this point through the rest of the presentation, I'm talking about the indexer cluster. The indexers are where data is received, processed, and indexed, and it's stored in what are called buckets. The
01:08
Splunk indexers are the storage tier within your Splunk environment. We won't be talking about report servers, or about single-site or multisite configurations. We're not talking about forwarders or any of those other elements. They're all very valid topics. But
01:28
right now we want to focus on your data infrastructure. So what do we hear? We hear customers talk about indexer management challenges at scale. These are really the byproduct of rebalancing data, because classic Splunk is roughly 15 years old and is built on the direct-attached
01:47
storage model, like what you see with HCI today. Analytics platforms like Splunk, and others that use HDFS or replicas, suffer from needing to redistribute data every time you manage compute. That takes time and resources and can negatively impact your Splunk searches. No one likes that. We
02:09
also hear about lengthy HA recovery times: an indexer fails, and it takes a long time to recover. That's because the data on that server needs to be validated, anything that's inconsistent evicted, and whatever is missing copied back over from
02:29
another node. We hear a lot about inconsistent search performance. This is because classic Splunk tends to have a high-performance tier and a low-performance, low-cost tier for older data. That low-cost tier can also be negatively impacted by storage-savings technologies like tsidx reduction. And so
02:49
when customers run a report, even if only 1% of that report has to go to the cold tier, the whole report has to wait for that data to return. So overall performance suffers, and it's really hard to predict if you're an end user. We also hear that it's an
03:06
expensive infrastructure. And look, everybody wants to drive down the cost of infrastructure, so we'll touch on that in a moment. What we've been hearing lately is from customers who've gone to the cloud. Among those who need to search beyond the most current data, customers who
03:22
actually have to go into their S3 object store have mostly had to come back out of the cloud, because the performance of S3 in the cloud just doesn't let them complete reports in the time frame they need. So let's look at the new Splunk architecture, and then look at it on Pure Storage.
03:42
And let's see how it addresses these five challenges. On this slide, I'd like to introduce you to SmartStore, the most current architecture from Splunk. I know this slide shows it on FlashBlade, but please note that right now we're just talking about Splunk. Pure
04:02
Storage was a launch partner for SmartStore in 2019. SmartStore is Splunk's cloud-native architecture; it's what runs in Splunk Cloud, and it's available to you on-prem. By cloud-native, I mean specifically that the compute side, the indexers, is ephemeral. There is no
04:23
persistent storage attached to them, as there is with classic Splunk, so you're free to scale and manage the indexers without any data-gravity impact. It's a disaggregated architecture: the stateless indexers are backed by an S3 object store, which is your persistent storage tier. And
04:48
the best part about this is the flexibility. Your indexers can run on bare metal, as virtual machines, as containers, or a combination of those. Say you have to burst and want to throw more compute at it: fire up some VMs and have them participate
05:04
in the search burst, and then deprovision them all dynamically, on demand. Now, your indexers do have a storage tier, highlighted here on the slide, and this is going to be high-performance storage. I'm a big fan of the Intel Optane
05:20
SSD here, though customers may look at an NVMe SSD as well if they can't afford Optane. I like Optane because of the way it can really parallelize concurrent reads and writes, and for its resiliency: the number of write cycles it can take. To me, I think, it's
05:40
really the preferred media for your indexers. But I digress. Notice that the terminology has changed a little: we no longer have the old hot, warm, and cold tiers. What we have is the cache, which is what you used to consider your hot tier;
06:02
this is where data is processed. Once it's processed, it's pushed down into the warm tier, and the warm tier is your S3 object store. When data is pushed down to the warm tier, a copy of it remains in the cache until it is evicted, due to either policy or capacity requirements. But
06:25
again, once it's processed, it's pushed down to the object store. Once it's there, that data persists, and it's the responsibility of the object store to protect it: data protection, availability, resiliency, and so on. Cold buckets have been eliminated in
06:43
the new architecture. Frozen buckets still exist, but that's an archival use of storage for your Splunk data, and we're not going to go into it here due to time. Now, as I mentioned, SmartStore is disaggregated, and this makes indexer server management much, much simpler. You can
07:01
complete a software upgrade without having to evacuate or rehydrate the data, because again, it's disaggregated and the compute is ephemeral. In fact, there's so much pain around data evacuation and rehydration, when you've got to complete a software update to either Splunk, the
07:19
Linux operating system underneath it, or the server firmware, that some customers have just given up on keeping the environment up and running. They'll just take the servers down to complete the software update, and if any reports fail, they'll rerun them. So that's a really big
07:33
problem in the market. Because SmartStore is disaggregated, recovery of a failed HA indexer also completes in much less time; there's no need to validate or reconstruct data. And compute or storage can be scaled on demand, with no impact to
07:53
searches, because there's no data rebalancing. I know I sound a little like a broken record, but these are key points you really need to understand. They affect classic Splunk whether it runs on direct-attached storage, SAN, or NAS; they're all treated the same. SmartStore is fundamentally
08:11
different. Now, here are a couple of proof points I want to point out. They come from an eight-node indexer cluster with a very small, 30-terabyte data set in our lab, where we've got an all-flash array and a FlashBlade. Classic Splunk was tested on the all-flash array. What you can see on
08:28
the left-hand side is the time it takes to recover after we purposely fail an indexer. Validating the data and filling in any missing data takes two and a half hours with classic Splunk. With SmartStore, we skip that whole process: the server is back online and processing
08:47
new data in nine minutes. On the right-hand side, with very similar bars but a much larger time scale, we asked what happens when a customer needs more compute power, and added one more indexer to our eight-node cluster. When we do that, we trigger data rebalancing. As I mentioned on the previous slide,
09:06
what you see here is that on the all-flash array it took nearly 12 hours, whereas with SmartStore, much like the HA recovery case, the indexer comes up, pulls its config file, joins the cluster, connects to the S3 object store, downloads the metadata, and is serving or processing data within 15 minutes. These
09:28
are fundamentally significant gains in the time your staff has to spend completing these operations, babysitting and monitoring them while they potentially impact running reports or cause you to miss a data set. It's simply a much more flexible and resilient architecture.
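To put the lab results in ratio terms, here is a small Python calculation. It uses only the figures quoted in the session (two and a half hours versus nine minutes for HA recovery, and "nearly 12 hours" versus 15 minutes for adding a ninth indexer). Treating "nearly 12 hours" as exactly 720 minutes is an assumption, so the second ratio is a slight overestimate.

```python
# Lab figures quoted in the session for an 8-node indexer cluster (~30 TB).
# All times are in minutes; "nearly 12 hours" is rounded up to 720 minutes.
CLASSIC_MIN = {"ha_recovery": 150, "add_indexer": 720}
SMARTSTORE_MIN = {"ha_recovery": 9, "add_indexer": 15}


def speedup(before_min: float, after_min: float) -> float:
    """Return how many times shorter the second duration is than the first."""
    return before_min / after_min


for operation, before in CLASSIC_MIN.items():
    after = SMARTSTORE_MIN[operation]
    print(f"{operation}: {before} min -> {after} min "
          f"({speedup(before, after):.1f}x faster)")
```

This prints roughly 16.7x for HA recovery and up to 48.0x for scale-out. These ratios describe one small lab data set; larger clusters with more data to rebalance would likely see different numbers.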
09:47
Now, as with all good things, we should always try to figure out whether there are trade-offs, and there are trade-offs in this new SmartStore architecture. I'm going to try to explain them here using very simple diagrams that are probably making every engineer in the room's skin crawl. But the point of this very
10:06
simple diagram is to show you that I've got a Splunk server; that's the outline. Inside it, I'm running the Splunk application in the server's compute and memory. And I've got storage, whether it's direct-attached, SAN, or NAS; it doesn't matter. I've got high-performance
10:24
storage storing my hot and warm Splunk buckets, and I've got low-performance, low-cost storage (my storage tiering, if you will) storing my cold buckets. The key point I want to highlight is that it doesn't matter which tier the data is coming from; the
10:47
Splunk application can make direct read requests of the buckets and the data in them, regardless of the storage tier they sit in. So in classic Splunk, I do have efficient I/O, even if I have the other challenges we stated earlier in the session. By contrast, when I'm looking at
11:09
SmartStore, you see that my indexer server still has the Splunk application running in memory, but the Splunk application can only read from the high-performance cache. If the data is in the cache, you get very fast response times. If the data is not in the cache, your performance can vary
11:28
greatly depending on the capabilities of the S3 object store, because any data not in the cache must be downloaded from the remote object store. So what I hope I've been able to show you is that with SmartStore, any data not in cache is going to
11:51
add I/O: it's downloaded from the S3 object store and written to the cache, then read from the cache into the application, which then takes whatever steps are necessary. You need to compensate for that increase in I/O somehow in your architectural design. So let's talk about S3
12:09
storage. S3 storage was introduced to the market, with tremendous success, as an archival tier. We talked about high-performance and low-performance storage for hot versus cold buckets in Splunk; S3 was designed for frozen buckets, for data that was
12:31
infrequently accessed and doesn't have performance SLAs around it. "Deep and cheap" is the phrase I think we all used, and it was tremendously successful, both in on-prem storage offerings and in the cloud. Pure Storage FlashBlade introduced unified fast file and object. We made a
12:53
bet over seven years ago, when we started designing FlashBlade, that data analytics would adopt S3 technologies, as it seemed the only logical means to provide massive scale. And that's come to fruition. We're talking about SmartStore on S3 today, but it's no different from everything else that's
13:15
happening in the analytics space. Look at Elastic, where searchable snapshots provide S3 capabilities with a very similar architecture; or Vertica in Eon Mode, which relies on S3 on the back end; or Kafka on S3 from Confluent. All of these technologies are made to scale and are leveraging
13:35
S3, and Pure Storage is their only high-performance storage partner. So what is FlashBlade? It's a scale-out storage architecture for unstructured data. The key element you need to know for our Splunk conversation is that each blade is actually a storage controller: it processes I/O as
13:59
well as having its own storage capacity. You can start with as few as seven blades and non-disruptively scale from seven blades up to 150. That's 10 chassis connected together, each holding 15 blades. A chassis is four rack units in height and can provide up to 15 gigabytes per second of
14:20
bandwidth per chassis. That's because every blade in a FlashBlade participates in processing Splunk data, so you always get maximum parallel I/O. That's not the case with alternative object stores, where each bucket can only be served by a single
14:43
controller. So your bandwidth is limited by the number of controllers you have and by each controller's ability to process data with the storage media behind it. A FlashBlade chassis can store up to about a petabyte of usable data. That's at a two-to-one compression ratio; we get
15:01
about 1.5-to-one with Splunk, so you're going to get a little less than one petabyte per chassis. And of course, FlashBlade continues the Pure Storage tradition of reinventing storage management by being very simple and touchless, something you can monitor and manage anywhere you have an
15:21
internet connection, and we've even got an app. What's easier than that? Now, the FlashBlade architecture provides all this parallel I/O as well as a plethora of data services. There are two data services we're going to touch on in the rest of this conversation, and I kind of
15:43
touched on one a moment ago: data compression. We can compress Splunk data, which is already compressed, because we've got multiple compression algorithms, or libraries, within our array that allow us to apply the best compression based on the data
16:00
structure being stored. So we're able to squeeze out roughly 50% more effective capacity for you when you store that data on FlashBlade. In addition, as you'll notice toward the right here, we're going to talk a little about SafeMode snapshots, how we
16:18
protect your Splunk data from cyberattacks. But before I get there, let's go back to performance. We were talking about S3 being archival: deep, cheap, and slow. And Pure Storage said, with FlashBlade, we're going all in on high-performance S3. What are the results? Well, here's
16:40
a direct comparison against a benchmark published by one of our competitors; we mimicked that testbed down to the exact servers. When we looked at the tests where searches had to hit the S3 tier, FlashBlade was up to 80 times faster. Now, 80 times
17:02
faster is something that's hard to grasp. So think of it this way: when you're doing analytics and reporting, bandwidth matters and latency matters, but outlier latency is what you wait on. A report is never complete until it has all the data. So you can have 99% of
17:27
that data arriving very fast, but if 1% holds you back, that's the time it takes to complete your report. For example, translating 80 times faster into reporting time: a report that took 40 minutes on slow S3 will take half a minute with FlashBlade. That's what 80x faster means to you.
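The outlier-latency point can be sketched in a few lines of Python. This is a simplified model, assuming all reads for a report run fully in parallel so the report finishes when the slowest read does; the 0.5-second and 40-minute latencies are hypothetical values chosen to mirror the speaker's 99%/1% and 40-minute examples, not measurements.

```python
from statistics import mean
from typing import Sequence


def report_completion_s(read_latencies_s: Sequence[float]) -> float:
    """Completion time of a report whose reads are all issued in parallel.

    The report is not done until every read has returned, so the slowest
    (outlier) read sets the completion time, not the average.
    """
    return max(read_latencies_s)


# Hypothetical report: 99 reads answered in 0.5 s, 1 read stuck on slow S3 for 40 min.
latencies = [0.5] * 99 + [40 * 60.0]

slow_minutes = report_completion_s(latencies) / 60
print(f"mean read latency: {mean(latencies):.1f} s")
print(f"report completes in: {slow_minutes:.1f} min")

# The session's example: the same report on an S3 tier that is 80x faster.
factor = 80
print(f"at {factor}x faster: {slow_minutes / factor:.1f} min")
```

The mean read latency is only about 24.5 seconds, yet the report takes 40.0 minutes, and dividing that outlier by 80 gives the half minute quoted in the session.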
17:52
And why does performance matter with Splunk? Well, it depends on your industry. Some customers just use Splunk to look at near-term data, and everything else gets deleted. But you may be in an environment with regulatory requirements, like GDPR
18:09
Article 33, which gives you 72 hours from becoming aware of a personal data breach to notify the supervisory authority. Or say you're using Splunk to support your cybersecurity. Did you know that the average data breach isn't detected until 206 days after
18:29
it's occurred? What if you have to go back and re-examine all those events? Do you have an infrastructure that will meet the needs of these two use cases, and many others? Now, when we're in competitive situations for Splunk business, we consistently run up against low-cost, low
18:54
performing S3 platforms, and the initial response is, "Pure might just be too expensive for what I need; if the cache is so fast, maybe I don't need high-speed S3." Well, let me give you the tale of the tape, because if you only look at the price of S3 storage, yes, it would seem
19:17
that Pure Storage is at a price disadvantage. However, you have to look at the full architecture to understand where your costs reside. First, let's start with S3. As I mentioned earlier, you're going to get about 1.5-to-one data compression on top of the compression that Splunk already
19:37
applies to its data sets. This means you'll need less storage with our S3 than with alternative S3 platforms that either don't have compression or have very lightweight compression that has little effect on Splunk's buckets. Next, when you go above the S3 layer, you'll notice that our load balancer is built
19:58
into the FlashBlade. It's part of the top-of-rack fabric connections within the FlashBlade, whereas with the alternatives you actually have to build out load balancers, and not just one but multiple, for high availability. This means more network ports, more
20:17
network devices, rack space, and power. When you look at the indexers themselves, Pure Storage professional services can help right-size your indexer cache. That means we look at your custom reports and figure out how much cache you need to support executing those
20:36
reports in a timely fashion. By contrast, the slow-S3 vendors will advocate that you build a very large cache. I have met with customers whose storage vendors told them their indexer cache needs to hold 6, 12, 18, 24, or 30 months of data. Now, what you need to understand when
21:03
you dig in is that indexer cache is going to be more expensive than that low-cost S3. It's pricier because that indexer cache is likely some type of high-performance technology, like an Intel Optane SSD. In addition, that indexer cache has to store multiple copies of your data. It's storing whatever your
21:23
replication factor is, which must be at least two but may be more, as well as all the cached copies of data that has been pushed down to the S3 object store. So there's a bit of a bait and switch, if you will: if you buy low-cost S3, don't worry, you'll pay for it with a large capacity of high
21:42
cost indexer cache. We avoid that with FlashBlade in this SmartStore architectural design. Ultimately, you end up with a more efficient Splunk infrastructure, in terms of storage capacity requirements and storage costs, but also in terms of data center resources like rack space, power, cooling, and
22:03
networking ports. I mentioned earlier that FlashBlade redefined storage simplicity, and I think this is key as we come down to the closing minutes. SmartStore brought a lot of simplicity to Splunk, and what Pure
22:18
is known for is taking the complexity out of storage and letting application administrators manage their entire infrastructure stack without having to be storage experts, with consistent success. We do this through a number of means that you see here on the slide, but
22:34
the feature I really want to talk about is FlashBlade SafeMode. This is a cyber-protection mechanism built into all of our storage platforms that protects your data against cyberattacks. We have system-level immutable snapshots that are inaccessible via any public interface and that
22:53
provide protection against cyberattacks. In fact, when you log on as administrator or root on a FlashBlade, you can't even see that SafeMode exists. So, heaven forbid, say your analytics platform gets hit by a cyberattack: your data gets encrypted or deleted, and the snapshots on
23:13
your storage platform get deleted as well. You're thinking, I've got no backup. That's not the case with Pure Storage. A customer can call support, we'll verify them through a form of multifactor authentication, and then we'll allow them to restore that file system to the
23:29
last known good version, then let them address whatever data may or may not have been affected by the malware, and get back to processing data analytics. This is a really big differentiator for us at Pure, and something that's helping us help customers
23:50
face modern cybersecurity threats. Now, you can deploy FlashBlade today with classic Splunk. You can use it for your warm and cold tiers (it can't be the hot tier, but it can be warm and cold) and get an immediate acceleration of your Splunk environment. Then, as you're ready to adopt
24:07
SmartStore, which probably means upgrading your servers, you can move right into a SmartStore configuration. So FlashBlade is future-proof and investment-proof. Here's a nice little image showing that if you use 20 servers today with classic Splunk, as you go to 20 servers with
24:23
SmartStore, you can get smaller servers that need less storage, because locally it's just a cache. And you can see the efficiencies in total storage with FlashBlade as you drop the need for replicas and add the additional data compression. As I've mentioned
24:41
before, we've got professional services to ensure that your deployments get off on the right foot and are successful, whether we're optimizing existing deployments or deploying new architectures. We also have both classic and SmartStore reference architectures, and I'd like to highlight, on the right
24:57
hand side, the latest one from the Kinney Group, which looks at virtualizing your indexers and eking out performance that exceeds bare metal. It's a great read if you're a Splunk architect. So we've come to the end of our session, and I'm right up against the time limit, but
25:15
if you're running Splunk at enterprise scale, I really encourage you to look at SmartStore. When you do, come back to what we discussed here. Reach out to one of our channel partners or our sales team, and we can revisit the conversation we had today about accelerating your search
25:29
performance, redefining your total cost of ownership, and gaining simplicity and agility within your Splunk infrastructure. I'm Vaughn Stewart. Enjoy Accelerate. Take care.
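The indexer-cache trade-off from the cost comparison in the session can be put in rough numbers. The model below is a hypothetical sketch that follows the speaker's description (hot data held replication-factor times, plus one cached copy of data already pushed to S3); the 1 TB/day ingest, one day of hot data, replication factor of 2, 30-day months, and the assumption that indexed data occupies about half the raw volume are all illustrative values, not Splunk sizing guidance.

```python
def indexer_cache_tb(daily_ingest_tb: float,
                     cached_days: int,
                     hot_days: float = 1.0,
                     replication_factor: int = 2,
                     on_disk_ratio: float = 0.5) -> float:
    """Rough cluster-wide indexer-cache capacity in TB (hypothetical model).

    - Hot data (not yet uploaded) is held `replication_factor` times.
    - Data already pushed to S3 keeps one cached copy for `cached_days`.
    - `on_disk_ratio` converts raw ingest to indexed size (assumed 0.5).
    """
    indexed_per_day = daily_ingest_tb * on_disk_ratio
    hot = indexed_per_day * hot_days * replication_factor
    cached_warm = indexed_per_day * cached_days
    return hot + cached_warm


# Cache retention periods the speaker says some vendors recommend (30-day months).
for months in (6, 12, 18, 24, 30):
    print(f"{months:>2} months: {indexer_cache_tb(1.0, months * 30):6.1f} TB")
```

Under these assumptions, each additional six months of cache adds about 90 TB of high-performance flash per 1 TB/day of ingest (91 TB at 6 months, 451 TB at 30 months), which is the cost the speaker contrasts with a right-sized cache in front of a fast S3 tier.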

For all of the gains SmartStore provides, search performance can still be a major issue. This often happens because the S3 object stores backing SmartStore were designed for low-performance archival data, so searches are slow or time out before completing. In this session, Vaughn Stewart, VP, Technology: Partners & Alliances at Pure, reviews best practices for designing a SmartStore architecture and explains why you need fast object storage to avoid data I/O trade-offs.
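The data I/O trade-off described in the session (the application reads only from the local cache, so a miss means download, write to cache, then read) can be illustrated with a toy Python model. `BucketCache`, `fetch_remote`, and the least-recently-used eviction are illustrative assumptions, not Splunk's cache manager or any Splunk API; uploading buckets to S3 and policy-based eviction are not modeled.

```python
from collections import OrderedDict
from typing import Callable


class BucketCache:
    """Toy model of a SmartStore-style indexer cache (hypothetical, not Splunk code).

    The search process reads only from the local cache. On a miss, the bucket
    is first downloaded from the remote object store and written to the cache,
    then read from the cache: the extra I/O a cache miss adds.
    """

    def __init__(self, capacity: int, fetch_remote: Callable[[str], bytes]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least one bucket")
        self.capacity = capacity
        self.fetch_remote = fetch_remote            # stands in for an S3 GET
        self._buckets: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, bucket_id: str) -> bytes:
        if bucket_id in self._buckets:
            self.hits += 1
            self._buckets.move_to_end(bucket_id)    # mark as most recently used
            return self._buckets[bucket_id]

        self.misses += 1
        data = self.fetch_remote(bucket_id)         # 1) download from the object store
        while len(self._buckets) >= self.capacity:  # make room: evict least recently used
            self._buckets.popitem(last=False)
        self._buckets[bucket_id] = data             # 2) write into the local cache
        return self._buckets[bucket_id]             # 3) serve the read from the cache


# Usage: a 2-bucket cache in front of a dict that plays the role of the object store.
remote_store = {f"b{i}": f"bucket-{i}".encode() for i in range(3)}
cache = BucketCache(capacity=2, fetch_remote=remote_store.__getitem__)
for bucket in ["b0", "b1", "b0", "b2", "b1"]:
    cache.read(bucket)
print(cache.hits, cache.misses)  # 1 hit, 4 misses
```

Every miss costs a remote GET plus a local write before the read can complete, so the speed of the object store directly sets the latency of any search that falls outside the cache, which is the session's argument for fast S3.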
