30:16 Webinar

Supercharging Next-gen Security Information and Event Management (SIEM) with Object Storage

FlashBlade® object store powers next-gen SIEM platforms like Splunk and Elastic with 10x query speed and improved cyberthreat detection.
This webinar first aired on 18 June 2025
00:00
Well, good afternoon. Thank you for joining us on a Thursday, after what was hopefully a good nightclub party at Zouk. I didn't bring my nightclub clothes; it's been a long time since I've been to a club, but it was a lot of fun. So today,
00:19
Kartik and I are going to talk about how you can leverage FlashBlade to really supercharge your next-gen SIEMs. This is a topic a lot of our customers care about: how SIEMs are evolving. So we'll dive deep into it today. Again, I'm Yuvraj Mehta, and this is Kartik Srinivasan.
00:46
So we'll kick it off by describing the SIEM market: where it was, how it's evolving, and what exactly SIEMs are. Security information and event management: that's what the acronym stands for. So how does Gartner actually define a SIEM?
01:06
They have a pretty lengthy definition: a SIEM enables threat detection, compliance, and incident response by collecting, analysing, and managing log data and contextual events across diverse sources. It's a very good definition,
01:28
but a really simple way to look at it is that SIEMs collect logs from your systems, your users, and your network, and then leverage those logs to provide insights to your SecOps and IT ops teams. But SIEMs have not stayed the same since they first came on the
01:49
scene. SIEMs first appeared around the early 2000s, and at that time they were essentially just log collection plus some threat correlation, and that correlation was done through a predefined set of rules. So your SecOps team already had to know what kind of threat to look for.
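To make "a predefined set of rules" concrete, here is a minimal sketch in Python of what an early-2000s-style correlation rule might look like: flag a possible brute force when one source IP fails login more than five times in ten minutes. The event shape, threshold, and window are hypothetical, not any vendor's actual rule engine.

```python
from collections import defaultdict
from datetime import timedelta

WINDOW = timedelta(minutes=10)   # hypothetical rule parameters
THRESHOLD = 5

def failed_login_rule(events):
    """Yield an alert when one source IP has > THRESHOLD failed logins
    inside WINDOW. `events` is a time-ordered iterable of dicts like
    {"ts": datetime, "type": "auth_failure", "src_ip": "10.0.0.5"},
    the kind of log stream an early SIEM collected from its sources."""
    recent = defaultdict(list)                 # src_ip -> recent timestamps
    for ev in events:
        if ev["type"] != "auth_failure":
            continue
        hits = [t for t in recent[ev["src_ip"]] if ev["ts"] - t <= WINDOW]
        hits.append(ev["ts"])
        recent[ev["src_ip"]] = hits
        if len(hits) > THRESHOLD:
            yield {"rule": "possible_brute_force",
                   "src_ip": ev["src_ip"], "ts": ev["ts"]}
```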
02:13
They would set up the rules, and that's how SIEMs worked. Not super scalable, but this was the first iteration. Starting in the mid-2000s, SIEMs evolved into big data platforms. So we had solutions like BigPanda, if you've heard of it, or McSoft,
02:35
and I like to think about them as aggregators of SIEMs: they would sit on top of your existing SIEMs, so you would have SIEMs and then these big data SIEM platforms as well. What were they solving for? You had point SIEM solutions like a Sumo Logic or a Splunk, and
03:01
correlating data amongst all of them became super challenging. So these big data platforms came along and said, we will correlate all of that data for you in a single place. But again, the outcome they were driving towards was still visibility and threat correlation.
03:20
The other area SIEMs started evolving into was cloud-native threat detection, doing threat detection on cloud-native workloads as well. Makes sense: mid-2000s, a lot of cloud happening. But where are SIEMs going now? How have they evolved from 2020 until now? Starting in 2020, we saw huge pent-up demand from customers and within
03:50
the market: hey, SIEMs are great, they give us visibility, but I can't take any action; I need to respond to threats as well. That's one problem. Another: I want my SIEM to also look at data from users, from endpoints, and from my network layer, in addition to all the system logs it's collecting.
04:17
So SIEMs have evolved, and the industry term now is next-gen SIEM. Well, what is a next-gen SIEM? What are the core capabilities that define one? The first thing a next-gen SIEM does, beyond what SIEMs used to
04:40
do, is behavioural analytics: What are users doing? How are they accessing systems, where are they logging in, what data are they accessing? And it uses these analytics to take action against a user threat,
04:58
protecting the customer, or giving the security ops team insight into where the threat is inside the four walls of your enterprise, as opposed to just from the outside. The scale these SIEMs now have to deal with has simply exploded. We now have SIEMs collecting logs from all of your
05:26
network and all of your endpoints, thousands of endpoints. I'm sure many of you have a CrowdStrike agent installed on your laptop; I think the world found that out last year in a pretty bad way. They're collecting log data from endpoints, from the network, from systems, from users. And so the amount of data that
05:49
needs to be managed for these SIEMs to function properly has gone from a few terabytes, to hundreds of terabytes, to petabytes. We have a customer, a very large tech company, that ingests up to 2 petabytes of log data into their SIEM.
06:10
It's backed by FlashBlade. And that's every day. SIEMs are also now providing what's known as real-time risk scoring: companies want to know, in real time, what does my security posture look like? Across my
06:29
network, my users, my devices. To do this, there's a lot of analysis, data crunching, and threat hunting that needs to happen, and you can imagine the amount of data that needs to be correlated and parsed through becomes super relevant.
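As a toy illustration of the idea (not any product's actual scoring model), a real-time risk score can be as simple as a weighted sum of recent signal counts per user or device, capped at 100. The signals and weights below are hypothetical.

```python
# Hypothetical signal weights; real products use far richer models.
WEIGHTS = {
    "failed_login": 1.0,
    "malware_detection": 10.0,
    "lateral_movement": 25.0,
    "data_exfil_attempt": 40.0,
}

def risk_score(signal_counts: dict) -> float:
    """Weighted sum of recent signal counts for one user or device,
    capped at 100, e.g. {"failed_login": 12, "lateral_movement": 1}."""
    raw = sum(WEIGHTS.get(sig, 0.0) * n for sig, n in signal_counts.items())
    return min(100.0, raw)

print(risk_score({"failed_login": 12, "lateral_movement": 1}))  # 37.0
```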
06:52
This next one is probably one of the biggest additions to a next-gen SIEM. Back in the mid-2000s you would have a SIEM platform and a SOAR platform: security orchestration, automation, and response. Most of you might be familiar: you had a Splunk SIEM and you had a Splunk SOAR,
07:13
two separate systems. Security doesn't want that anymore; it really increases the mean time to respond to an incident. They want both systems to come together. What does that mean? They want the visibility,
07:31
the threat detection, on the same system, and as threats get detected, an automated response needs to go to the right people, on the right system. Threat intelligence from outside sources, things like a Mandiant feed, is also getting integrated more and more into the platform. Now I need to bring
07:58
these threat detection feeds onto my SIEM, and I need to store all that data as well. And lastly, these SIEMs are now implementing advanced threat detection with machine learning. They're leveraging ML and AI to do that detection and to automate a lot of the response as well.
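For a flavour of what "ML for threat detection" can mean in practice, here is a minimal, hypothetical sketch using scikit-learn's IsolationForest to flag anomalous per-user activity. Real next-gen SIEMs use far richer features and models; this just shows the shape of the approach.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Per-user-hour feature vectors: [logins, MB sent out, distinct hosts
# touched]. The numbers are synthetic, purely for illustration.
baseline = np.array([[4, 120, 2], [5, 90, 3], [3, 150, 2], [6, 110, 4]])
model = IsolationForest(contamination=0.05, random_state=0).fit(baseline)

burst = np.array([[40, 9000, 25]])   # one suspiciously busy hour
print(model.predict(burst))          # [-1] => anomalous, [1] => normal
```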
08:22
So when you put all of these things together, customers are being forced to look at how their existing SIEM platforms are functioning and what underlying infrastructure supports them, and then to rethink their compute layer, their network layer, and, most importantly for us,
08:43
their storage layer. How does this impact storage? They're now looking at high-performance but also capacity-optimised storage to stand up these next-gen SIEM platforms. But, great, these SIEMs exist and are running on existing infrastructure, so
09:09
what are the challenges SecOps teams face with them? SecOps teams have a business outcome they drive towards: reduce mean time to detection. They get goaled on that. But it becomes challenging if the volume of data you're ingesting is exploding
09:33
while your infrastructure can't keep up with it; then they cannot meet that goal. They're being asked to do orchestrated incident response within a 30-minute breakout window. What is a breakout window? It's essentially the time it takes for a threat actor to get in and then move laterally from one system to another to spread the attack.
09:57
This breakout window used to be hours. Now it's 30 minutes: within 30 minutes, if a threat actor decides they want to move laterally to other systems, they can do it. Your SecOps team has to keep up with that, and
10:16
if the storage system or the underlying infrastructure cannot keep up with those demands, it leads to increased query times, and they cannot orchestrate this response. And lastly, security is now being asked to retain these logs for an even longer period of time. We have regulations in the EU like DORA.
10:42
We have regulations here in the US as well, and compliance departments are asking security teams to keep these logs longer and longer: from 30 days to 60 days, sometimes even 90 days. Why does that matter? Because that retained data then gets used for advanced threat hunting.
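One way such a retention window can be enforced on an S3-compatible log bucket is with a lifecycle rule. A hedged boto3 sketch follows; the endpoint, bucket, and prefix are hypothetical, and lifecycle support varies by object store, so verify it on yours before relying on it.

```python
import boto3

# Hypothetical endpoint and bucket; credentials come from the usual
# boto3 config/env mechanisms.
s3 = boto3.client("s3", endpoint_url="https://s3.flashblade.example.local")

s3.put_bucket_lifecycle_configuration(
    Bucket="siem-logs",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "expire-raw-logs-after-90-days",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Expiration": {"Days": 90},   # the longest window mentioned above
        }]
    },
)
```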
11:05
Threat hunting means proactively going out and looking for threats, not just reacting to them. But when you have to store this data for a long time, you need to store it on performant and also cost-optimised storage. So what does Pure offer for next-gen SIEMs?
11:28
Pure has had a solution with Splunk and Elastic since, I think, 2020 or 2021. It's publicly available; we have a lot of solution briefs and content out there. We actually use Splunk Enterprise Security ourselves; we dogfood it in our own environment.
11:49
We've gone to Splunk .conf and spoken about that as well, and we leverage FlashBlade to support that environment. Very recently, well, just yesterday, we announced that we are adding support for CrowdStrike LogScale as well. We worked collaboratively with CrowdStrike, and Pure Storage is the first and only
12:12
on-prem storage vendor that CrowdStrike has validated for LogScale on-prem deployments. No other storage vendor has gone through this validation with CrowdStrike. So what is this CrowdStrike LogScale validation? What have we done? CrowdStrike LogScale is a
12:37
Kubernetes app, a cloud-native app. We have validated full S3 API compatibility, so you get seamless integration with LogScale's log lifecycle workflow. On this side of the slide you see requests coming in: query requests, UI and API requests,
13:02
and any request that comes in goes first to the LogScale operator, the LogScale app deployment. The logs themselves come in from sources, endpoints and users, through Kafka and ZooKeeper. Once those logs have gone through the LogScale deployment, the compute layer, they get stored on the S3 object store on FlashBlade.
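Since the validation covers full S3 API compatibility, the integration surface is just standard S3 calls. A minimal boto3 sketch against a FlashBlade data VIP might look like this; the endpoint, credentials, bucket, and key are all hypothetical, and LogScale issues its own equivalent calls internally.

```python
import boto3

# Hypothetical FlashBlade data VIP, credentials, bucket, and key.
s3 = boto3.client(
    "s3",
    endpoint_url="https://10.1.1.100",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Offload a (toy) segment, then read it back: the same round trip
# LogScale makes when it stores and later rehydrates segment files.
s3.put_object(Bucket="logscale-segments", Key="segments/0001.seg", Body=b"...")
segment = s3.get_object(Bucket="logscale-segments", Key="segments/0001.seg")
data = segment["Body"].read()
```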
13:24
We've tested and validated that we can ingest up to 110 terabytes of logs per hour, which works out to roughly 30 gigabytes a second of sustained ingest, without taxing the compute layer or the storage layer. We've also tested rehydration:
13:47
in case your LogScale environment fails and goes down, you have to rehydrate it back from the retained logs on FlashBlade, and we have validated rehydration at up to 110 terabytes an hour as well. And we have tested advanced threat hunting across this entire environment. We've worked with LogScale to validate that their customers can do threat hunting
14:16
leveraging the logs stored on FlashBlade. We are continuing to work with them. We will be integrating with CrowdStrike's next-gen SIEM platform as well, their cloud-based Falcon platform. So all the logs that reside on a FlashArray or a FlashBlade can go to Falcon, and your SecOps team will be able to do threat detection on the data on the storage layer
14:41
directly as well. That integration will be going live in a month or so; on the CrowdStrike Marketplace, you will see a Pure Storage app available for you to start using. All right, with that I'm going to hand it off to Kartik, who will go into the details of how FlashBlade actually makes this happen
15:03
and makes the magic happen to support these next-gen SIEM platforms. Thanks, Yuvraj. Great to be talking to you all. Before we go into object store and how it actually helps power your next-gen SIEM, let's first truly understand what problem we're trying to solve. If you take a look at
15:25
traditional mechanisms: SIEM platforms have evolved, and I'll take Splunk as an example, though it applies similarly to Elastic and other platforms. They all started out as software-defined capabilities on commodity servers. They would take the software and deploy it on a
15:41
bunch of servers with direct-attached storage. That was the first iteration of the architecture. It worked great for very small-scale systems, but then the volume of data started increasing. Yuvraj was talking about the kind of endpoints,
15:58
millions and millions of log entities, thousands and thousands of systems, millions and millions of logs, and these architectures started having challenges. One is that the complexity of managing them at scale becomes prohibitive,
16:14
cost-prohibitive and operationally heavy. But architecturally too, when you want to increase performance you have to add storage nodes, which means adding capacity to those nodes as well, even though you may not need it in the first place. Second, software-defined architectures assume that
16:32
the storage is going to fail, so they have to build HA into the environment, which means replicating the data across multiple nodes. Now the software that is trying to serve your security analytics is also doing storage management. If a server goes down, you have to rebalance,
16:52
so there's a lot of complexity, and performance dips. You get unpredictable performance because of that, and over time cost becomes prohibitive. The same if a server goes down: you'd have to do a bunch of
17:08
similar work as well. So what happened was many of these architectures evolved to a disaggregated architecture. This is seen as the best way to do it. You saw the same thing with LogScale, where you have a bunch of pods and worker nodes, but they keep a localised cache,
17:25
especially in the case of Splunk, which I'll use as an example. They have a localised cache in what they call indexers, and they offload all the older data into a warm archive or cold storage, depending on how you describe it. This simplifies the architecture: you're able to scale the compute elements independent
17:44
of the storage. You can keep the compute light enough that it becomes closer to stateless: if a server goes down and another server gets added, it comes up a lot faster because it has access to the same data set. You're able to manage the TCO really well, you have more predictable performance, and your operational
18:07
complexity becomes lower and lower. But there is a catch. Even with this kind of disaggregated architecture, to do any analysis, any search, any queries, these systems download the data from the object storage system into the cache and then do the analysis; they
18:33
don't directly run queries on the object storage system. Now, as Yuvraj was mentioning, if you want to do more advanced threat detection and be more proactive, you effectively have to look at the large volume of data you've already captured, which means that on a cache miss you have to reach out to the
18:58
object store to pull all that data down. And if you have multiple users accessing the same system running queries, it's the same thing: you're going to have cache misses, and you're going to have to download that data.
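A minimal sketch of that read-through cache path, assuming a hypothetical S3 endpoint and bucket: a hit is served locally, while a miss waits on the object store's GET path, which is why the backing store's throughput sets your worst-case query time.

```python
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.flashblade.example.local")
_cache: dict[str, bytes] = {}    # stand-in for the indexer's local SSD cache

def read_object(bucket: str, key: str) -> bytes:
    """Read-through cache: a hit is served locally; a miss fetches from
    the object store. With petabyte working sets, threat hunting over
    historical data is mostly misses, so the backing store's GET
    throughput bounds your query times."""
    ck = f"{bucket}/{key}"
    if ck not in _cache:                       # cache miss
        _cache[ck] = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return _cache[ck]
```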
19:12
And if your storage environment is very slow, you're going to have a lot of latency. So it's not enough to just offload the data to a storage system; the storage system behind it matters a lot in terms of your mean time to threat detection,
19:33
pretty much. So what kinds of things go into designing such a storage system? Clearly you need scale, because the volume of data is increasing significantly, but it comes with a catch: you don't want storage sprawl. You are trying to solve server sprawl by adding storage at the back end,
19:51
but you don't want storage sprawl spread across multiple racks either. So you need density. The second thing is that throughput matters a lot, because you're downloading a bunch of data into your environment or pushing a lot of data in, and you want to do that fast. You want very little overhead associated
20:08
with those caches. And then, of course, it goes without saying that you need redundancy. FlashBlade was built as a unified fast file and object store right from the get-go, on an all-flash architecture. We built it on top of a distributed key-value database,
20:24
and it's massively parallel. It can scale today from 60 petabytes up to 120 petabytes depending on the platform you choose, and it was purpose-built for many large-scale analytics applications, including SIEM; we use it for AI, data analytics, log analytics, and everything in
20:44
between. Many of these architectures have chosen to use object store as the backing store. They can choose to use file as well, but we see many of them moving to object, and the reasons are many. One is that they want to be cloud-ready,
21:02
because they can run in the cloud and object has portability across on-prem and cloud; it's the same API. So it's a great way to become cloud-ready and hybridise your architecture. The second and probably most important reason is unlimited scale:
21:18
an object environment is necessarily a flat namespace. You don't have directory contention for writes, you don't have inodes and so on like in traditional file systems, so you can have a lot of concurrent access for writes.
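To illustrate that flat-namespace concurrency (endpoint, bucket, and key scheme all hypothetical), here is a sketch where many writers land objects under one prefix in parallel; boto3 clients are thread-safe, and in a flat keyspace "directories" are just key prefixes, so nothing serialises the writers on a directory or inode lock.

```python
import boto3
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.client("s3", endpoint_url="https://s3.flashblade.example.local")

def put_log_chunk(i: int) -> None:
    # Keys share a prefix, but there is no directory to contend on.
    s3.put_object(Bucket="siem-logs", Key=f"raw/chunk-{i:08d}", Body=b"...")

with ThreadPoolExecutor(max_workers=64) as pool:
    list(pool.map(put_log_chunk, range(10_000)))
```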
21:37
And it can scale to billions and billions of objects. We have customers who run object stores with 100 billion plus objects in a single bucket. Performance doesn't dip; there's no issue at all as they do both reads and writes. FlashBlade has been built for scale and very high performance.
21:54
We'll talk a little more about that. We also do a lot for storage efficiency; you want your storage to be efficient so everything packs into a very dense environment. Not only do our large-scale DFMs enable that,
22:08
but we also do erasure coding. We have security, we have encryption turned on, so it's a very secure platform. And if you're setting up an enterprise-grade SIEM environment, you typically want some sort of failover and failback; you have a DR environment, so
22:25
for that you need some sort of active-active topology that enables that kind of environment, and we support that natively as part of FlashBlade. From a scale standpoint, we have three or four different variants; I've just picked the most important ones.
22:42
The //S500 is the S class: in terms of high performance it can go from around 340 gigabytes a second of read, which lets your SIEM pull data much faster, to almost 500 gigabytes a second with the //S R2, the next-generation platform. FlashBlade//E on the other end was built for large-scale archive systems,
23:03
but it's way better than even disk-based systems. We have a lot of customers who replace the traditional disk-based object storage systems behind their SIEM environments with one of these options, depending on their needs: performance, how quickly they want data back, and what their SLAs are;
23:19
we recommend the right performance tier. The archive systems go up to 60 petabytes, and by the end of this year they'll go to 120 petabytes in a single namespace; you could have one bucket of 120 petabytes, no issues. That's 48 gigabytes a second of read, and write was 12 gigabytes a second but has now become 20.
23:39
So incredibly high-performance, very large-scale systems, great for analytics and all of the things we spoke about. You not only want to be quick and reduce your mean time to threat detection, but you also want to be proactive in doing threat analytics.
23:59
So where does FlashBlade get its performance? Like I said, it's built on a massively parallel key-value transactional database that's all homegrown, on top of a log-structured metadata engine.
24:14
And core to it is our DirectFlash technology. Ours is a bladed architecture, which means you can just add blades to increase the amount of storage, and the storage does only that: the compute within the storage environment only serves storage.
24:29
We're not doing other things. And because it's massively parallel, you can scale out this environment; all the load is distributed across the blades, and all of them participate in the IO needed to serve data to the SIEM environment. Yuvraj was talking about 110 terabytes an hour for LogScale.
24:49
That's massive. Think about how much data gets ingested and processed, and how quickly you can recover in case there is a problem in the LogScale environment. We're optimised for performance: small objects and large objects, we treat them equally well.
25:06
The architecture is optimised to serve both equally well, so you don't see performance degradation because you're using small objects. And then, of course, DirectFlash along with Purity and our DFMs exposes all the NAND to parallel access, which means your IO stack is incredibly optimised from the protocol all the way down to
25:26
the NAND. So that's where we get the throughput necessary for these next-gen SIEM environments. To double-click into the architecture a little: because it's a distributed environment, we do two-phase commits, which increases your resiliency.
25:43
Like I said, we are highly optimised for writes, and our systems are built in such a way that, I don't know if you've heard other talks today, the longevity of the platform is very high. Our devices rarely fail, which means the life of your device environment is very,
26:04
very long. The way we do that is by optimising our writes to flash. Flash has a lifetime associated with it, and if you don't manage that process well you'll wear out the flash very quickly. We have optimised this technology over the last 15 years, and our devices don't fail often, so we're able to offer longer
26:26
refresh cycles: instead of three years, it could be six or seven years, which means your overall TCO actually improves; it reduces significantly compared to competing technologies. We're also incredibly power-efficient.
26:45
Our 150-terabyte DFMs and 300-terabyte DFMs draw almost the same power, so it's very power-dense, and it allows you to take racks and racks of space and package them into one or two racks. I'll give you a parallel example, not necessarily related to SIEM. We worked with another customer;
27:05
earlier today they were doing a flash talk on cloud DVR. They had, I think, north of 30 racks, and they brought it down to 3 racks of almost 100+ petabytes of storage. All of this is powered by the architecture and the density we have. So density, performance, scale: very important for us.
27:27
But the last one, especially as it relates to SIEM environments, is the ability to list things in the bucket. If you have to search for a lot of things, you need to be able to list very quickly, so our list performance is very optimised, because you're looking for a needle in a haystack across billions and billions of objects.
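As a concrete example of the listing pattern (bucket and prefix hypothetical), a scan typically paginates over a key prefix, here one day's worth of logs, rather than walking the whole multi-billion-object bucket; fast LIST performance is what keeps this quick.

```python
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.flashblade.example.local")

# Paginate over one day's prefix instead of the whole bucket.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="siem-logs", Prefix="raw/2025/06/18/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])
```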
27:45
You want to be able to shrink that search window quickly, or rehydrate all of it, and we enable those kinds of capabilities very efficiently. A public example: Veritas uses us for their Splunk environments. Their performance before was horrendous.
28:05
It was very unpredictable, and they were paying a lot in OpEx: around $2 million in OpEx costs plus a CapEx purchase of 350K, and they eliminated all of that. They reduced their servers by 30%, so they saved on a bunch of operational overhead as well, and they got incredibly predictable performance with
28:28
this. This is a public reference, so we can quote the logo and so on. We have other references too: large telco environments using the same architecture, whether it's Elastic or Splunk, for log analytics. We can't publicly name them, but those environments follow the same pattern.
28:46
They started with commodity servers, decided it was too complex at the scale they operate, disaggregated it using partner technologies like Splunk SmartStore and Elastic's search capabilities, and then brought in FlashBlade to serve their SIEM and log analytics environments. So, in a nutshell, that's all I had for today.
29:08
This is how our platform is architected: purpose-built for these kinds of high-scale, high-throughput log analytics environments, and that's how FlashBlade supercharges these SIEM environments. Were all of you in the general session yesterday? I believe one of the customers that spoke at the general session was Fiserv;
29:33
they completely revamped their entire Splunk environment with FlashBlade. They were leveraging double ACS before, and I think they had more than 20 racks; we brought that down to just 6 racks. A huge Splunk environment, and it helped improve their query times,
29:56
which used to be more than a minute; now it's around 30 seconds. So these are public, referenceable customers using this approach and this solution, and even Pure is using it for its own Splunk environment.
  • Security & Compliance
  • FlashBlade
  • Splunk
  • Elastic
  • Pure//Accelerate