38:46 Webinar

Optimize Performance and Cost for Log Analytics and Search with Elasticsearch

Enterprises can generate application, infrastructure-monitoring, and security log data at up to petabyte scale every day. This massive storage footprint creates serious cost pressure.
This webinar first aired on June 14, 2023
Transcript
01:00
We'll get started. Hi everyone. Welcome to the session, Optimizing Performance and Cost for Log Analytics and Search with Elastic on Pure Storage. Some of you may be system architects, system administrators, or infrastructure administrators looking to design and deploy Elastic
01:21
on Pure Storage, or you may have an existing Elastic deployment that you are thinking about expanding. I am Chanda Dani, and I am here with my colleague Jan. Together we will talk to you about deploying a modern Elasticsearch on Pure Storage and the full architecture behind it.
01:47
All right. So we will talk about the different data tiers that Elastic has and the ideal Pure Storage solution corresponding to each tier. We will talk about an architecture that gives you effective cost and performance optimization. And we will also share the results of our own testing and validation and what we have
02:12
observed internally, in terms of both linear scalability and query performance for hot queries as well as ad hoc queries. All right. You may be using Elastic for many different scenarios. The most common one is using Elastic as part of an application to do search and indexing.
02:33
This is one of the very common use cases, where you may be using Elastic to search across millions of documents, and you want the results within a fraction of a second. And some of you may be using Elastic for the logging use case, as they call it. This could be log analysis for infrastructure logs, security logs, different kinds of application
02:54
performance monitoring logs, or you may be bringing time-series metrics data into Elastic. Now, talking about infrastructure logs and application performance monitoring logs: that infrastructure, all of you want to keep it on 24/7. Here we are talking about the logs you get while monitoring servers,
03:16
storage, network, firewalls, all of that. Now, because of peaks in demand, and for many other reasons, things go wrong all the time, and many times customers such as you rely on search and log technology to quickly trace issues and find the root cause of the problems you may be having. In these scenarios, on the right-hand side, your storage needs range from high performance, to balancing cost
03:45
and performance, to capacity as well. Because of the unprecedented growth in modern applications, IoT devices, and microservices, there has been huge growth in infrastructure, which has led to a lot of growth in machine-generated log files as well. So it is not uncommon for us to hear things like: my e-commerce application is now searching
04:14
across millions of documents, and I need search results in milliseconds. These are very common scenarios: if you go to homedepot.com, the search underneath is Elastic, and it is actually doing this. Some of you say you are bringing hundreds of terabytes of log files into Elastic, and you
04:32
will see that actually getting bigger. And those are large enterprises; a terabyte or two is almost everybody today. Those of you who are using Elastic for security logs know this scenario: malware can sit inside your firewalls for weeks, and therefore you want to retain your security logs for a longer duration if possible.
04:54
Despite whatever change is happening, what stays constant is that data keeps growing. But with this growth in data, you always have to provide very fast query performance and very fast search results, which in storage terms means very high storage performance. Besides all these reasons, you are looking at fast time to insight for other reasons too.
05:19
Those of you who have GDPR responsibilities know you have only 72 hours from the time you detect a breach to the time you must inform your users. We were just talking about data breaches: by the time you really know you have a breach, a lot of time has already passed. So now that you know about
05:39
it, you have to act very, very fast. Many of you have compliance requirements here, those on the government side have regulatory requirements, and all these requirements put pressure on you to deliver very fast search results, which once again means performance. So you are bringing data from log files from distributed infrastructure
06:02
sources into Elastic as a central repository to do log analysis. Now, we just talked about how you're bringing more data and more log files, your log files are becoming larger, and you are now retaining them longer. All of this means more infrastructure, and that growth in infrastructure and data leads to operational challenges.
06:24
No surprise to you. You start dealing with things like unpredictable search performance, unpredictable search times, and underutilized resources, and your user base keeps expanding. So as you grapple with this problem and try to solve it, you find you are spending a lot of money, and then you ask the question:
06:46
what about cloud? Yes, you can use Elastic in the cloud: Elastic Cloud running on Azure, AWS, or GCP. A few things to keep in mind. When you use Elastic Cloud, you pay Elastic for the resources consumed to run Elastic
07:08
and for the number of days you keep your data in Elastic. So you are paying the cloud provider's margins, because ultimately those resources belong to somebody, right? And Elastic has to pay them. So you're paying those cloud provider margins, and you are paying Elastic to manage your Elastic cluster,
07:28
so you are paying Elastic's margin too. Cloud provider margins plus Elastic margins: it eventually becomes an expensive proposition, right? So not all organizations can use the Elastic Cloud offering, cost being a very big reason. And then, when you talk to the application owner, there is that overarching constraint
07:52
that the application is there to bring in revenue or support internal processes, right? The cost of running the application cannot be more than the revenue it brings in; your profits actually depend on that cost staying really low. That is what this entire presentation is about: how we can help you give your users the best
08:14
performance while keeping costs low. So we talked about rising cloud costs. Data residency is another big reason why people choose to be on-prem, and many of you have regulatory and compliance reasons too. So now that we are on-prem, how do we get the most performance in the most cost-
08:36
effective way? So the first thing you do is use Elastic data tiers. This is a recommendation from Elastic as well; in fact, the picture you are seeing here is from Elastic's web pages. Before we go into the details of the picture: why data tiers? A very interesting way to think about it is that not all data is the same. For certain kinds of data,
09:03
the amount of search that happens on that data declines with time. Logs, metrics, traces, and transactions are all examples of data where user interest reduces as time goes by. So if you use data tiers, you have a choice of the hot, warm, cold, or frozen tier, and you can balance your storage cost,
09:30
which eventually becomes a big part of the total cost, against the performance your specific use case needs. So let us take a deeper dive, starting from the left-hand side. The hot tier is for data that is frequently accessed and for which you need query performance in milliseconds.
09:53
That is the data you want to keep there, on your high-performing, more expensive storage. Data that is not accessed frequently, and for which you can live with query performance of, say, a second or one to two minutes, is what you can keep on the less expensive warm, cold, or frozen storage tiers.
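[Editor's note: as a concrete illustration of tiering, a minimal sketch, assuming a cluster whose nodes advertise tiers via node.roles (e.g. data_hot, data_warm, data_frozen) in elasticsearch.yml; the index name here is hypothetical. An index states which tiers it prefers:]

```
PUT my-logs-index/_settings
{
  "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
}
```

[With this preference, shards land on warm nodes when available and fall back to hot nodes otherwise; ILM adjusts the preference automatically as data ages.]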
10:20
Now, four tiers is not easy to manage; you do this for a living, and you know the complexity gets really high. So not all customers use all the data tiers. In fact, what we have observed is that most commonly customers will use the hot, warm, and frozen tiers; there are many who just use the hot tier and frozen tier,
10:43
and there are some who stay with just hot or warm. People generally prefer to go with two tiers, and in scenarios where the deployment size is huge and storage is a big part of their cost, they go with three. So ultimately you are using data tiers to bring your infrastructure cost down, and I'll explain how. But what happens if you don't,
11:08
if you stay only with the hot tier? Let's say you make that choice. Then the dotted line shows you how your costs will grow. So you are left with no choice but to take on some complexity to bring that cost down, because your data is growing, and that hot tier, not just from a storage perspective but from a compute perspective as well,
11:31
is very expensive. Two things probably come to mind as you hear me out. The first is that as you try to scale, you're dealing with underutilized resources, and you ask: how do I avoid that? You need a storage solution where
11:51
compute and storage scale independently. Yes, Pure gives you that. The next question is: I'll figure out which tier I want to use, but Pure, do you give me a storage solution that is ideal and suited to each of these storage tiers?
12:08
And the answer to that is yes; Jan and I will talk about that too. All right. The next thing I want you to keep in mind: we have talked about data tiers, but let's say you have now started to use them. The next thing is searchable snapshots.
12:25
Searchable snapshots, once again, are an Elastic capability that is available only if you are using the cold or frozen data tier. The moment you hear cold or frozen tier, what comes to mind is data that is not accessed frequently, data that is read-only, data where you can live with search queries that take, let's say, one or two seconds or up to a minute.
12:52
What is unique about searchable snapshots is that here, data is searched in a snapshot that is stored in low-cost object storage. This is something that is not well known in the market. When I say object storage, or low-cost S3 object storage, many people just think Amazon. S3 is actually a protocol, and it is not well
13:20
understood that the whole benefit is now available on-prem as well, and this is where the magic of cost savings actually begins. So how does this magic happen? What is unique about the cold or frozen tier, or the whole searchable snapshot capability? First of all, it eliminates the need for replica shards.
13:44
Never mind the database terminology; from a storage perspective, it means that Elastic keeps only one copy of the data you have put in the cold or frozen tier, relying on the SLA provided by the storage vendor for high availability. Otherwise, Elastic keeps two copies of the data for high-availability
14:10
reasons. But now that you are putting so much data there, if it kept two copies, your costs would be really high. Because Elastic keeps one copy of the data you have put in the cold or frozen tier, your storage cost for that data is immediately cut in half. That's one way to save cost, right?
14:32
The second thing that is implied is that everything not in the hot tier is now in the warm, cold, or frozen tier. It is the hot tier for which you are really paying, for both compute and storage; those nodes are very expensive. So for the data that is no longer in the hot tier, you are saving a ton of money that you would otherwise have paid for your hot or warm
15:00
nodes. So it is strongly recommended that you use searchable snapshots. Elastic offers the capability of index lifecycle management, where you can move data between these different tiers.
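[Editor's note: a hedged sketch of what ILM does under the hood here, mounting a snapshotted index as a searchable snapshot in the frozen tier; the repository, snapshot, and index names are hypothetical:]

```
POST _snapshot/flashblade-repo/snap-2023-06-14/_mount?storage=shared_cache
{
  "index": "my-logs-index"
}
```

[storage=shared_cache produces a partially mounted index: the single copy lives in the S3 repository, and frozen-tier nodes cache only the blocks a query actually touches.]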
15:20
But I also want to leave you with one thought, and we will talk about it shortly: if you are using, let's say, FlashBlade//S for your hot tier and FlashBlade//E for your frozen tier, Pure allows you to move your data between FlashBlade//S and FlashBlade//E. So you don't have to worry: oh my God, I moved it to frozen,
15:40
but my team really wants it back, what do I do? You have that mechanism. All right. So Pure gives you, and Jan will talk to you about this more, a portfolio from which you can choose for all these tiers. Beginning from the left-hand side,
15:59
whether you have a Kubernetes install, and we find that almost everybody starting fresh on Elastic is starting on Kubernetes, or a VM install, Pure will support both. Many of you are aware that Portworx is our data management solution for Kubernetes deployments.
16:21
It works for Elastic as well. Starting from that: for your indexing and search storage needs, you can use FlashBlade, which supports the NFS protocol; many of our customers actually do use us with that protocol. And we also have FlashArray, which is our block storage solution, as you're aware.
16:43
Moving on: for index backups with FlashBlade, once again you have a choice to use the NFS or S3 protocol, whichever suits your use case best. And for searchable snapshots, here is the FlashBlade//E magic: all the data for which you can live with a slightly longer query time can sit here on FlashBlade//E. And yes,
17:05
we support the S3 protocol here. So before I hand it off to Jan, so that she can walk you through the architecture and share our results, I want to leave you with three thoughts. First, yes, you do want a solution that is easy to deploy, but what you also want is non-disruptive upgrades, because you can't bring it
17:28
down. It's like your web page: it can't go down; that app that is generating revenue for you can't go down. The middle part is cost; we all know how important that is for you. The cost of running the application cannot be more than the revenue it brings in, so cost is top of mind for you.
17:47
Our subscription services model gives you that, along with the whole Evergreen model. The magic of that is you never have to think about it: whatever innovation Pure delivers in software and hardware, both for the S series on the hot side and for all your cold data on FlashBlade//E, reaches you automatically. And finally, yes, a full portfolio that gives you the flexibility to balance performance
18:18
and cost, with compute and storage scaling independently. Now, there is one more thing I want to mention. Many of you feel: I may have a lot of data in the future, but today I don't want to start in a very big way; I have, let's say, only X amount of data that I want to bring into the frozen tier.
18:38
How small can I start? You can consume FlashBlade//E through our Flex program and start as small as 1.5 terabytes. So plan for the usage you may have in the next two or three years, start at 1.5 terabytes, and then increase as your usage grows. So with that,
19:02
I'll now have Jan walk you through the whole architecture and our own test results. Thank you, Chanda. And hello, everybody. (Whoops, wrong way; if you hold the clicker in the right direction, it helps.) I want to share with you some of what we planned for, and some of what we didn't plan for, when we decided we wanted to test.
19:29
Chanda has already shared a lot of really good reasons to think about Pure if you are looking to deploy your Elasticsearch infrastructure on-prem. We had planned for some time to test searchable snapshots, and before we were able to get started, it was shared with us that, because searchable snapshots are part of index lifecycle management, a lot of people
20:00
don't deploy them, because they think ILM is difficult and time-consuming to deploy. So we realized we had an opportunity to extend our testing approach to see if we couldn't help customers like yourselves simplify that. And so we leveraged Elasticsearch data streams. These are designed to store append-only data in several indices that sit
20:32
behind a single named resource, a single namespace. We're going to talk more about those later. ILM, again, index lifecycle management, gives you, the customer, the opportunity to define how long you would like data to live in that hot tier (I'll probably refer to it as cache throughout
20:58
different parts of the conversation) before you want it to move. That decision is up to you, and oftentimes it's based on the age of the data; the expectation is that it is not requested as often as more recent data. And we used searchable snapshots; Chanda talked about this, right? We chose to deploy this in the frozen tier,
21:22
frankly, because of the behavior of the cold tier. This is a feature licensed by Elastic that, again, allows you to offload that hot tier. As Chanda already mentioned, the immediate benefit is that you have a single shard of the data; there's no need for replicas. And theoretically, if performance is good enough
21:49
from that searchable snapshot tier, can you potentially reduce what is required at the hot tier? Maybe; it depends on your business needs. And then, of course, there is FlashBlade with our S3 object protocol: this is our scale-out platform that delivers S3 natively as well as file services. So this is a snapshot of what the test environment looked like, and you see Rally at the
22:19
very top. I'll spend more time on this later, but I wanted to capture this so that you have a really good understanding of what we were testing and what we were trying to accomplish. Data streams are a critical piece of the solution that we put forward, and so I want to take some time to make sure that everybody understands exactly what they do.
22:47
As the slide says, these are hidden, automatically generated backing indices under a single namespace. What that means, as captured on the slide from left to right in this example, is that you see .ds-logs-2009 with generation numbers one, two, and three.
23:13
When that first index is created, and as data is ingested and written, when it fills up, the data stream will automatically create a second index, but this is transparent to you, the end user. When the second one fills up, another is created, and so on. So as you're ingesting new data, as each index fills up a
23:46
new one is created, and those writes are targeted at the most recent index. Now, you might be asking yourself: how long do these indices live? How long do I have to hold on to them? That's entirely up to you, based on the requirements of your organization, because you can decide, based on age, for example, when they are going to expire.
24:16
In our testing, data streams combined with index lifecycle management were the two elements that took the difficulty out of the equation, enabling customers like yourself to leverage tiered storage and make it an equitable solution. We've talked about ingest; let's talk about search.
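[Editor's note: a minimal sketch of the data stream mechanics described above, assuming Elasticsearch 7.9 or later; the template and stream names are hypothetical, not the ones used in these tests:]

```
# Any new index matching the pattern becomes a data stream.
PUT _index_template/logs-demo-template
{
  "index_patterns": ["logs-demo*"],
  "data_stream": {}
}

# Documents need an @timestamp; writes always hit the newest backing index.
POST logs-demo/_doc
{
  "@timestamp": "2023-06-14T01:00:00Z",
  "message": "example log line"
}

# Rollover (manual here, ILM-driven in practice) creates the next hidden
# .ds-logs-demo-* backing index.
POST logs-demo/_rollover
```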
24:46
Because we have a number of indices storing this data, when a search request can't be fulfilled from cache, the request is served from the searchable snapshot, that is, the S3 repository, and the search is executed in parallel across all indices. This reduces the time it takes to return the query result.
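[Editor's note: illustrating that fan-out; one query against the hypothetical data stream covers its hot backing indices and the frozen, snapshot-backed ones alike:]

```
GET logs-demo/_search
{
  "query": {
    "range": { "@timestamp": { "gte": "now-7d" } }
  }
}
```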
25:15
Let me share with you what our test setup looked like, because there are some things I want to make a point of here. I mentioned earlier that we leveraged Rally. You may already be aware of Rally, and if you are, you'll understand everything I'm about to go through.
25:38
But Rally, effectively, is a tool developed by Elastic for benchmark testing. It's pretty broad in terms of the configurations and features it can support; in fact, Elastic uses Rally for its own benchmark testing. The Rally server, as well as all of the Elastic nodes, was deployed on virtual machines on a FlashArray, which is obviously backed by
26:10
NVMe. And then, of course, as data moves from that hot tier to the snapshot repository via the ILM policies, the target is a FlashBlade. Here's some detail, if you're interested, about the different versions of code, how much memory was assigned to the virtual machines, et cetera,
26:37
from a Rally perspective and an Elasticsearch perspective, and some of the parameters that were established. Here's the interesting thing. When we tested this, we tested on a FlashBlade//S. We have two versions of our FlashBlade//S, the S200 and the S500; the S500 is targeted at more demanding
27:02
workloads, the highest performance. We used an S200 with a single chassis. We went with a near-minimum configuration, with seven blades and two DirectFlash modules on each blade. Again, very close to what I would consider an entry-level system, because our goal wasn't necessarily to see exactly how fast we could make this go; customers don't
27:28
necessarily plan for that or expect it, and you want to keep that in mind as we progress. And so again, if you're familiar with Rally, this might make some sense to you; Rally is a little bit complex, maybe that's a good way to say it. We leveraged Rally, and our custom test tracks were based on the New York City
27:53
taxi data set, which I imagine folks are probably familiar with; it's quite often used for testing. A custom track was created for testing using what you see here: we first created mapping templates, combined them into a composable index template, and addressed data streams and aliases.
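[Editor's note: a sketch of how such a run might be driven, assuming Rally's benchmark-only pipeline against an existing cluster; the host address and the custom track path are hypothetical:]

```
esrally race --pipeline=benchmark-only \
  --target-hosts=es-hot-1.example.com:9200 \
  --track-path=./nyc-taxis-datastream-track
```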
28:19
The most important thing I want people to understand is that when we configured our ILM policy, we were pretty aggressive in what we configured. We didn't have the volume of data per se, but we wanted to see this turn over, and turn over pretty quickly, because we had a pretty good idea of how this was going to play out. So, a bit of an eye chart, and I only have two of these, trying to spare you,
28:49
but I wanted to share with you that it was really very simple to set up. The first thing we had to do, from an Elastic perspective, was create a key and then share that key with our S3 client; our S3 client in this case is obviously the FlashBlade. Once we did that, using a simple PUT command, we were able to create the S3 repository. You can see the capture here, and it was done.
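[Editor's note: a hedged sketch of registering an on-prem S3 snapshot repository, not the exact commands from the webinar; the repository name, bucket, and endpoint are hypothetical. The access key created for the FlashBlade goes into the Elasticsearch keystore first (bin/elasticsearch-keystore add s3.client.default.access_key, and likewise ...secret_key), and the default S3 client is pointed at the FlashBlade data VIP in elasticsearch.yml via s3.client.default.endpoint:]

```
PUT _snapshot/flashblade-repo
{
  "type": "s3",
  "settings": {
    "bucket": "elastic-snapshots",
    "client": "default"
  }
}
```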
29:15
The next screenshot is a little more of an eye chart, but the idea is that we had to set up an ILM policy, and I'm going to walk you through it from the top down. Essentially, we said: for data that lives in
29:39
the hot tier, the action we want to take is to roll it over when it reaches one gigabyte (it truly is a big B, not a small b) and it's at least a minute old. That is then played forward, and we captured the information for the frozen tier that says: OK, my minimum age is one minute; I shouldn't have anything in my
30:06
repository that's less than a minute old. And then there were a couple of other things we were able to do to put this in place, and we were ready to go.
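[Editor's note: a sketch of the aggressive demo policy just described, rollover at 1 GB or one minute, then straight to the frozen tier as a searchable snapshot; the policy and repository names are hypothetical:]

```
PUT _ilm/policy/logs-demo-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_primary_shard_size": "1gb", "max_age": "1m" }
        }
      },
      "frozen": {
        "min_age": "1m",
        "actions": {
          "searchable_snapshot": { "snapshot_repository": "flashblade-repo" }
        }
      }
    }
  }
}
```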
30:36
So the next logical step is to take a look at what the repository is doing: what is the FlashBlade doing? On the top, what you see here is the ingest of data; we are writing to the FlashBlade, and remember, we said this was turning over every minute, so that's effectively what you see here. Writes are in orange, reads are in blue; there's no read activity in that first capture because we're writing. But then we wanted to take it a step further and ask: OK, what does this look like when we are performing a search?
31:02
We wanted to do a search while we were ingesting data, and after we ingested data. I said that writes are in orange, pardon me, and if you notice, on the writes we are up in megabytes; again, not a huge data set, and we were turning it over pretty quickly, but we really weren't asking the FlashBlade to do that much.
31:27
And so when you look at the reads, all of a sudden you barely see those writes, and you really have to look at that left part of the axis. That is simply because on the ingest, or write, side we're capturing it in megabytes per second, while on the reads we're capturing it in gigabytes per second. So you can kind of see that orange line. Is this behaving as we expected?
31:55
Absolutely. Chanda showed you this chart earlier, and I wanted to revisit it, because we wanted to make sure that if we were going to make a recommendation around data tiers, we could make a solid one, one that delivers ease of deployment. But we also wanted to improve on some things.
32:23
And I'm going to ask you to focus on the lower right here. When you look at that frozen tier in the traditional Elastic model, response time is in minutes. Not in our world. And that's why I said earlier that, depending on the results you can get from the frozen tier, it may give you the opportunity to store less in that hot tier and
32:52
more in the frozen tier. So I'm going to share some test results with you. We wanted to know how we compared to other solutions, solutions I would describe as disk-based, cheap-and-deep archives. Pure's results are in orange; what we tested against is in green.
33:19
Any surprises? No. And again, remember what we tested with: this is almost an entry-level S200. Chanda earlier talked about the different platforms and the fact that we give you a choice, and I've been talking about searchable snapshots, where we obviously leverage a FlashBlade.
33:45
And you might be asking yourself: well, you tested with the S, and that's more performant than the E, because the E is capacity-based. But we fully expect that we would have positive results with the E as well. When you think about the deployment of Elastic, and again, this goes beyond just searchable snapshots, there are many ways you can deploy it.
34:10
We give you a choice, and you can make a decision about what's going to work well for your organization, maybe based on your skill sets, et cetera. You can leverage FlashArray, and as you probably know, FlashArray scales from highest performance to more capacity-based with reduced performance.
34:37
FlashArray was designed to deliver the lowest latency. The flip side of that, I suppose, is FlashBlade, which is designed to deliver maximum throughput; and much like FlashArray, you can make a performance choice, or, if capacity is the direction you believe is right for your organization, you can do that as well.
35:06
And then, of course, everything is pulled together by Portworx, which is essentially an abstraction layer that allows you to deploy, manage, and back up these environments, rounding out the solution. Like I said earlier, we give you the choice.
35:31
So when you can't go to the cloud, we focus on simplicity and efficiency. What we wanted to convey to you today, and get you to start thinking about, is that based on your requirements, and based on where you are experiencing pain or facing challenges in your environment, we can help you.
36:07
I don't want to close out until I have an opportunity to talk about a successful customer. (Sorry, my first time driving the clicker.) Nextiva is a voice-over-IP company based in Scottsdale, Arizona, and you can see the broad range of services they provide, captured in the picture on the right.
36:29
They consider themselves a substantial Elastic user. One of their employees, who manages the IT team, was actually an active participant here at Accelerate these last two days, and it's possible you were part of his presentation, where he talked about all the different ways they are leveraging Pure Storage in their environment; they would have
36:54
touched on Elastic. Nextiva uses Elastic a couple of different ways: it supports their logging, it supports their search, and they have a separate instance that provides a queue in support of one of their software-as-a-service offerings around voice messages. They have deployed it all on FlashBlade; that was the decision they made for themselves.
37:24
And this customer has shared with us that, number one, performance has improved significantly; number two, they are excited about the flexibility the solution brings to them, and they are looking to do more with FlashBlade and Elastic in their environment. So the QR code is designed to give you quick access to additional information that is
37:57
available on our website. But my call to action for this large group of people this afternoon (yeah, that was a joke, OK) is: think about it. Where are the challenges in your environment? How can Pure help you, like we've helped Nextiva?
38:19
So reach out to your sales teams and ask them to come in and talk with you about what's going on. We can even bring in specialists to help if additional expertise is needed. I hope you have found this useful, and I hope you will think about Pure when it comes to your on-prem deployments. Thank you very much.
  • Data Analytics
  • Elastic
  • Splunk
  • Pure//Accelerate

Log analytics tools such as Elastic and Splunk need storage that scales out seamlessly and balances performance and cost. Learn how a modern unified fast file and object storage platform helps Elasticsearch stream log files securely and build apps that search across billions of files in real time.
