40:19 Webinar

Manage Modern Data Pipelines with Containerized Applications

Learn what modern data pipelines look like with a shift to disaggregated storage and the use of containers that expand compute agility and improve system utilization.
This webinar first aired on June 14, 2023
00:01
Is it OK? Good morning, everyone. Thank you very much for coming to the session. Here we'll be looking into managing modern data pipelines with containerized apps. If this is not your session, uh, now is your opportunity. My name is Somu Rajarathinam. I'm a solution architect in Portfolio Solutions
00:28
within Pure Storage, focused on analytics and databases. And I have... Yep, I'm part of the technical marketing team at Pure Storage, working for the Cloud Native Business Unit. Thank you for joining us. Quickly on the agenda, I'll go over
00:44
what analytics maturity is and how it helps you. I'll also talk about some of the challenges of legacy versus modern analytics. Then we'll talk about why high-performance analytics, and the recipe for it, and then we'll go over data services on Kubernetes: some considerations, some challenges, and how Portworx addresses them, and then automation
01:04
using Portworx. Bhavin will also show a couple of demos, depending on timing, and then we'll end with customer use cases. With that: data analytics. I think you would have heard this quote multiple times. It's 16 years old: "data is the new oil."
01:22
Data, like oil, is valuable, but it's of no use until it's actually refined. If oil is not refined into petroleum, it's of no use. Same thing with data: if it's not turned into information, it's of no use. And as you know, in 2011, Peter Sondergaard
01:38
from Gartner came up with the phrase "Information is the oil of the 21st century, and analytics is the combustion engine." Data in its raw form is of no use unless you actually turn it into information. But to make meaningful and actionable insights, you need to move that information into analytics, and analytics plays a big role.
01:57
Anyway, I'm preaching to the choir here; I think everybody knows the value of data analytics these days. But what I want to touch upon is the maturity of analytics. As a matter of fact, ESG (Enterprise Strategy Group) did a study a couple of years back measuring analytics maturity and the impact it has on companies.
02:16
What they measured was basically three factors. One is richness of data, in terms of how many disparate data sources the company is pulling data from. The second is investment in analytics, either as a percentage of the IT budget towards infrastructure or software, or adoption of AI/ML technologies.
02:36
And the third was the focus on analytics, measured in terms of whether analytics is one of the top five priorities, if not number one. So they went through multiple companies and, based on those measures, placed them into three different stages: Stage 1 being the low level of maturity and Stage 3 being the high level of maturity.
02:57
And what they noticed was that Stage 3 organisations had better business outcomes. So there's a correlation between the level of maturity in analytics and the outcome of the business. As for some of the results that you see here from the paper,
03:18
the Stage 3 organisations, with the high level of maturity, can outpace their competitors by 2.5x when it comes to customer satisfaction. Similarly, those Stage 3 organisations were far more likely to have increased revenue per employee within the last two years, and they delivered 46% more product innovations within the last two years. So this shows the value of
03:44
maturity in analytics: when you invest in it, you actually get a competitive edge over your peers. Keeping that in mind, we also want to go back and look into some of the challenges. Let me actually play this out. Some of the challenges with what is called the legacy analytics architecture. You saw that
04:06
quote, which is 16, 17 years old. Most of those analytics architectures are legacy too, also 15 years old. They are built on what is called a distributed scale-out model. In that particular model, every server has storage in it, so the data sits along with the compute, on a predefined set of
04:27
servers. Every one of them might have, say, 10 or 20 terabytes of storage, so the data is spread across the servers. And when you run out of space, you add a server, because you have locally attached storage. That's how the setup was in the distributed scale-out model.
04:44
If you think about it: if you have, say, 100 terabytes of data and your configuration is 20 terabytes per server, generally you would need five servers to host it. But from an availability standpoint, if a server goes down, then for the data to stay available it needs to be on some of the other servers.
05:02
So what they did is they started replicating the data. This is the replication factor in these applications. Because of that, you start multiplying the data. It's very common to see a replication factor of three, so the 100 terabytes becomes 300 terabytes.
05:16
That means now you need 15 servers, so everything goes up, both compute and storage. You need more servers whether you need the compute or not; purely for storage purposes, you need more servers. This created other challenges, in terms of data migration, when you run out of space.
05:35
Say you have 300 terabytes, you run out of space, and you want to add 100 terabytes more, so you add five more servers. But it's of no use until you move data from the existing servers to the new servers. This is called data rebalancing. It's a time-consuming process, a physical movement of data, which is also taxing on your infrastructure.
05:55
These are some of the common challenges you generally run into with this legacy architecture. And there are others, like availability. If a server goes down, applications like Splunk have this replication factor, and they need to meet the replication factor.
06:09
So if you set the replication factor to three and one of the servers goes down, the replication factor drops to two. To get it back to three, they have to recreate the data on the surviving servers. Again, that's movement of data, and it takes a lot of time. It takes your cycles away from supporting your
06:26
business applications and puts them into all this data movement. And the last one, which you would already be aware of: these data analytics applications are tiered. It's very common to hear terms like hot, warm, and cold. The hot and warm tiers would generally be on, say,
06:45
SSD or NVMe drives, and the cold tier would be on slower drives such as HDDs. So depending on what kind of searches or queries the users are doing, you're hitting either the hot tier or the cold tier, and if it's in the cold tier, the performance is going to be different based on the media performance. Right?
07:03
So those are some of the challenges. And these days, with the amount of data that's coming in, this legacy architecture is not going to be sustainable at all. So, as a matter of fact, what we call a modern analytics platform is either born or reborn in the cloud. The reason is there are application vendors,
07:24
like, say, Elastic and Splunk, and pretty much most of them, that wanted to offer their services in the cloud as well. That means they initially set up the same architecture in the cloud, and they quickly found out it was very expensive because of the storage usage as well as the compute requirement.
07:42
It became cost-prohibitive for them to run a service there. So what they ended up doing is going back and rearchitecting. The fundamental change everybody made was to take the data storage out of the compute and move it to centralised storage. That's what all these vendors you see here have done.
08:04
This is something we've also been preaching for a while: even before all this history came in, instead of using local data storage, we were suggesting using centralised storage as well. But going back to this particular scenario: what they did is rearchitect the code to use AWS S3 or on-prem S3 storage, like Pure Storage
08:23
FlashBlade in this case. The benefit is that now you can scale compute and storage independently, purely based on the actual need, instead of adding servers just to meet the storage requirement. Right? Those are the changes that help them scale
08:42
independently. And this basically goes into what is called a cloud-based architecture, purely from a scalability perspective. Not necessarily cloud native, but cloud based. And this reduces your storage requirements: since the data is now in centralised storage, you don't need three replicas.
08:55
You have only one copy of the data, so your 100 terabytes goes back to 100 terabytes on centralised storage rather than 300 terabytes across multiple servers. And availability is now offloaded to the centralised storage, like AWS S3 or on-prem S3 storage.
09:11
They take care of availability, and the application need not manage availability by creating multiple copies across the tiers. Driving down further on centralised storage using S3 protocols, some of the other benefits you generally see, for storage
09:34
like Pure Storage FlashBlade, on top of reducing the storage to only one copy: we also do inline data reduction through compression, and also encryption. So you can actually put more data in the same storage, thanks to the compression. And if you have encryption-at-rest
09:56
requirements, FlashBlade already offers that inline. Not every storage might have it, but here it's already built in. The biggest change is that this enabled us to free up the compute. Earlier, the applications were managing
10:16
the migration of data, so they were spending a lot of cycles on data services, on moving data around. Now that's offloaded, so they don't need to do any of that, and they can use the compute for the actual functionality of the application. And because you are now disaggregated, you can go back to
10:36
reducing the number of servers. You don't need that many servers anymore, because all you need is based on your requirements from user behaviour, and not on storage capacity anymore. But that's only the first step, moving to centralised storage.
10:56
But what we are seeing now, and I think you've heard this everywhere, is that data is growing; there's a huge amount of data everywhere, and everybody wants real-time insights. What this means is that when you put the data into S3 and you want to pull data back for any searches, it needs to be as quick as
11:16
possible. So what we are seeing is that high-performance S3 storage is becoming a requirement, part of the recipe. If you take Splunk as an example, Splunk has something called SmartStore, which has a cache, and ideally they want all the searches to be served from the cache.
11:34
But if it's not in the cache, you have to go pull the data. If you're running historical searches, you have to pull the data back into the cache, and the quicker you can do that, the better the performance for your searches as well. So we are talking about gigabytes per second of traffic these days, either ingest or output, and that's becoming a standard.
11:54
And to support that, what you're seeing is high-performance networking on the client side. FlashBlade offers 40-gig connectivity; you can do 100-gig connectivity. And on the client side, gone are the days when it used to be a 1Gb interface. Now 10Gb
12:11
is a standard, but we are seeing 25 to 40 gig on server interface cards as well. So this is becoming standard. But the next one is the important one, which is what we're talking about here: increasing the application threads, either through containerization or through
12:30
virtualization. We'll go further on this one, the benefits of containers. If you take bare metal servers, when you run your application, you run one instance of the application on that particular server. So in this particular case you have four
12:45
servers running, say, an analytics application, so each server will run one instance of the application. And each one has its capabilities. You can beef up the server, but it's up to the application whether it is designed to take advantage of that, right? If it is not able to use the threads and cores
13:06
available to it, then beefing up the server is not going to help you. We have seen this clearly with at least two or three different applications: there's only a certain limit. Even though we beef up the server, we don't see improvement in either the ingest performance or the search performance.
13:25
But when this is containerized, for the same set of, say, four servers, you containerize it and then run multiple instances through containers. In this particular case, if you're running 4x, instead of four instances of the application you now run 16 instances, but on the same exact four servers.
13:43
It opens up a lot more. What happens is you start driving the utilisation of the servers much harder, and from a network aspect there are a lot more parallel connections to the storage. And if you've heard about what FlashBlade has to offer,
14:03
FlashBlade thrives on parallelization. If you have a lot of parallel searches coming into FlashBlade, you get a lot of performance benefit in terms of throughput. We actually did an internal test, and we also did this with Intel
14:21
as a POC, taking one of their use cases and moving it to containerization. What we are seeing is 2x to 3x improvements. In the test we did internally, on four servers we were able to do, say, 200 megabytes per second of ingest; when we moved to containers on the same four servers, we were able to go up to
14:42
650 megabytes per second. That's almost a 3x improvement just by containerizing; we didn't do anything else. Same thing with searches: we saw almost a 2x improvement in the number of concurrent searches we were able to run on containers compared to bare metal. So we do see a lot of improvement when you
15:03
run multiple instances of the application on the same exact servers through containers. But this brings other challenges, which we'll talk about from a container standpoint. Quickly, though, I want to go through one of the architectures you can see for modern analytics. This is one of the POCs we did
15:23
with Intel. On the right side, we're using Kafka brokers to pull the data, and the brokers send the data to Splunk, and all of them access S3 storage on Pure FlashBlade. Kafka has tiered storage, and Splunk is using what is called SmartStore. And you can
15:46
use Portworx as the storage for these stateful containers. So this is the kind of architecture that can help you move towards high-performance analytics. With this, I'll now hand over to Bhavin.
16:00
Thank you. OK, so before we get started with this and talk about the benefits of running it in containers and orchestrating everything through Kubernetes: how many people here are using Kubernetes in some form or fashion? OK, nice. How many of you know what Portworx is? Oh,
16:23
OK, nice. Half the room knows all about Portworx already, so that's a good sign; I don't have to talk about 101-level things. So, as mentioned, you do get benefits when you break down your application from running on bare metal nodes to containerizing it, so you can get more of those connection endpoints.
16:41
But then, I'm not sure how many of you attended the keynote today: JCI, which was one of the customers we highlighted, is running 60,000 containers. So at that scale, you need an orchestration system like Kubernetes to help you orchestrate your containers.
16:57
Kubernetes is kind of the de facto standard when it comes to running containers. From a deployment perspective, it will help you set the desired state and run your containers on bare metal nodes, virtual machines, or cloud instances. And if you want to take it a step further, it can help you with non-disruptive rolling upgrades and with scale-out and scale-up operations as
17:17
well. So it definitely has benefits. And I know we like to call Kubernetes the next best thing since sliced bread, but there are some considerations or challenges that you have to keep in mind when you're dealing with containers or Kubernetes. That's the goal of this section: figure out what those
17:32
considerations are, and then see how Portworx can help you. I think I have about ten of these slides where we talk about different considerations. First up: use Kubernetes operators. So what are Kubernetes operators? Operators are codified SREs, software that helps you deploy and manage an application.
17:51
So instead of you coming up with a 1,000-line YAML file to deploy a specific application on top of Kubernetes, operators are something that knows how an application should be deployed on top of any cluster. If you are going down the Kubernetes route, make sure that you are using an operator to deploy and manage your applications.
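With an operator, that 1,000-line YAML file collapses into a short declarative custom resource. A minimal sketch, assuming a Postgres operator such as CloudNativePG is installed (the storage class name `px-repl2` is illustrative):

```yaml
# Sketch: a three-instance HA Postgres cluster declared via an operator
# custom resource (CloudNativePG-style; other operators use different schemas).
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: analytics-db
spec:
  instances: 3            # operator deploys and manages 3 Postgres pods
  storage:
    size: 100Gi
    storageClass: px-repl2   # hypothetical storage class name
```

Applying this with `kubectl apply -f` leaves the operator to create the pods, volumes, replication, and failover handling.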
18:09
Don't just use Helm charts or individual containers or YAML files to get started with this journey. Next up, high availability. Somu mentioned how, for HA, applications build in that replication factor of three and your data exploded from 100 terabytes to 300 terabytes. To add to this: even when you containerize,
18:33
you have to account for high availability. This can come from the application or the database layer, where you run replication inside the database tier itself, storing three replicas. But what if a specific node goes down? You still have to worry about all of that storage traffic being re-synchronised to the new node that eventually comes up.
18:53
That's where having replication enabled at the storage layer will also help you. So let's say instead of a replication factor of three at the database tier, you do a replication factor of two, and maybe you bump up the replication factor at the storage layer, making that two. Then, if a node goes down and Kubernetes reschedules the pod on a different node in the cluster,
19:10
all your data is already there. Instead of syncing terabytes of data, you might just have to sync the delta from the time it took Kubernetes to redeploy the pod. So make sure that you keep high availability in mind, both from an application perspective and a storage perspective. Next, resource requests and limits. It is really important to set these;
19:29
Kubernetes doesn't force people to set them by default, so you can have applications running on Kubernetes that just run wild and consume more resources than they are supposed to. You don't want a noisy-neighbour issue where just two of those application or database pods are consuming all the available CPU, memory, and storage resources.
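The requests and limits being described look like this in a pod spec (a sketch; the values are illustrative, not sizing guidance):

```yaml
# Sketch: bounding a database container so it can't become a noisy neighbour.
apiVersion: v1
kind: Pod
metadata:
  name: cassandra-0
spec:
  containers:
  - name: cassandra
    image: cassandra:4.1
    resources:
      requests:         # minimum the scheduler reserves on a node
        cpu: "2"
        memory: 8Gi
      limits:           # hard ceiling enforced at runtime
        cpu: "4"
        memory: 8Gi
```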
19:47
So make sure that you are setting these requests and limits, the minimums and maximums that allow you to control how many resources your applications actually consume. Next up: anti-affinity. Let's say you're running three containers for a specific application or a specific database. If you're
20:05
running all three of those on a single bare metal node, or a single Kubernetes worker node on a virtual machine, and that node goes down, you lose your entire database. That's obviously not efficient. The reason I bring this up is not that it's a new thing to anybody, but the way Kubernetes orchestrates containers,
20:23
it doesn't take anti-affinity rules or volume placement rules into account by itself. It just looks for available resources: if a node has resources available, it will go ahead and schedule those pods or containers on that node. So whenever you decide to run things on Kubernetes, use anti-affinity rules, and we'll talk about how Portworx can help with this,
20:42
but use something that spreads your application across different nodes in your cluster, so you are not prone to bringing your whole application down if a node fails. Next, database tuning. There's nothing Portworx does here to help; I just wanted to point out that just because you moved to Kubernetes, just because
21:05
you moved to containers, there are still some operations you will have to do, like making sure your indexes are configured properly and configuring the database parameters you configure today. None of that changes; you still have to do all of those things. It will be better,
21:18
you'll get better performance, because of what Somu explained earlier, but make sure you are still following those best practices at the database layer. Next, scale up versus scale out. This is about understanding how the application or the database is actually built. There are databases like Cassandra, for example, which can scale out: you can keep adding more nodes to the Cassandra
21:40
ring and get better read and write performance. But there are databases like Postgres where, by adding new nodes to the cluster, you're just adding read replicas; you're not improving your write performance. So understand how your database works, and then either add more nodes, or add more resources to that primary node instead
21:59
of adding more nodes to the cluster. Next, pod disruption budgets. This is very Kubernetes-specific. One of the benefits Kubernetes brings to the table is non-disruptively applying upgrades to any application deployed on top of it. It has a parameter you can set called a pod disruption budget.
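A pod disruption budget is only a few lines of YAML; a sketch for a three-replica database (the label is illustrative):

```yaml
# Sketch: keep at least 2 of 3 database pods running during voluntary
# disruptions such as node drains for rolling upgrades.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cassandra-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: cassandra    # hypothetical label on the database pods
```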
22:18
With it, Kubernetes takes down one node at a time, for example, and non-disruptively walks through your highly available cluster, upgrading it to the latest version. But if you don't configure this, or don't know what it is, you might end up in a situation where Kubernetes itself is taking down more than one node at a time just to apply
22:37
a new version upgrade to that database layer. So make sure you set this and don't forget about pod disruption budgets. Next, security. Again, security should be everybody's responsibility. Make sure that you are blocking public access to any database. Similar to the stories we've heard about vCenter endpoints
22:57
being reachable from the outside world, there have been scenarios where the Kubernetes API server, which is that endpoint similar to the vCenter endpoint, has been exposed to the public internet as well. So make sure that you are restricting that, keeping it private. And then inside the cluster itself, use things like role-based access control.
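Namespace-scoped RBAC can be sketched with a Role and RoleBinding like this (the namespace and group names are illustrative):

```yaml
# Sketch: read-only access to database resources in one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: analytics
  name: db-viewer
rules:
- apiGroups: ["", "apps"]
  resources: ["pods", "statefulsets", "persistentvolumeclaims"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: analytics
  name: db-viewer-binding
subjects:
- kind: Group
  name: analytics-readers    # hypothetical group name
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: db-viewer
  apiGroup: rbac.authorization.k8s.io
```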
23:18
That way only specific users or groups have access to specific resources or specific namespaces on your Kubernetes cluster. So security is still a concern. And then comes encryption. Kubernetes is often treated as the OS for the cloud: you can run it as that orchestration layer on bare metal nodes, on virtual machines, or in the public cloud through managed services
23:40
like Amazon EKS, Google Cloud GKE, or Microsoft Azure AKS. But from a feature perspective, Kubernetes doesn't really do encryption. If you do a quick Google search around Kubernetes and encryption, the only thing you will see is that it allows you to encrypt your Secrets, which is where you store your user credentials. It doesn't really have the capability to
24:00
encrypt data at rest. So whenever you're thinking about using Kubernetes, use a storage solution (again, it doesn't have to be Portworx) that gives you encryption at rest, to make sure your data is encrypted. Next, backups. Again, two points everybody in this
24:19
room might already know: Kubernetes doesn't eliminate the need for backups, and snapshots are just not backups. You need actual backups, especially when it comes to Kubernetes. You do have your persistent volumes, and you can snapshot those, but Kubernetes uses additional constructs: Pods,
24:35
Service objects, ConfigMaps, Secrets. If you're deploying your databases using all of these Kubernetes constructs, you have to make sure you are protecting the application end to end. It can't just be the case that you're protecting your PVCs, or persistent volumes, and leaving everything else in a YAML file to maybe restore at a later point.
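For contrast, a CSI VolumeSnapshot like the sketch below protects only the volume it names; the StatefulSet, Services, ConfigMaps, and Secrets around it are not captured (names are illustrative):

```yaml
# Sketch: a CSI VolumeSnapshot covers one PVC's data and nothing else.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: cassandra-data-snap
spec:
  volumeSnapshotClassName: csi-snapclass     # hypothetical class name
  source:
    persistentVolumeClaimName: data-cassandra-0
```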
24:54
You need backups, and you need a Kubernetes-native data protection tool that can help you protect your entire database instance, not just snapshots. So those were some of the considerations, things to keep in mind; hopefully people remember those. The first thing we spoke about was Kubernetes operators, operators being the best way to deploy things on top of Kubernetes.
25:17
But there is a challenge, even with operators. If you go to OperatorHub.io, there are so many different operators listed for each database or data service you might want to use. Just looking at the screenshot here: if you are looking at Postgres, there are eight or ten different operators available to you.
25:32
MongoDB has three or four operators; same with MySQL. Every database vendor has their own operator through which they let you deploy their database instance. And for an administrator, or for a team that's just beginning their Kubernetes journey, it becomes a really heavy lift to evaluate all of these
25:52
different operators, figure out what features each of them has, and figure out which is the best one for them. And it just multiplies the complexity when you are dealing with more than one, or more than ten, databases or data services in your organisation. For example, the MongoDB open source operator doesn't have encryption at rest, but if you pay for their enterprise operator, they will give you encryption at rest.
26:13
So that's one example of the things you need to check before you make a decision. And all of these are open source, so there might also be a situation where you have to commit code back to the open source repository to get the feature you want. So choose wisely; that's what I'm sharing with the team here.
26:30
So that's the non-Portworx pitch; I just want to make sure everybody going through this journey has some of these things in mind. What we want to do in this next section is talk about how Portworx can help solve some of these things, take it off your plate and handle it at the platform level. We'll look at two of our products: Portworx Enterprise, and how it
26:51
gives you that easy button and takes care of things at the storage layer, and then we'll do a deeper dive into Portworx Data Services, which is the database platform-as-a-service offering from Portworx by Pure Storage: a hosted control plane that helps you automate Day 0 and Day 2 operations for your databases. So let's talk about Portworx Enterprise first.
27:12
Portworx Enterprise has been a leader in the market for more than three years; it is recognised as the number one Kubernetes storage and data platform for running stateful applications. Portworx Enterprise is a software-defined, cloud native storage layer, so it can run anywhere your
27:31
clusters can. If you're running on bare metal nodes with physical drives, Portworx Enterprise can run on those bare metal nodes, aggregate all of those physical drives, and give you a unified storage pool that you can use to provision block and file persistent volumes. This storage pool brings in additional capabilities.
27:48
We give you high availability and replication inside the storage cluster itself. So let's say you're deploying a database like Cassandra, and you want to enable replication at the storage layer. You don't have to set that parameter on the entire storage pool. You might have a test/dev instance,
28:04
You might have your staging instance, and you might only need a replication for your production instance, you can configure different levels of replications with you by using different storage classes. And storage classes are just the way communities allows administrators to offer class of service different classes of service. So if you're using, if you're familiar with VM Ware Technologies,
28:23
it sounds similar to VM Ware, S, PB, M, right storage, policy based management, where you're defining different policies, and then whenever, uh, VM get deployed and have those V MD K running, they automatically inherit those parameters. Same thing happens inside cities if you have, if you are that you can define a storage class that has a replication factor of 12 or three,
28:42
and then whenever an application requests for a volume, it automatically creates a primary copy and two replicas or one replica on your storage pool. We spoke about how you do this on bare metal, but the same functionality can be like works on virtual machines as well or in in the public cloud. If you're using something like Amazon EKS with EBS pack storage,
29:03
all Portworx needs is block storage on the back end, and we aggregate all of that into one unified storage pool. Talking about encryption and role based access control, we definitely provide those capabilities. We use JSON Web Tokens (JWT) to enforce role based access control,
29:20
where you can have namespace-specific permissions for users, and you can define whether a user has view permissions or edit permissions for that specific namespace. So you can break it down and have RBAC in your clusters. We do encryption as well: you can bring your own keys and your own KMS providers, and either have one key to encrypt all the persistent volumes on your storage cluster,
29:42
or you can have individual keys to encrypt individual persistent volumes. So if you are running a multi-tenant cluster with different users or different tenants on it, each volume can be encrypted with a different key through your KMS. Even if user A mounts the volume that user B owns,
30:03
they can't read the data without the key. Talking about unified block, file, and object storage: you get unified block and file from the Portworx software-defined storage layer itself. But if your developers are familiar with Kubernetes constructs and would rather use Kubernetes to
30:23
deploy object storage buckets on the back end for their applications, instead of having to use S3 APIs or FlashBlade APIs, Portworx allows you to use Kubernetes constructs to create these back-end object storage buckets and mount them to your applications so you can start storing data. So, going back to the earlier diagram where all the containers mount a
30:40
specific S3 bucket: you can control all of that from inside Kubernetes with Portworx instead of having to go to two different management portals or two different sets of APIs. Talking about snapshots: I know snapshots are not backups, but snapshots are available with Portworx as well, and you can configure them as part of the storage class definition.
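A sketch of what attaching a snapshot schedule to a StorageClass can look like, so every volume created from it inherits the policy. The parameter key follows the Portworx/Stork convention, but verify the exact syntax against your version's documentation before relying on it:

```yaml
# Illustrative: volumes from this class inherit a daily snapshot schedule.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-db-snap
provisioner: pxd.portworx.com
parameters:
  repl: "2"
  snapshotschedule.stork.libopenstorage.org/daily-schedule: |
    schedulePolicyName: daily       # a SchedulePolicy defined separately
    annotations:
      portworx/snapshot-type: cloud # offload to an S3-compatible target
```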
30:58
So let's say you are the admin. You don't have to worry about: I have 30,000 containers, maybe 10,000 of those have volumes, and I have to manually go in and create snapshot policies for each of those 10,000 volumes. I can just set a policy in my storage class definition, and whenever a volume gets created,
31:14
it inherits that snapshot policy. It can be a local policy, where the snapshot is stored on the same cluster, or it can offload your snapshots to an S3-compatible object storage bucket, on-prem or in the public cloud. So you can have local snapshots and cloud snapshots. Next, automated capacity
31:32
management and volume placement strategies. Automated capacity management: if you're running in the cloud, or even on-prem, you don't want to over-provision your storage. You want to start small and only add more storage when your application or database actually needs it. Portworx has a feature called Autopilot,
31:48
which is a rule-based engine where you specify rules in if-this-then-that terms: if a volume crosses a specific threshold, say the 60% or 70% utilisation mark, add 50% more capacity to my persistent volume, and keep doing this until you hit a maximum size of, let's say, 100 terabytes (I'm just throwing a number out there). You can set this rule and forget about manually administering individual
32:11
persistent volumes. Portworx monitors the consumption on each of these volumes and automatically performs the expansion operations. And then, volume placement strategies: this is how Portworx provides affinity rules for your database pods and the volumes underneath them. With Portworx you can create affinity and anti-affinity
32:30
rules that not only spread a volume and its replicas across different nodes, so if a node goes down you have a replica ready to go on another node, but also cover the pods themselves. Say you are deploying a Cassandra cluster as a StatefulSet with five replicas, and you want those five pods, with their persistent
32:49
volumes, running on different nodes: that is something you can enforce with Portworx volume placement strategies as well. So this is the easy button. You still control the storage layer and can fully customise it however you want, from a Portworx Enterprise perspective, the Kubernetes storage layer perspective.
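The if-this-then-that rule described above can be sketched as an AutopilotRule custom resource. Field and action names here follow the Portworx Autopilot conventions, but treat the metric expression and thresholds as illustrative and confirm them against your version's documentation:

```yaml
# Illustrative AutopilotRule: when a matching volume passes 70% used,
# grow it by 50%, capped at 100 TiB.
apiVersion: autopilot.libopenstorage.org/v1alpha1
kind: AutopilotRule
metadata:
  name: grow-db-volumes
spec:
  selector:
    matchLabels:
      app: cassandra                # only volumes labelled for this app
  conditions:
    expressions:
    - key: "100 * (px_volume_usage_bytes / px_volume_capacity_bytes)"
      operator: Gt
      values: ["70"]
  actions:
  - name: openstorage.io.action.volume/resize
    params:
      scalepercentage: "50"
      maxsize: "100Ti"
```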
33:05
But then, if you wanted to take it a step further and deliver a self-service experience for your end users or your developers, a managed-service kind of experience: you still own all of the data, and you still bring your own Kubernetes clusters. Whenever you use this portal, either through the UI or the REST API,
33:23
to deploy databases on Kubernetes, everything still runs on your Kubernetes cluster. We don't host any of the customer data. This is a control plane that can be accessed
33:40
through the web; it's a control plane hosted by us. You log in, create organisations, and start deploying from a curated set of database images that we maintain. We support 12 different databases or data services today; we have screenshots later showing which ones, but they include PostgreSQL, MySQL, MongoDB Enterprise, SQL
33:58
Server, Kafka, ZooKeeper, and Cassandra. I won't cover all 12 right now, but that's Portworx Data Services. Diving into Portworx Data Services a bit further: let's take the admin persona first and talk about what responsibilities you have and what you can customise even with a managed service. Then,
34:19
on the next slide, we'll take the developer persona and see how easy it is for a developer to deploy these databases. From an admin perspective, once you create an organisation inside the Portworx Data Services control plane, you can add more users: you can add your teammates as other admins in the organisation,
34:39
and you can add developers as individual users that can deploy databases. Once you have added your users, you can bring in your Kubernetes clusters. We support everything from Amazon EKS, Azure AKS, and Google Cloud GKE to VMware Tanzu, Red Hat OpenShift, and even open source Kubernetes. All you need to do to add a cluster is copy a simple Helm command, which
35:02
deploys an agent on your Kubernetes cluster and connects it back to the control plane over a secure reverse tunnel. It goes over port 443, just one connection, and all the scheduling decisions we push down to the cluster go through that tunnel and get implemented on your Kubernetes cluster. Once you have added your clusters, next up you can add your backup targets and schedules.
35:24
As an admin, you can configure where you want your developers to store backups. If you are a multi-region or multi-geography company and want backup targets in the US and in Europe for GDPR, you can configure those backup targets and name them properly, so that when developers enable backups they can select the right target. You can also create your own schedule policies
35:45
to ensure that you meet your SLA requirements. That's something the admin fully controls. Talking about storage options: the replication factor we discussed, one, two, or three, can be completely customised by the admin. If you want your developers to use a replication factor of one for their dev databases or development work,
36:04
you can have that as an option, and if they are deploying something for production, they can select a replication factor of three; all of those changes are automatically implemented at the storage layer. We also allow you to select XFS or ext4 as the file system, and to choose whether you want to
36:20
enforce volume placement strategies and spread your database pods across different nodes, or just do a best-effort spread. Those are the three things you can customise. Beyond those, we already know how to run these databases on Kubernetes, so we implement certain best practices under
36:36
the covers without exposing all of that complexity to the administrator. And then one last thing: application configuration templates and resource setting templates. Resource setting templates are the t-shirt sizes that let you define your CPU and memory minimum and maximum limits.
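Under the covers, a t-shirt size like this maps to standard Kubernetes resource requests and limits on the database pods. As a rough sketch, with names and numbers that are purely illustrative rather than the actual PDS template format:

```yaml
# Illustrative "medium" t-shirt size expressed as ordinary Kubernetes
# pod resource requests/limits; the real PDS template format may differ.
resources:
  requests:
    cpu: "2"        # minimum guaranteed CPU
    memory: 8Gi     # minimum guaranteed memory
  limits:
    cpu: "4"        # maximum CPU
    memory: 16Gi    # maximum memory
```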
36:58
Same with storage: you can define how much storage you want. Whenever a developer deploys a new database or data service, they select from the list of templates available to them, and those are the resource limits we enforce at the Kubernetes layer. Since this is something that's missing from Kubernetes out of the box,
37:15
we decided to add it to the product itself. And then finally, application configuration templates. Since you still have to tune your databases or set specific database configuration parameters, you can define these in a template as well: either set the key-value pairs when you create the template, or set just the key and let the developer fill in the value
37:34
when they deploy their database instance. So you can completely customise how the database is actually deployed. From a developer perspective, it is really easy. You access the user interface, or the REST APIs using an API token, and you deploy one of the 12 databases we support today on any Kubernetes cluster.
37:54
At the end of the deployment, we just give you a connection string with a username and password to start accessing your database. It's that easy. You don't have to configure anything on the Kubernetes cluster; if you are the developer, all of that is transparent to you. You just use the UI and API for PDS, and
38:12
everything is deployed for you. If you want to enable backups, you can do that on Day 0 as part of the deployment workflow. If you don't enable backups on Day 0, the developer can always come back and enable them on a self-service basis. And finally, you can monitor databases from the PDS
38:30
user interface. We have a tab where you can monitor not just storage performance, the IOPS, throughput, and latency that we all love, but also database-specific metrics; if you're using PostgreSQL, things like transactions per second and tuples can be monitored from the PDS UI itself. So this is what the user experience looks like. Let's look at two more slides, and then we can
38:51
jump into the demos. Talking about Day 0, this is what the UI looks like: a simple form you have to fill out. In this case we're deploying Kafka. You select the version of Kafka that you want to deploy, you enter a name for your Kafka cluster,
39:05
you select the target Kubernetes cluster where you want to deploy this Kafka cluster, you select the application configuration template we discussed in the previous slide, you select the resource setting template (the size of your instance), and then you select the number of nodes, meaning the number of brokers you want in your Kafka cluster. You can optionally enable
39:22
backups as well, and select where you want to store them and how frequently you want those backups taken. So that's what a Day 0 deployment looks like from a PDS perspective. From a Day 2 perspective, we have scale-up and scale-out operations.
39:37
If you want to add more nodes, you can just go to the UI and increase the number of nodes, and PDS will deploy those nodes on your Kubernetes cluster. If you want to change your resource setting template, say you started with a small and want a large, you can change that in the drop-down menu and we will non-disruptively upgrade your database
39:54
cluster for you. For data protection, we spoke about scheduled backups and ad hoc backups already, and monitoring is inside the UI. In-place upgrades: say you started with PostgreSQL version 14.4 and you want to upgrade to 14.5 or 14.6. You can just select the new version from the PDS UI,
40:10
and we'll non-disruptively upgrade your PostgreSQL cluster for you.

Enterprises rely on modern analytics platforms to optimize decision making processes, provide observability, protect intellectual property, and drive sales and customer engagements. As such, mature data and analytics strategies are key to building competitive advantage. In this session, learn what modern data pipelines look like with a shift to disaggregated storage and the use of containers that expand compute agility and improve system utilization. We will address scale challenges with Splunk and Apache Kafka by using Kubernetes and enhancing storage agility with Portworx® solutions. Lastly, we will share the benefits of containerizing applications to drive performance and simpler software management.
