
Machine Learning Pipelines Using Kubeflow and Portworx

In this video, we will learn how organizations can build machine learning pipelines on Kubernetes using Kubeflow and Portworx to run experiments and hyperparameter optimization.
Transcript
00:00
Hello. My name is Bhavin Shah, and I'm a Senior Technical Marketing Manager at Portworx by Pure Storage. In this lightboard video, we are going to talk about how data scientists and AI developers can use open source tools like Kubeflow to build machine learning pipelines and perform experimentation and hyperparameter tuning for the models they are building on top of Kubernetes.
00:29
We'll have a couple of personas in this video. First, there is the platform team, which is responsible for building our Kubernetes cluster and the infrastructure underneath it, and also for configuring an object storage repository, like a Pure Storage FlashBlade, to store all of our raw data. So we have already configured FlashBlade, and we have stored our raw data.
00:54
We have our Kubernetes cluster with nodes and Portworx already installed on it. In this video, we're just going to talk about how data scientists can use the tools they are already familiar with to build these machine learning pipelines. If you take a quick poll and ask data scientists what their favorite tool of
01:20
choice is, most of them will say JupyterLab or Jupyter notebooks. But one common issue they have is finding the right amount of resources to run their Jupyter notebooks, because those notebooks can be used for data curation, model training, model validation, et cetera. One way to solve this, inside your on-premises data
01:41
center or inside your cloud accounts, is by deploying a Kubernetes cluster and an open source tool called Kubeflow. The whole purpose of Kubeflow is to make sure ML pipelines are scalable and are available to data scientists on a self-service basis. So as a data scientist, once my platform team has deployed the Kubernetes cluster and installed
02:03
Kubeflow for me, I can just access the Kubeflow UI, which doesn't require any additional skill set, and request a Jupyter notebook. In the request form, I can specify how many compute resources I need, including things like CPU, memory, and even GPU resources, and how much storage I need for that specific stage of the pipeline. As part of the Jupyter notebook, I can define a pipeline that includes things like my
02:27
data curation phase, my model training, and then a model test or validation phase as well. Whenever I deploy a Jupyter notebook using the Kubeflow UI, since it's running on Kubernetes, it automatically deploys a Kubernetes pod, which is backed by a persistent volume claim, which in turn is backed by Portworx.
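As a rough illustration of what Kubeflow and Portworx handle on my behalf, here is a minimal sketch of a Portworx-backed persistent volume claim created with the Kubernetes Python client. The namespace, claim name, storage class name, and size are placeholder assumptions, not values from the video.

```python
from kubernetes import client, config

# Load kubeconfig (use config.load_incluster_config() when running inside a pod).
config.load_kube_config()

core_v1 = client.CoreV1Api()

# A shared workspace volume for the notebook, backed by a Portworx StorageClass.
# "ml-workspace", "portworx-rwx-sc", and the namespace are placeholder names.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="ml-workspace"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],        # shareable across pods and notebooks
        storage_class_name="portworx-rwx-sc",  # Portworx-backed StorageClass
        resources=client.V1ResourceRequirements(requests={"storage": "50Gi"}),
    ),
)

core_v1.create_namespaced_persistent_volume_claim(namespace="kubeflow-user", body=pvc)
```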
02:54
So the data scientist doesn't have to open up tickets to get access to resources. All of this is automated because of the way Kubeflow, Kubernetes, and Portworx work together. Data scientists can choose to deploy new persistent volumes, or new data volumes, to store their curated data sets or their model checkpoints, or to use as scratch space. Or they can also use existing persistent
03:20
volumes that are configured with the ReadWriteMany access mode from Portworx, so that they can use a curated data set that a different data scientist on the same team has already produced. Somebody has done the work already; I just want to use that as a starting point instead of starting from scratch, and ReadWriteMany volumes from Portworx allow me to do that.
03:41
Next, using Jupyter notebooks, I can create an entire machine learning pipeline using Python code. As part of that Python code, I can use the S3 API to get my raw data from my FlashBlade bucket, so that can be step one. But then, for all of the later steps, from actually curating that data, or creating my data set from the raw data I just imported, to training my model and storing
04:09
those checkpoints, to validating my model against my test data set, all of this can run inside Kubernetes. So we can use data on Kubernetes to store the information needed for the rest of the pipeline. If I'm running this on AWS, which is what a lot of organizations are doing right now, I don't have to go back to my S3 bucket and pay ingress or egress costs every time I want to move data into or out of the Kubernetes cluster where my training jobs and machine learning pipelines are running.
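For that import step, the notebook can talk to FlashBlade over the standard S3 API. Here is a minimal sketch using boto3; the endpoint URL, credentials, bucket, object key, and mount path are placeholder assumptions for illustration.

```python
import os

import boto3

# FlashBlade exposes an S3-compatible endpoint; the URL, credentials, bucket,
# and key below are placeholders for this sketch.
s3 = boto3.client(
    "s3",
    endpoint_url="https://flashblade.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Pull the raw data set once and land it on the Portworx-backed volume
# mounted at /mnt/data, so later pipeline steps can read it locally.
os.makedirs("/mnt/data/raw", exist_ok=True)
s3.download_file("raw-data-bucket", "datasets/training-raw.csv",
                 "/mnt/data/raw/training-raw.csv")
```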
04:31
Once I have a pipeline, I can use the Kubeflow UI to create a pipeline object, and that pipeline object can be shared with my team of data scientists or engineers so they can run
04:55
experiments. Kubeflow allows us to tune the hyperparameters to whatever I want for a specific test, and it allows me to run an experiment. At the end, I can monitor what the model accuracy looks like, and I can monitor the logs for each step in the pipeline
05:13
right from the Kubeflow UI. So let's see what this pipeline workflow actually looks like. As soon as I trigger a pipeline run, each phase in the pipeline corresponds to a Kubernetes pod. I'll have a pod for the data curation phase, which brings in, or imports,
05:30
the raw data set from my Pure FlashBlade bucket, or any S3 bucket I'm storing my raw data in, and then stores it in a ReadWriteMany persistent volume. Once my data curation phase is done, I move on to the model training phase, which is where a second pod gets spun up.
05:50
And this pod, instead of having to import the curated data set again from my S3 bucket, can just point to the ReadWriteMany volume that already exists on my Kubernetes cluster. Once that's done, I can either use the same volume, or I can spin up a new ReadWriteMany volume, to store my model checkpoints.
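To make the checkpointing idea concrete, here is a minimal sketch of how a training step might write and resume checkpoints on the mounted Portworx volume. The mount path and the use of PyTorch are illustrative assumptions, not something prescribed in the video.

```python
import glob
import os

import torch

CKPT_DIR = "/mnt/data/checkpoints"  # hypothetical mount path of the Portworx volume
os.makedirs(CKPT_DIR, exist_ok=True)


def latest_checkpoint():
    """Return the most recent checkpoint file on the persistent volume, if any."""
    files = sorted(glob.glob(os.path.join(CKPT_DIR, "epoch-*.pt")))
    return files[-1] if files else None


def train(model, optimizer, epochs=10):
    start_epoch = 0
    ckpt = latest_checkpoint()
    if ckpt:
        # Resume from the last saved checkpoint, e.g. after a pod or node failure.
        state = torch.load(ckpt)
        model.load_state_dict(state["model"])
        optimizer.load_state_dict(state["optimizer"])
        start_epoch = state["epoch"] + 1

    for epoch in range(start_epoch, epochs):
        # ... one epoch of training goes here ...
        torch.save(
            {"model": model.state_dict(),
             "optimizer": optimizer.state_dict(),
             "epoch": epoch},
            os.path.join(CKPT_DIR, f"epoch-{epoch:03d}.pt"),
        )
```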
06:11
Or that can be a ReadWriteOnce volume as well. Then finally, once my model training is done, I can run my validation, and this again corresponds to a Kubernetes pod that can access the same volume or create a new one. Since all of these phases are running on Kubernetes, they don't need specific
06:31
infrastructure resources to be provisioned before any experiments can be run. Because of all the automation built into these open source projects and into solutions like Portworx, things become really easy for data scientists, and they can accelerate the pace at which they build out their models.
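As an illustration of what such a pipeline definition might look like in Python, here is a minimal sketch using the Kubeflow Pipelines v1 SDK (kfp), with the container images, commands, PVC name, and hyperparameter value all assumed for the example rather than taken from the video.

```python
import kfp
from kfp import dsl


@dsl.pipeline(name="ml-pipeline", description="Curate, train, and validate a model.")
def ml_pipeline(learning_rate: float = 0.01):
    # Mount an existing Portworx-backed ReadWriteMany PVC ("ml-workspace" is a placeholder).
    workspace = dsl.PipelineVolume(pvc="ml-workspace")

    curate = dsl.ContainerOp(
        name="data-curation",
        image="registry.example.com/curate:latest",    # placeholder image
        command=["python", "curate.py"],
        arguments=["--out", "/mnt/data"],
        pvolumes={"/mnt/data": workspace},
    )

    train = dsl.ContainerOp(
        name="model-training",
        image="registry.example.com/train:latest",     # placeholder image
        command=["python", "train.py"],
        arguments=["--data", "/mnt/data", "--lr", learning_rate],
        pvolumes={"/mnt/data": workspace},
    ).after(curate)

    dsl.ContainerOp(
        name="model-validation",
        image="registry.example.com/validate:latest",  # placeholder image
        command=["python", "validate.py"],
        arguments=["--data", "/mnt/data"],
        pvolumes={"/mnt/data": workspace},
    ).after(train)


if __name__ == "__main__":
    # Submitting a run with a specific hyperparameter value is how an experiment kicks off.
    kfp.Client().create_run_from_pipeline_func(
        ml_pipeline, arguments={"learning_rate": 0.001}
    )
```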
06:51
What are some of the benefits of using tools like Kubeflow and solutions like Portworx to build these machine learning pipelines? The biggest benefits we saw were self-service, so data scientists don't have to worry about opening up tickets, and the dynamic volume provisioning that Portworx brings to the table.
07:13
And again, none of this has to be managed by hand: whenever Portworx provisions a volume, it automatically follows the policies set by administrators, so administrators don't have to worry about going back and modifying settings at the individual volume level. And because these model training jobs are long-running jobs,
07:32
we also want to make sure the infrastructure is resilient to failures. If a node goes down, or if something else happens, like two nodes losing network connectivity with each other, Portworx has replicas spread across the Kubernetes cluster. So even if you lose a node, your model can pick up the training process from the last checkpoint it had stored in that
07:53
persistent volume, and it's ready to go. So there are model checkpoints, and there's the easy availability of ReadWriteMany volumes. Instead of having to install different plugins or different storage backends for ReadWriteOnce and ReadWriteMany, you can just use Portworx as a single solution that offers both of these access modes for your data scientists.
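Those administrator policies, such as replication, typically live in a StorageClass that the platform team defines once. Here is a minimal sketch of creating such a class with the Kubernetes Python client; the class name, provisioner string, and parameter values are assumptions for illustration.

```python
from kubernetes import client, config

config.load_kube_config()

# A StorageClass the platform team might define once; data scientists then
# simply reference it by name. Provisioner and parameters are illustrative.
storage_class = client.V1StorageClass(
    metadata=client.V1ObjectMeta(name="portworx-replicated"),
    provisioner="pxd.portworx.com",   # Portworx CSI provisioner
    parameters={"repl": "3"},         # keep three replicas spread across the cluster
    allow_volume_expansion=True,
)

client.StorageV1Api().create_storage_class(body=storage_class)
```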
08:12
Again, data scientists don't have to learn how any of these things work; they are simply offered by the platform team as a service. So this is how organizations are, and should be, building their machine learning pipelines using Kubernetes, Kubeflow, and Portworx. That's it for this video. Thank you for watching.