00:00
Hi everyone. My name is Kaycee Lai, vice president of AI at Pure Storage. I'm really excited to have in the studio with me today Jacob Lieberman from NVIDIA. Jacob, very, very nice to see you again. Thank you. Yes.
00:14
Would you mind telling us a little bit about yourself and what you do at NVIDIA? Yeah. So my name is Jacob Lieberman, and I'm a director of enterprise product at NVIDIA and a new initiative we have called the AI Data Platform.
00:27
Very cool, very exciting. Well, Jacob, the first thing I want to talk about is that I think everyone, when they think AI, they think the coolest copilots, right, the biggest, baddest models. And they often forget about data.
00:38
And so I'd love to talk to you about that. Why do you think data matters for AI? What's your perspective? Well, this is one of my favorite subjects, Kaycee. So yeah, there's been this massive rush of enthusiasm
00:50
around gen AI and all of its capabilities. But somewhere in the midst of all that, we lost sight of the fact that data is still king. So whether you're training a model, fine-tuning a model, or retrieving additional context
01:06
through RAG to inform your LLM generations, you need secure access to high-quality data. Basically, you don't want the garbage in, garbage out problem, right? Crazy hallucinations. Right? Yes, that's still true. All right. Right. Well, then.
01:21
And in addition to that, I think the performance also matters, right? Because you need to make sure the data gets to the GPUs quickly enough so that you don't have idle GPUs. Nobody likes that. Right. So without GPUs it's really not possible
01:36
to prepare data for AI at scale and to keep up with the velocity of the data, the rate that the data changes and the rate that the data grows. So that's number one. Number two, you know, building these pipelines to make data
01:52
AI ready is complex. And they have many stages. And there are many handoffs between the different personas and users of the data. And at any moment during any one of those handoffs somebody could drop the ball totally.
02:06
Yeah, I think I see this as probably one of the biggest challenges that's getting in the way from an AI inference perspective. You know, up to now, most of the workloads have been training. And so people have been really focused on that. But now it's going to shift where most of the time,
02:22
effort and money is actually all going to be focused on inference. And so the reason why that's interesting is because of what you just said: the minute you get to inference, right, you can only get good inference and good consumption if the data is actually AI ready. Well, there are many challenges. I mean, first of all,
02:39
enterprise data is unstructured. 90% of the data that an enterprise requires is unstructured in nature. And there are many modalities: video, audio, text, PDFs with graphics, images, presentations, spreadsheets. Combine those things.
02:59
It becomes quite challenging to really extract insight from the data, right? Well, if you can't even get the data to be AI ready, you're not getting any insights, right? Right. I think that's definitely key. And so I think that's
03:11
why we're very excited in terms of what we're doing at Pure Storage. We announced at GTC last week the introduction of a new product called Pure Storage Data Stream, where we are specifically focused on this challenge. So the first thing that Data Stream is going to do is address that specific area, so that you get one workflow, one product that's going to automate
03:35
the entire process to actually generate data sets for AI in minutes. The second thing we're going to do is make sure it's super easy to consume the output. Put some governance around them. Right. What they should use, what they should not use,
03:51
who can see, who cannot see, those types of things should be in there. And then third is we're entering an age where we have agents, right? Agents are part of our digital workforce. So you have to now think about how agents are going to consume. I think these are very, very important capabilities in Data Stream,
04:07
right, to accelerate and simplify the process of making data AI ready. Right. So you can think about Pure Storage as really taking an active role to be there for the customer for every step of their AI journey. And what I love about it is that it's all centered
04:24
around the data, which is really the core competency. Pure Storage is protecting that data, and then it builds on top of that. But the data is always at the core. Jacob, it was a pleasure having you in the studio. Yeah. Thank you. Thank you so much for doing this.
04:39
I really enjoyed it. Had a blast. Yeah. Me too.