AI’s Challenge to Data, Storage, and the Computing Industry

Artificial intelligence has created a fundamental change in the nature and architectural importance of data. To deliver on AI’s promise, we’ll need to make new discoveries in the data we already have.

4 min. read

Introduction

By Par Botes, VP AI Infrastructure, Everpure

A generational change is hitting the multitrillion-dollar global enterprise computing industry. Almost no one is talking about it, and few are coming to grips with what it will mean. 

Artificial intelligence (AI) has created a fundamental change in the nature and architectural importance of data.

AI has captured the imagination of so many sectors. There are code-slinging IDE extensions, poetry-writing robots, dazzling and impossible images, elaborate videos, and music disconnected from reality. There are math-solving tools that look promising for finding proofs for problems that have long vexed the greatest minds, and language models that could reveal the deep language of biology. 

No matter where you look, though, this is the change: Before, if there was a bad output in computing, you checked the code for bugs. Now, with AI, if there’s a bad output, you don’t check the code—you check the data. 

Data is both the source and the problem. How we depend on it will change the way we think about computers, programs, testing, and reliable execution.

The Source of Truth Is at the Center of the Process

This shift in data dependency implies a technological (and legal) sea change with few parallels in computer engineering history. Before, data was simply something used by code. Now, training data is a foundational source of truth for what the code will do. 

In current approaches to building AI systems, many different kinds of data need to be clearly identified and tracked, made auditable, and put into a repeatable format that can be analyzed at a fast and regular cadence. When a model is trained on new data, the insights drawn from that data directly influence the success of every AI exercise. 

The metadata attached to this data is also becoming disproportionately more valuable.
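To make the idea concrete, here is a minimal sketch of what tracked, auditable data might look like. The names and fields are illustrative assumptions, not any particular product’s schema: a content hash gives the data a stable identity, while source, label, and timestamp make it traceable and re-checkable later.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class DataRecord:
    """A piece of training data plus the metadata that makes it auditable."""
    content: bytes
    source: str   # where the data came from
    label: str    # the agreed-upon description
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    @property
    def content_hash(self) -> str:
        # A stable identity for the data, independent of shifting contexts.
        return hashlib.sha256(self.content).hexdigest()

record = DataRecord(content=b"patient sample 17", source="lab-A", label="protein-assay")
print(record.content_hash[:12])
```

Because the record is frozen and hashed by content, any change to the underlying data produces a new identity, which is exactly the property an audit trail needs.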

A Generational Challenge

Because the identity of a single piece of data changes depending on the context in which it’s used, it’s essential that its identity remain clear across those shifting contexts. Great AI outcomes happen when the data aligns with clarity and rigor. 

Data workflows are normalizing and beginning to resemble those of regulated industries. Audited financial data, for example, is fixed and unchanging as of a set point in time, such as the end of a financial quarter. Workflows for testing new drugs or food quality are another example: that data is derived and labeled according to commonly understood standards and audited via artifacts and evidence so it can be used reliably.

Reliable and uniform data is why AlphaFold, Google DeepMind’s tool for determining protein structures, is one of the most successful AI projects around. Lab data from around the world is uniform, with every person involved agreeing on what definitions mean, so that the descriptions—sometimes called labels—are uniform. The goal is ambitious, but it’s also relatively narrow so the data doesn’t have to be repurposed from other sources to fit other needs.
Those examples are single uses, though. Data has to carry that same quality of identity and provenance across all other training and usage contexts. And while regulatory and financial data moves through a process that plays out over months, in this era we’re talking about checking this kind of data and its interactions near-instantaneously.

Regulatory agencies, financial departments, and specialized labs aren’t the norm. Most of the world’s digital information is created according to any number of standards, indexed in various ways, and stored in a multitude of formats. The majority of older data was created before auditable labels were a consideration. 

Some vendors think merely having storage for data and an index is the answer. That is a fallacy. Having a structured method to describe the data and track changes to the data and the index is both the problem and the value. 

That’s why this is a new, generational challenge to how we’ve previously thought about data.

“Some vendors think merely having storage for data and an index is the answer. That is a fallacy. Having a structured method to describe the data and track changes to the data and the index is both the problem and the value.”

A Metaphor: The Bank Heist and the Patent

To give you a better sense of the problem, consider the implications of a French bank heist that occurred in 1890.
A thief broke into the bank and went to work on the safe with a torch that used methane from the bank’s gas-powered lighting and liquid oxygen he’d brought along. After a couple of hours, he’d cut a 12x20-inch rectangle in the iron safe, only to find the safe was double-hulled. Without enough oxygen to cut a second hole, he took off.

Fast forward 20 years, and a new way to cut iron was developed. The inventors realized that, along with heat, a more direct application of oxygen to the surface created an iron oxide akin to rust. A torch cuts through brittle rust along with iron much faster, a breakthrough that led them to seek a patent for their innovation.

But was it really novel enough for a patent? Someone remembered the theft from 20 years earlier, suggesting that the thief’s use of oxygen to break into the safe might preclude the patent. They found the damaged safe still in an evidence locker, saw no trace of rust, and the new method was granted a patent.

Think of that iron rectangle as a single data point. In 1890, the presence of rust was not yet relevant. That piece of metadata (“the condition of metal”) only mattered later when we understood the science, even if it had existed the whole time. 

That’s the new reality not for one piece of data, but for trillions, in repositories around the world. The ability to go back and reexamine data in a new context creates new insights and new value. AI accelerates this paradigm shift beyond anything we could imagine in the prior big data era.

Metadata: How We’ll Build the Future

The real promise of the Age of AI, beyond automation, is the new discoveries we’ll find in the data we already have. Practitioners will realize the core need to create and attach more metadata, some of it not yet conceived, to existing data across an infinitude of data points. 

There’s never been anything like this in the generations of enterprise technologies that came before. 

The need to revisit data and continuously extract more knowledge and insights to improve AI is unique to our time. It makes metadata management, the discipline of data lineage, tracking, and indexing, grow in both value and scale. Metadata is no longer just a method to accelerate data lookup; it has become a true “master catalogue” of data. 
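A toy illustration of that “master catalogue” idea (a hypothetical structure, not a real system): index data items by arbitrary metadata keys, so that when a new attribute becomes interesting, such as the condition of the metal in the safe story, the existing data can be re-queried under the new context.

```python
from collections import defaultdict

class Catalogue:
    """A toy master catalogue: index data items by arbitrary metadata keys."""
    def __init__(self):
        self._index = defaultdict(set)   # (key, value) -> set of item ids
        self._items = {}

    def add(self, item_id, metadata):
        self._items[item_id] = metadata
        for key, value in metadata.items():
            self._index[(key, value)].add(item_id)

    def find(self, **criteria):
        # Intersect the id sets for every (key, value) pair requested.
        sets = [self._index[(k, v)] for k, v in criteria.items()]
        return set.intersection(*sets) if sets else set()

cat = Catalogue()
cat.add("safe-1890", {"material": "iron", "condition": "no-rust"})
cat.add("safe-1910", {"material": "iron", "condition": "rusted"})
print(cat.find(material="iron", condition="no-rust"))
```

The point of the sketch is that the index, not the raw data, is what lets an old observation answer a question no one thought to ask when the data was created.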

Some may throw their hands up at the sheer scale of the problem, but the tech industry is full of talented people who delight in tough problems of scale. Disruptive thinkers always show up when the needs are the greatest.

Consider the unstructured data revolution that came with big data’s rise 20 years ago. It’s going through a dramatic evolution today. Instead of treating data as amorphous blobs, even highly unstructured storage systems have gained the ability to organize unstructured data into structured forms. The winning shape looks much like tabular formats with highly flexible transformations of how data evolves and relates. The “lazy evaluation” strategy from programming, in which computations are deferred until their results are actually needed, is being mined for its applicability to maintaining data that is reliable and standardized, according to need. 
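The lazy-evaluation pattern alluded to above can be sketched as follows. This is a generic illustration, not any specific storage system’s API: transformations are recorded as a recipe and only computed when the data is actually read.

```python
class LazyColumn:
    """Record transformations as a recipe; compute them only on access."""
    def __init__(self, values):
        self._values = values
        self._pipeline = []          # deferred transformations

    def map(self, fn):
        self._pipeline.append(fn)    # nothing is computed yet
        return self

    def materialize(self):
        # Only here does any work happen, in recorded order.
        out = list(self._values)
        for fn in self._pipeline:
            out = [fn(v) for v in out]
        return out

col = LazyColumn([1, 2, 3]).map(lambda v: v * 10).map(lambda v: v + 1)
print(col.materialize())   # → [11, 21, 31]
```

Because the recipe is data, it can also be inspected, audited, or replayed, which is what ties lazy evaluation back to lineage and tracking.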

We know Python is the universal language of AI, and one of the most widely used data structures in data science and AI is the Pandas DataFrame. I have followed the team at Pixeltable (of Parquet file format fame), who have looked at the data problem and made the dataframe a highly flexible data structure. I really like the door this opens, making multimodal data sets something that can be flexibly stored, transformed, and iterated on in dependable ways. The world needs more flexible methods for organizing and querying data at scale, and fast search through columns alone won’t be enough. 

In my work, I’m thinking about extending these kinds of concepts with even more transformations, lineage, and scale than we previously thought possible. At its core, what I truly like is how the data morphs depending on the caller’s needs, decoupling its creation and administration from its use. Transforming data into new forms at access time drives developer productivity: an image stored in one format can be served as a JPEG upon access, if that is implicit in the needs of the code.
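That access-time transformation could look something like this sketch. The converter function and the byte-prefix “formats” are hypothetical stand-ins; a real system would invoke an actual imaging library. The point is the shape of the interface: the caller asks for the format it needs, and conversion happens on read.

```python
# Hypothetical converter: a real system would call an imaging library here.
def png_to_jpeg(data: bytes) -> bytes:
    return b"JPEG:" + data.removeprefix(b"PNG:")

CONVERTERS = {("png", "jpeg"): png_to_jpeg}

class Store:
    """Serve data in the format the caller asks for, converting on access."""
    def __init__(self):
        self._blobs = {}             # name -> (format, data)

    def put(self, name, fmt, data):
        self._blobs[name] = (fmt, data)

    def get(self, name, want_fmt):
        fmt, data = self._blobs[name]
        if fmt == want_fmt:
            return data
        # Decouple how data is stored from how it is consumed.
        return CONVERTERS[(fmt, want_fmt)](data)

store = Store()
store.put("logo", "png", b"PNG:pixels")
print(store.get("logo", "jpeg"))   # → b'JPEG:pixels'
```

Nothing about the stored bytes changes; the transformation lives at the access boundary, which is what keeps creation and administration decoupled from use.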

I’m following this project, and others like it, very closely. While solutions from the big data era do offer scale, they’ll need significant innovation to be anything like the model for future generations of data organization and storage.

To Succeed in AI, Stay Vigilant about Data

In an earlier post, I talked about the healthy rise of brute-force computing and how new AI models change convention. I suspect the need for more computation will be as big a part of the future as emerging storage capabilities like flexible data transformations, tracking, and indexing.

I leave you with this: Enterprises cannot lose focus on data availability and performance. These are table stakes for today and the future. AI introduces new demands for data freshness and data quality, which creates new needs for data representation, tracking, and indexing, as these areas mature. 

At Everpure, we’re laser-focused on our mission to deliver best-in-class products for the all-flash data center. We are well-placed to deliver and working hard on building visionary products that incorporate new concepts for data flows.

I personally love the fresh challenges and opportunities in AI and data set management, and I can’t wait to explore innovations in these new areas in future posts.
