Betting against Data Gravity:
A Fool's Errand

Par Botes, VP AI Infrastructure, Pure Storage

4 min read

Introduction

The tech industry seems to have a cyclical fascination with distributed file systems, content distribution, replica management, and global namespaces. These concepts, with varying names, periodically resurface as the "next big thing" in storage. Currently, the buzzword in some areas of systems architecture is “global namespace,” which is making the rounds in IT discussions again—especially in the context of AI—for its promise of seamless data access across geographies.

But is it truly a game-changer for AI, or just another hype cycle in the long history of distributed file systems and content distribution? For business leaders, understanding the evolution of these concepts, the technical realities, and what they mean for enterprise IT strategy is crucial for making strategic storage decisions.

What Is a Global Namespace? It Depends on Who You Ask

Some vendors define a global namespace as global reporting on the information and data types held across storage systems. Others define it as the ability to access data from multiple locations as if it were local. Still others argue that global namespaces essentially mean distributing data to endpoints, a concept previously known as content distribution.

Over the years, global namespaces have taken on many forms. To understand the history of this technology, let’s look at the grandfather of distributed namespaces: the Andrew File System. Developed at Carnegie Mellon University in the mid-1980s, this file system presented itself to clients as a local file system, even when data resided on a different continent.

It was a remarkable achievement for its time. The Andrew File System employed a sophisticated authentication scheme and a complex locking mechanism to ensure consistency and prevent conflicts arising from simultaneous modifications by multiple users. It was not particularly easy to set up or manage, but there were a handful of relatively large installations back in the day.

Data Gravity and the Evolution of the File System

Shortly after the dot-com bust, the Andrew File System's popularity began to decline. This wasn't due to any inherent flaws but rather a combination of increasing complexity and the rapid rise of thin clients. Analyzing data centrally became more efficient than transferring it across networks. Yes, data gravity reared its head as far back as 20 years ago: it turns out we cannot ship data to remote locations as fast as we can create it.

Additionally, new application frameworks and user interfaces for easy-to-develop, server-side rendering reduced the need for data to leave centralized data centers. 

In the early 2000s, Microsoft entered the distributed file system arena with its Distributed File System (DFS), which later evolved into replicated DFS. While DFS became a mainstream offering, it wasn't as widely adopted as one would have expected from a Microsoft product. Few applications leveraged its capabilities, and it remained somewhat outside the mainstream. Although some users undoubtedly appreciated DFS, it didn't significantly impact broader system architectures globally. 

Then, around 15 years ago, stretch clustering emerged. Initially, this technique aimed to enhance availability by placing data stores at synchronous replication distances. However, some vendors offered active-active configurations, enabling data access from both sides. While stretch clustering still exists, its primary use case is now high availability. Outside of specific industrial applications, its popularity has waned despite some vendors' insistence otherwise.

Object Stores Address the Issue of Scale

Parallel to these developments, object stores and the S3 protocol gained prominence. This approach eliminated the strict requirements of the POSIX standard for read and write behaviors, which was one of the main challenges in making distributed file systems work and scale. 

Object stores offered greater scalability both within and across data centers. The S3 protocol's inherent location independence simplified data access from clients. As more applications are written natively against S3 APIs, object storage has become a dominant force in the cloud and, increasingly, in enterprise data centers.
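
One way to see the simplification: POSIX files allow seeks, appends, and in-place byte-range writes, and that mutable state is exactly what a distributed file system must lock and coordinate, while S3-style objects are written whole and addressed by key. A toy contrast (illustrative only; neither class is a real filesystem or S3 client):

```python
# Illustrative contrast between POSIX-style and object-style write models.
# Both "stores" are just in-memory structures, not real storage clients.

class PosixLikeFile:
    """Mutable byte buffer supporting partial overwrites, like POSIX pwrite."""
    def __init__(self):
        self.data = bytearray()

    def pwrite(self, offset, payload):
        # In-place range writes are the behavior distributed file systems
        # must serialize with locks to stay consistent across clients.
        end = offset + len(payload)
        if end > len(self.data):
            self.data.extend(b"\x00" * (end - len(self.data)))
        self.data[offset:end] = payload


class ObjectStore:
    """Whole-object puts keyed by name, in the spirit of S3's model."""
    def __init__(self):
        self.objects = {}

    def put(self, key, payload):
        self.objects[key] = bytes(payload)  # replace the entire object

    def get(self, key):
        return self.objects[key]


f = PosixLikeFile()
f.pwrite(0, b"hello world")
f.pwrite(6, b"there")               # partial overwrite inside the file

s3 = ObjectStore()
s3.put("greeting", b"hello world")
s3.put("greeting", b"hello there")  # no offsets: the whole object is replaced
```

Because each put replaces the whole object, there is no partially-updated state for remote readers to observe, which is much of why the model scales across sites.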

Recently, some vendors have attempted to redefine the concept of global namespaces. Certain vendors claim to offer global namespaces by presenting views of aggregated file system metadata. However, generating reports on a central index of filenames doesn't truly constitute a global namespace. Other vendors are revisiting ideas of establishing locking authorities and delegated locks (often referred to as leases) combined with various caching strategies. This approach is reminiscent of Lustre's innovations from 15 years ago, albeit with highly polished marketing and little else being new under the sun. 
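
The lease idea itself is simple. As a hypothetical sketch (the class and method names here are invented, not any vendor's API): a client may serve cached reads and writes only while it holds an unexpired lease from the locking authority, and once the lease lapses it must revalidate or let another client take over.

```python
import time

class LeaseAuthority:
    """Toy central locking authority granting time-bounded leases on paths."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.leases = {}  # path -> (holder, expiry time)

    def acquire(self, holder, path, now=None):
        """Grant a lease unless another holder's lease is still live."""
        now = time.monotonic() if now is None else now
        current = self.leases.get(path)
        if current and current[0] != holder and current[1] > now:
            return False  # someone else holds a live lease
        self.leases[path] = (holder, now + self.ttl)
        return True

    def is_valid(self, holder, path, now=None):
        """Cached operations are permitted only while this returns True."""
        now = time.monotonic() if now is None else now
        current = self.leases.get(path)
        return bool(current) and current[0] == holder and current[1] > now


authority = LeaseAuthority(ttl_seconds=30)
assert authority.acquire("client-a", "/data/model.bin", now=0)
assert authority.is_valid("client-a", "/data/model.bin", now=10)      # serve from cache
assert not authority.is_valid("client-a", "/data/model.bin", now=31)  # lease expired
assert authority.acquire("client-b", "/data/model.bin", now=31)       # takeover allowed
```

The hard part in practice is not this logic but the WAN round trips to the authority that every cache miss or lease renewal incurs, which is where data gravity reasserts itself.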

Whether this approach will move beyond niche use cases and gain widespread adoption in meaningful applications remains to be seen. I’m a bit skeptical; we still create data much faster than we can transfer it over wide-area network (WAN) links.

Data Gravity May Decide What’s Next

It's crucial to remember that data, especially large data sets, has gravity. While WANs are significantly faster than they were 20 years ago, data volumes have grown at an even faster pace. The rate of data growth far outstrips the expansion of network links: data grows faster than Moore’s Law, and Moore’s Law grows much faster than WAN links. Even if network links could keep pace with Moore’s Law, at some point the CAP theorem becomes a limit to scale. Consistency, availability, and partition tolerance become very hard to maintain at scale and across inter-site data center links. Some very specific applications, typically read-only ones or those in the content distribution niche, can handle this; most real applications can’t.
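
A rough, illustrative calculation makes the point. The starting sizes and growth rates below are assumptions chosen for the sketch, not measured figures:

```python
# Illustrative only: compare compound growth of a data set against compound
# growth of WAN transfer capacity. All starting values and annual rates are
# assumptions for this sketch, not industry measurements.

def compound(initial, rate, years):
    """Value after `years` of compounding at `rate` per year."""
    return initial * (1 + rate) ** years

YEARS = 10
data_pb = compound(1.0, 0.40, YEARS)          # data set: 1 PB, growing 40%/yr
wan_pb_per_day = compound(0.1, 0.25, YEARS)   # WAN: 0.1 PB/day, growing 25%/yr

# Days needed to ship the full data set over the link, year 0 vs. year 10.
days_year0 = 1.0 / 0.1
days_year10 = data_pb / wan_pb_per_day

print(f"Year 0:  {days_year0:.0f} days to transfer the data set")
print(f"Year 10: {days_year10:.0f} days to transfer the data set")
```

Even with the WAN compounding respectably, the transfer time roughly triples over the decade; whenever data compounds faster than the link, the backlog widens every year, which is the arithmetic behind data gravity.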

The challenge of moving data to data centers across distances invariably leads to a battle against network limitations. Physics imposes constraints that are difficult to overcome, regardless of what you call your data prefetching technique. While AFS-like global namespaces with old-school opportunistic locking semantics might be suitable for niche applications like data distribution, their potential to become the dominant computing paradigm is implausible. 

This doesn’t mean I don't think highly of ideas like replication of data for protection or distributed erasure coding in availability zones for object storage resiliency. These ideas work well and belong in a special category of recovery techniques, but they aren’t exactly what people mean when they say global namespaces.

Content distribution itself is a niche use case, and with the increasing prevalence of dynamically generated content by GPUs, the role of storage in this context diminishes. It is true that GPUs are a scarce commodity and moving data to where a GPU is may be a required Band-Aid for supply chain constraints, but the GPU scarcity will be short-lived enough that it’s unlikely to lead to a meaningful and lasting architectural change. 

One of the most intriguing distributed systems with a compelling namespace in recent years is Google's Spanner. Google, having the advantage of building applications from scratch, addressed some of the most challenging problems in distributed storage. They recognized that many queries could be answered using older data. Consequently, they designed their storage system to be queryable at any point in time. This innovative approach allowed applications to determine whether to wait for the most current data or utilize older data to answer queries. While this is a remarkable technique, only a few companies globally possess the resources to build and maintain such a system, since applications must be modified to interact with it; traditional read/write semantics don’t work in such a system.
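
The core idea can be sketched in a few lines (this is an invented toy, not Spanner's actual API): keep every write with its timestamp, and let the reader choose between the latest value and a consistent, possibly stale read "as of" an earlier time.

```python
import bisect

class VersionedStore:
    """Toy multi-version store: reads can target any point in time.
    Writes are assumed to arrive in increasing timestamp order."""

    def __init__(self):
        self._ts = {}    # key -> sorted list of write timestamps
        self._vals = {}  # key -> values, parallel to _ts

    def write(self, key, value, ts):
        self._ts.setdefault(key, []).append(ts)
        self._vals.setdefault(key, []).append(value)

    def read_as_of(self, key, ts):
        """Newest value written at or before `ts`; None if nothing that old."""
        i = bisect.bisect_right(self._ts.get(key, []), ts)
        return self._vals[key][i - 1] if i else None


store = VersionedStore()
store.write("row", "v1", ts=10)
store.write("row", "v2", ts=20)

print(store.read_as_of("row", 15))  # consistent but stale: "v1"
print(store.read_as_of("row", 25))  # latest: "v2"
```

The application-level burden the article describes lives in that `ts` argument: the caller must decide, query by query, how stale an answer it can tolerate, which is why plain read/write applications cannot use such a system unmodified.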

Moore's Law continues to drive advancements in GPUs, the most data-intensive processors available, and as a result, we're only at the beginning of exploring the possibilities of GPU computing in enterprise data centers. As GPUs become the dominant computing form, we will either recompute the data directly, or data gravity will make it so that the compute resource will execute close to where data lives. 

Despite industry buzz around global namespaces, history has shown that data gravity, network limitations, and consistency challenges impose real barriers to widespread adoption. While niche applications may benefit, enterprise leaders should remain skeptical of solutions that claim to eliminate fundamental storage constraints. Instead, the real shift to watch is how compute resources move closer to data—whether through GPUs or architectural shifts that prioritize locality. In this era, just like in past eras, placing compute resources in proximity to storage will remain the dominant architecture—betting against data gravity is a fool's errand.
