How Much Flash Storage Capacity are You Really Getting?

With flash arrays and appliances, usable capacity calculations vary wildly by product architecture. How far you should adjust a capacity figure up or down depends on the presence and implementation of RAID, flash management, storage virtualization, thin provisioning, dedupe, and compression. This blog post explains the key factors to consider when calculating the capacity of a FlashArray and shows how you can calculate your effective capacity using a new tool called the PureSize estimator.

Raw Capacity is a Useless Metric

While raw capacity is the least contestable metric for comparison, one could argue it is also the least useful because it doesn’t tell you anything about the efficiency of the architecture in question.  Just because you *think* you can count the bits doesn’t mean you can use them to store your data.  In reality, you probably aren’t getting an accurate count anyway. For instance, eMLC SSDs contain as much as 25% more flash than their stated capacity, but SSD providers don’t include it in their capacity statements because it’s used to offset additional cell wear (it does, however, consume power and cost money).

Storage array architectures either make flash storage more efficient or less, in both the capacity and performance dimensions.  As you fill up the array, how far can you go before your read and write bandwidth starts to decline? For some flash appliances today, to get the performance you paid for, you need to keep the array below about 70% full. Add RAID and garbage collection to the calculation and you start to pay a hefty price in degraded capacity.  By comparison, other arrays claim they need about 18% of their array for “overhead”. There are also technologies such as global inline deduplication that improve the efficiency of storage.  But that’s not all…
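To make the arithmetic concrete, here is a minimal sketch of how a fill ceiling or an overhead reservation eats into raw capacity. The 70% and 18% figures come from the paragraph above; the 10 TB raw size, the function name, and the simple multiplicative model are assumptions for illustration only, not how any particular vendor computes usable capacity.

```python
def usable_capacity_tb(raw_tb, overhead_fraction=0.0, max_fill_fraction=1.0):
    """Capacity left after reserving overhead and respecting a fill ceiling.

    Assumed model: the two penalties simply multiply.
    """
    if not 0 <= overhead_fraction < 1:
        raise ValueError("overhead_fraction must be in [0, 1)")
    if not 0 < max_fill_fraction <= 1:
        raise ValueError("max_fill_fraction must be in (0, 1]")
    return raw_tb * (1 - overhead_fraction) * max_fill_fraction


RAW_TB = 10.0  # hypothetical raw capacity

# Appliance that must stay below about 70% full to keep its performance.
fill_limited = usable_capacity_tb(RAW_TB, max_fill_fraction=0.70)

# Array that reserves about 18% of its capacity for overhead.
overhead_limited = usable_capacity_tb(RAW_TB, overhead_fraction=0.18)

print(f"fill-limited:     {fill_limited:.1f} TB usable of {RAW_TB:.1f} TB raw")
print(f"overhead-limited: {overhead_limited:.1f} TB usable of {RAW_TB:.1f} TB raw")
```

RAID and garbage collection would lower the first figure further. They are left out here because the post gives no numbers for them.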

Different Workloads Get Different Results

PureSize Questions

With thin provisioning, deduplication, and compression, the effect on a particular workload varies a great deal. It’s why we use ranges to refer to our data reduction results at Pure.  Each reduction feature responds to a data profile differently.  While virtual machines dedupe at an amazing rate, databases get their best reduction boosts from compression and pattern removal.  Any dataset that’s already compressed in some way will reduce less effectively.
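The point about already-compressed data is easy to check for yourself. The sketch below uses Python's standard `zlib` module as a stand-in compressor. It is not the FlashArray's reduction engine, and the sample data is invented. It compresses a repetitive data set once, then tries to compress the result a second time.

```python
import random
import zlib


def reduction_ratio(data: bytes) -> float:
    """Original size divided by compressed size (higher means better reduction)."""
    return len(data) / len(zlib.compress(data))


# Invented sample: 20,000 words drawn from a small vocabulary (seeded for repeatability).
rng = random.Random(0)
vocab = [b"select", b"insert", b"update", b"orders",
         b"customer", b"invoice", b"pending", b"shipped"]
uncompressed = b" ".join(rng.choice(vocab) for _ in range(20000))

# The same data after one pass of compression, as if it arrived pre-compressed.
already_compressed = zlib.compress(uncompressed)

first_pass = reduction_ratio(uncompressed)
second_pass = reduction_ratio(already_compressed)

print(f"uncompressed input:       {first_pass:.2f}:1")
print(f"already-compressed input: {second_pass:.2f}:1")
```

The second ratio should come out far lower than the first, which is why a general reduction statistic says little about a data set made up largely of compressed files.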

Effective Capacity of Flash Architectures

Size Your Flash Array with a Real Calculation

To really know how much flash storage you are getting, look at what determines effective capacity in each of these architectures.  In many cases, usable gigabytes will be lower than raw gigabytes.  With all-flash arrays that reduce data, the effect runs the other way: effective capacity can exceed raw capacity, but by an amount that varies with the data, so you need a way to calculate the result for your specific dataset rather than rely on a general statistic.

To make this easy, Pure Storage has released a new tool called the PureSize effective capacity estimator.  It simulates the savings from deduplication, compression, pattern removal, and (optionally) thin provisioning, and also adds in the additional overhead consumed by RAID, metadata and flash management, giving you an accurate estimate of how much of YOUR data you can store on a Pure Storage FlashArray. It gives you a complete view of the factors that affect the usable capacity, including for instance a calculation of the storage needed for the metadata for your data set.

Here is a look at the output of the estimator:

PureSize Data Sizing Summary

As you can see, this example data set reduced to 21.29% of its original size.  This particular data set thin provisioned well, and compressed well, but didn’t do as well with deduplication.  Other data sets have the exact opposite profile.  This calculation is then used to assess how much data can be stored on each configuration of the FlashArray:

Effective capacity example

Put another way: “I can store 24.2TB of data on my 5.5TB flash shelf.”  There are two effective capacity calculations:

  • Data-Only Method: assumes thin provisioning is used on the incoming data set; it ignores large blocks of zeros in its calculation of the original data set’s size.
  • Data + Zeros Method: assumes thin provisioning is not used and calculates the effective capacity based on the entire provisioned storage space.
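The difference between the two methods comes down to which "original size" goes into the reduction ratio. Here is a rough sketch under stated assumptions: effective capacity is taken to be usable capacity multiplied by original size over stored size, and every number below is hypothetical. This shows the idea only; it is not PureSize's actual computation.

```python
def effective_capacity_tb(usable_tb, original_tb, stored_tb):
    """Usable capacity scaled by the reduction ratio original/stored (assumed model)."""
    if stored_tb <= 0:
        raise ValueError("stored_tb must be positive")
    return usable_tb * (original_tb / stored_tb)


# Hypothetical data set and array; none of these figures come from PureSize.
provisioned_tb = 10.0  # full provisioned space, zeros included
zeros_tb = 4.0         # large blocks of zeros inside that space
stored_tb = 1.5        # physical footprint after data reduction
usable_tb = 5.0        # usable capacity of the array

# Data-Only: the original size excludes the large blocks of zeros.
data_only = effective_capacity_tb(usable_tb, provisioned_tb - zeros_tb, stored_tb)

# Data + Zeros: the original size is the entire provisioned space.
data_plus_zeros = effective_capacity_tb(usable_tb, provisioned_tb, stored_tb)

print(f"Data-Only:    {data_only:.1f} TB effective")
print(f"Data + Zeros: {data_plus_zeros:.1f} TB effective")
```

With these inputs the Data + Zeros figure is larger, because the zeros count toward the original size yet take up almost no physical space.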

Once you download the PureSize estimator, it’s easy to determine what your price per gigabyte will be when you load real data onto the FlashArray.  While you can choose to run it on a subset, we recommend running the largest data set possible, because the larger the data set, the more accurate the results will be. Once you know which FlashArray configuration meets your needs, you can compare its effective capacity to that of other arrays or appliances.
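Turning an effective capacity into a price per gigabyte is a single division. In this sketch the 24.2TB figure is the one from the example above. The price is a made-up placeholder, and decimal terabytes (1 TB = 1,000 GB) are assumed.

```python
def price_per_effective_gb(array_price_usd, effective_tb, gb_per_tb=1000):
    """Array price divided by effective capacity expressed in gigabytes."""
    if effective_tb <= 0:
        raise ValueError("effective_tb must be positive")
    return array_price_usd / (effective_tb * gb_per_tb)


HYPOTHETICAL_PRICE_USD = 100_000  # placeholder, not a real FlashArray price
EFFECTIVE_TB = 24.2               # effective capacity from the example above

cost = price_per_effective_gb(HYPOTHETICAL_PRICE_USD, EFFECTIVE_TB)
print(f"${cost:.2f} per effective GB")
```

Running the same calculation with another array's price and effective capacity gives a like-for-like comparison.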

About the Author

Luanne Dauber is Director of Marketing at Pure Storage, the all-flash enterprise storage company. Prior to Pure Storage, Luanne spent 12 years with Altera Corporation, where she managed a $1.5B portfolio of advanced semiconductor products used in networking and storage infrastructure products. Luanne held positions in intellectual property, software marketing, and product management at Altera.
