
Why Flash Changes Everything, Part 3

I have been justly accused of belaboring the top ten list format in the past, but I've reached new lows by starting one in March (Why Flash Changes Everything, Part 1), continuing it in May (Part 2), and only now finishing it in August. In my defense, Part 3 of Why Flash Changes Everything was too juicy to share in advance of our company launch today.

Without further ado, the #1 reason Why Flash Changes Everything is

(1) High-performance data reduction.

Let me explain. Data reduction techniques like deduplication and compression have been used with disk storage for many years, but so far have failed to gain traction for performance-intensive workloads. Why?

Deduplication effectively replaces a contiguous block of data with a series of pointers to pre-existing segments stored elsewhere. On read, each of those pointers represents a random I/O, so many disks must seek and spin to fetch the scattered pieces and reassemble the original data. Disk is hugely inefficient at random I/O, wasting more than 95% of its time and energy on seeks and rotations rather than on data transfers. Better to keep the data more contiguous, so that disk can stream larger blocks on and off once the head is in the right place.
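To make the ">95% wasted" figure concrete, here is a back-of-envelope sketch of a single random read on a 15K RPM drive. The seek time, transfer rate, and I/O size are illustrative assumptions chosen to resemble a typical performance drive, not numbers from this post or from any particular vendor's spec sheet.

```python
# Back-of-envelope model of one random read on a 15K RPM disk.
# Every default below is an illustrative assumption, not a measured spec.

def disk_random_read_ms(io_kib=4, avg_seek_ms=3.5, rpm=15_000, transfer_mb_s=150.0):
    """Return (total_ms, transfer_ms) for a single random read."""
    # On average the head waits half a revolution for the data to come around.
    rotational_ms = 0.5 * 60_000 / rpm
    # Time spent actually moving bytes off the platter.
    transfer_ms = (io_kib * 1024) / (transfer_mb_s * 1_000_000) * 1000
    total_ms = avg_seek_ms + rotational_ms + transfer_ms
    return total_ms, transfer_ms


total, useful = disk_random_read_ms()
overhead = 1 - useful / total
print(f"per I/O: {total:.2f} ms, seek + rotation overhead: {overhead:.1%}")
# per I/O: 5.53 ms, seek + rotation overhead: 99.5%
```

Rerunning the model with `io_kib=1024` drops the overhead to roughly 44%, which is the "keep data contiguous and stream larger blocks" argument expressed in numbers.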

There is also the challenge of validating duplicates on write. If hashing alone is used to verify duplicate segments, then a hash collision, however improbable, could cause data corruption. That risk is more acceptable for backup and archive datasets than for primary storage. In our view, primary storage should never rely on probabilistic correctness. This is why Pure Storage always compares candidate dedupe segments byte for byte before reducing them, but this too leads to random I/O.
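The sketch below illustrates the general hash-then-verify idea on a toy in-memory segment store; it is not Pure Storage's implementation. `DedupeStore` and `weak_hash` are hypothetical names, and the hash is deliberately terrible (it looks only at segment length) so that collisions happen on purpose. A logical volume is just a list of pointers into the store, and each candidate match is confirmed with a full byte comparison before the segment is shared.

```python
from typing import Callable, Dict, List


class DedupeStore:
    """Toy segment store: a logical volume is a list of pointers into `segments`.

    Hypothetical sketch. `hash_fn` is pluggable so a deliberately weak hash can
    show why a hash match alone is not proof that two segments are equal.
    """

    def __init__(self, hash_fn: Callable[[bytes], int]):
        self.hash_fn = hash_fn
        self.segments: List[bytes] = []          # physical segments
        self.index: Dict[int, List[int]] = {}    # hash -> candidate segment ids

    def write(self, segment: bytes) -> int:
        h = self.hash_fn(segment)
        for seg_id in self.index.get(h, []):
            # Byte-for-byte compare; on disk this is an extra (random) read.
            if self.segments[seg_id] == segment:
                return seg_id                    # true duplicate: share it
        # New data, or a hash collision with different bytes: store it.
        self.segments.append(segment)
        seg_id = len(self.segments) - 1
        self.index.setdefault(h, []).append(seg_id)
        return seg_id

    def read(self, pointers: List[int]) -> bytes:
        # Each pointer may refer to a segment stored anywhere: random I/O.
        return b"".join(self.segments[p] for p in pointers)


def weak_hash(segment: bytes) -> int:
    # Deliberately weak hash (length only) to force collisions.
    return len(segment)


store = DedupeStore(weak_hash)
data = [b"AAAA", b"BBBB", b"AAAA", b"CCCC", b"BBBB"]
pointers = [store.write(s) for s in data]
print(pointers)              # [0, 1, 0, 2, 1]
print(len(store.segments))   # 3
```

If `write` trusted the hash alone, every 4-byte segment here would collapse onto segment 0 and reads would silently return the wrong data; the byte comparison is what prevents that, at the cost of an extra read per candidate.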

Since deduplication depends upon random I/O, on disk it inevitably leads to spindle contention, driving down throughput and driving up latency. No wonder dedupe success in disk-centric arrays has been limited to non-performance-intensive workloads like backup and archiving.

With flash there is no random-access penalty. In fact, random I/O may be even faster, as it enlists more parallel I/O paths. And because flash writes are more expensive than reads, dedupe can actually accelerate performance and extend flash life by eliminating writes.
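As a rough illustration of "eliminating writes", the model below counts how many blocks would actually be programmed to flash when duplicates are dropped. The workload (ten VM images sharing four identical OS blocks plus one unique block each) is entirely hypothetical. It leans on a property of Python sets: membership checks hash the block and then compare it for full equality on a hash match, which mirrors the hash-then-verify approach.

```python
def flash_writes(blocks):
    """Return (logical_writes, physical_writes) for a list of blocks.

    Illustrative model only: a Python set hashes each block and then checks
    full equality on a hash match, so `seen` mirrors hash-then-verify dedupe.
    """
    seen = set()
    physical = 0
    for block in blocks:
        if block not in seen:
            seen.add(block)
            physical += 1   # only unique content costs a flash write (and wear)
    return len(blocks), physical


# Hypothetical workload: 10 VMs share 4 identical OS blocks plus 1 unique block each.
os_blocks = [bytes([i]) * 4096 for i in range(4)]
stream = []
for vm in range(10):
    stream += os_blocks + [f"vm-{vm}".encode().ljust(4096, b"\0")]

logical, physical = flash_writes(stream)
print(logical, physical)   # 50 14
```

In this made-up case 36 of 50 writes never reach the flash, which is where both the performance gain and the endurance gain come from.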

Compression presents different challenges. For a disk array that does not use an append-only data layout (most SANs update in place), compression complicates updates: read, decompress, modify, recompress, and now the result may no longer fit back into the original block. Besides accommodating compression, append-only data layouts (in which data is always written to a new place) are generally much friendlier to flash, because they help avoid flash cell burn-out by spreading writes more evenly across all of the flash. (Leaving this burden to each individual SSD controller instead shortens the life of the SSDs that hold hot data.)
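The contrast between the two layouts can be sketched in a few lines. This is a simplified in-memory model under stated assumptions: `SLOT_BYTES` is a hypothetical fixed slot size for update-in-place, zlib stands in for whatever compressor an array might use, and `InPlaceVolume`, `AppendOnlyVolume`, `fill_a`, and `append_noise` are illustrative names. The in-place volume fails when a modified block recompresses to something larger than its slot; the append-only volume simply writes the new version to the end of its log and repoints its map.

```python
import zlib
from typing import Callable, Dict, List

SLOT_BYTES = 64  # hypothetical fixed physical slot per logical block (update in place)
Modify = Callable[[bytes], bytes]


class InPlaceVolume:
    """Update in place: each logical block owns one fixed-size physical slot."""

    def __init__(self, nblocks: int):
        self.slots: List[bytes] = [zlib.compress(b"")] * nblocks

    def update(self, lba: int, modify: Modify) -> None:
        plain = zlib.decompress(self.slots[lba])   # read + decompress
        new = zlib.compress(modify(plain))         # modify + recompress
        if len(new) > SLOT_BYTES:
            raise ValueError(f"recompressed block is {len(new)} B; slot holds {SLOT_BYTES} B")
        self.slots[lba] = new                      # overwrite the same slot


class AppendOnlyVolume:
    """Append-only: every write lands at a new log position; a map tracks the latest."""

    def __init__(self):
        self.log: List[bytes] = []      # physical log, only ever appended to
        self.map: Dict[int, int] = {}   # logical block -> position in log

    def update(self, lba: int, modify: Modify) -> None:
        pos = self.map.get(lba)
        plain = zlib.decompress(self.log[pos]) if pos is not None else b""
        self.log.append(zlib.compress(modify(plain)))  # any compressed size fits
        self.map[lba] = len(self.log) - 1              # old copy left for later cleanup

    def read(self, lba: int) -> bytes:
        return zlib.decompress(self.log[self.map[lba]])


def fill_a(plain: bytes) -> bytes:
    return b"A" * 1000                 # compresses to a few bytes


def append_noise(plain: bytes) -> bytes:
    return plain + bytes(range(200))   # 200 distinct bytes barely compress


inplace = InPlaceVolume(1)
inplace.update(0, fill_a)              # fits in the slot
try:
    inplace.update(0, append_noise)    # no longer fits
except ValueError as err:
    print("update in place failed:", err)

log = AppendOnlyVolume()
log.update(0, fill_a)
log.update(0, append_noise)
print(len(log.read(0)), len(log.log))  # 1200 2
```

A side effect of the append-only design is that rewrites of the same logical block land on different physical locations, which is the property that lets the array, rather than each SSD, spread wear across the flash.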

Finally, flash is both substantially faster and more expensive than disk. For backup, data reduction drove a media change, replacing tape with disk, by making disk cost-effective. Over the next decade the same thing is going to happen in primary storage. Pure Storage routinely achieves 5-20X data reduction on performance workloads like virtualization and databases (all without compromising sub-millisecond latency). At 5X data reduction, the data center MLC flash we use reaches price parity with performance disk (think 15K Fibre Channel or SAS drives). At 10X, which we routinely approach for our customers' database workloads, we are about half the price of disk; at 15X, a third; and at 20X, which we typically approach for our customers' virtualization workloads, roughly one quarter the cost!
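The price ratios above follow from one piece of arithmetic: if 5X reduction gives parity, raw flash must cost about five times as much per GB as performance disk, so the effective cost ratio is simply 5 divided by the reduction ratio. The sketch assumes, as the comparison implicitly does, that the disk side gets no data reduction; `flash_cost_vs_disk` and `PARITY_RATIO` are hypothetical names.

```python
PARITY_RATIO = 5  # from the text: 5X reduction = price parity with 15K disk


def flash_cost_vs_disk(reduction: float, parity_ratio: float = PARITY_RATIO) -> float:
    """Effective flash $/GB divided by performance-disk $/GB.

    Assumes raw flash costs `parity_ratio` times as much per GB as performance
    disk (implied by "5X = parity") and that the disk gets no data reduction.
    """
    return parity_ratio / reduction


for r in (5, 10, 15, 20):
    print(f"{r:>2}X -> {flash_cost_vs_disk(r):.2f}x the cost of disk")
```

This prints 1.00, 0.50, 0.33, and 0.25, matching parity, a half, a third, and a quarter.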

In retrospect, then, high-performance data reduction is an obvious #1 reason Why Flash Changes Everything: flash provides the additional IOPS capacity needed to make data reduction feasible for performance workloads. In turn, data reduction will enable solid-state flash to replace mechanical disk for literally all of the random I/O performance workloads in the data center by making it cost-effective to do so.

With flash faster, more space and power efficient, more reliable, and cheaper than disk, why buy disk?

About the Author

Scott Dietzen is the CEO of Pure Storage and a three-time successful entrepreneur with WebLogic, Zimbra, and Transarc.

  • http://www.purestorage.com/blog/off-to-the-races/ Pure Storage is off to the Races! | Pure Storage Blog

    [...] disruptive solid-state flash memory will prove to be for data center storage (see Why Flash and the Top Ten Reasons Why Flash Changes Everything). We’ve been quietly building our product and testing it with innovative customers in some of the [...]

  • Ian Ringrose

    The fact that data reduction enables flash to be used could be the tipping point for moving to many more virtualization workloads, or at least for removing disk drives from lots of PCs. Accessing a deduplicating flash storage "brick" over a LAN may become faster and more cost-effective than local disk.

  • http://www.purestorage.com/blog/breaking-the-flash-cost-barrier-talk/ Breaking the Flash Cost Barrier Talk | Pure Storage Blog

    [...] discussion thereafter about the challenges of deduplication on mechanical disk (discussed in this blog entry); the challenges with managing flash workloads over time—in another forum tech blogger Amy [...]

  • http://rogerluethy.wordpress.com/2011/10/27/breaking-the-flash-cost-barrier-talk/ Breaking the Flash Cost Barrier Talk « Storage CH Blog

    [...] discussion thereafter about the challenges of deduplication on mechanical disk (discussed in this blog entry); the challenges with managing flash workloads over time—in another forum tech blogger Amy [...]

  • http://www.purestorage.com/blog/auspicious-times-for-flash-in-the-data-center/ Auspicious Times for Flash in the Data Center | Pure Storage Blog

    [...] point of the one they didn’t (tape). Well, for performance storage, flash is the new disk and data reduction done right allows all-flash solutions to be price competitive with mechanical [...]

  • http://www.purestorage.com/blog/are-flashdisk-hybrids-just-hsm-2-0/ Are Flash/Disk Hybrids Just Hierarchical Storage Management 2.0? | Pure Storage Blog

    [...] Ours and Forrester’s thesis, then, is that dedupe and compression will do the same for flash in performance storage that they did for hard drives in backup and archiving—enable a faster but more expensive media to be cost competitive with a slower, cheaper one. With the 5-10X deduplication and compression ratios that Pure Storage has seen for our customers’ virtualization and database workloads, you really can get all flash storage at below the price you have been paying for enterprise disk arrays of 15K hard drives (and that’s without any flash cache)! These savings from data reduction cannot easily be extended to mechanical disk, because deduplication is random I/O intensive, for which disk is >95% inefficient. [...]

  • http://www.purestorage.com/blog/xtrem-thunder-in-the-forecast-for-emc/ “Xtrem” Thunder in the Forecast for EMC | Pure Storage Blog

    [...] requires a comprehensive redesign of the array hardware and software. (For more on this, see our Top Ten Reasons Why Flash is Different.) For years, EMC has been selling flash cache and tiers as performance accelerators for their [...]

  • http://www.purestorage.com/blog/should-you-really-buy-flash-from-baskin-robbins/ Should You Really Buy Flash from Baskin Robbins? | Pure Storage Blog

    [...] they’d be able to evolve the Symmetrix product line to be the all-flash platform, but for reasons we’ve covered in depth before flash requires a new architecture…so let’s say that’s just not possible.  In this case, they [...]

  • http://www.purestorage.com/blog/40m-for-honing-the-keys-to-the-flash-storage-kingdom/ $40m for Honing the Keys to the Flash Storage Kingdom | Pure Storage Blog

    [...] most out of flash requires a complete rewrite of the storage software. As we’ve remarked before, flash changes everything. Algorithms and data structures are optimized radically differently for flash memory than they are [...]

  • http://www.purestorage.com/blog/software-separates-the-all-flash-array-winners-from-the-losers/ Software Separates the All-Flash Array Winners from the Losers | Pure Storage Blog

    [...] storage will be no different. As we have remarked before flash memory is so radically different from hard drives that it requires wholly new software [...]