
I had the good fortune to catch (well, via webcast anyway) Dave Hitz, NetApp founder and EVP, speaking at the GigaOM Big Data conference. I was particularly intrigued by the last part of the discussion (beginning at 14min 45sec). When asked about the performance mismatch between CPU and disk, Dave made the point that this was old news indeed: (I’m paraphrasing) if you go back to the 1960s, when mainframes and tape were king, the gap between CPU performance and magnetic tape seek time was actually smaller than the gap between today’s CPUs and 15K RPM performance disk. Flash was then proposed as the answer to closing this gap. Like Dave, we’re convinced.

Data center storage has failed to keep pace with servers, which have gotten faster, denser, and cheaper following Moore’s Law for decades. Storage has certainly been getting denser and cheaper as we’ve learned how to pack more data onto each hard drive, but performance is headed in the opposite direction. Mechanical disk seek time and rotation speed are constrained by Newton’s laws, so performance in terms of IOPS per GB has actually been falling, creating an ever-greater imbalance between storage and the rest of the data center. As we have already pointed out in this blog, flash affords dramatically more random I/O per GB than disk, and thanks to virtualization and cloud computing, a growing share of data center I/O is random.
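
To put a rough number on that imbalance, here’s a back-of-the-envelope sketch. The drive figures are illustrative assumptions (a common rule of thumb of ~175 random IOPS for a 15K RPM drive, and tens of thousands for a flash SSD), not vendor specs:

```python
# Back-of-the-envelope comparison of random IOPS per GB.
# All figures below are illustrative assumptions, not vendor specs.

def iops_per_gb(iops: float, capacity_gb: float) -> float:
    """Random IOPS available per GB of usable capacity."""
    return iops / capacity_gb

drives = {
    "15K RPM HDD, 300 GB": (175, 300),     # ~175 random IOPS is a common rule of thumb
    "15K RPM HDD, 600 GB": (175, 600),     # same mechanics, twice the capacity
    "Flash SSD, 200 GB": (30_000, 200),    # tens of thousands of random IOPS (assumed)
}

for name, (iops, cap) in drives.items():
    print(f"{name:22s} {iops_per_gb(iops, cap):10.2f} IOPS/GB")
```

Running it makes the trend obvious: doubling the capacity of a mechanical drive halves its IOPS per GB, while even a modest flash device comes out a couple of orders of magnitude ahead.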

Dave Hitz goes on to say, “Everything that we saw in the last twenty years of transition from tape to disk, exactly the same evolution is going to occur, except just substitute disk is the new tape, and flash is the new disk.” (To give credit where credit is due, Jim Gray was—as usual—well ahead of the rest of us. Check out his 2006 talk entitled Tape is dead, Disk is tape, Flash is disk, RAM locality is king.)

What was not discussed, at least at the GigaOM conference, were the barriers to getting this transition right. Remember Hierarchical Storage Management (HSM) and Virtual Tape Libraries (VTL)? HSM is reminiscent of today’s intra-array tiering. While HSM sounded great in the slideware, in practice its implementations were complex to manage and suffered from orders-of-magnitude latency disparities between the tiers, particularly in the face of conflicting workloads. VTL was an effort to preserve industry investment in infrastructure for tape backups even as the media was replaced with hard drives. As such, VTL is arguably akin to today’s substitution of a flash SSD for a hard drive within a traditional storage array, without re-architecting the storage controller hardware and software to take better advantage of that SSD.

The interesting lesson from the tape/disk transition wasn’t that disk replaced tape; it was that disk redefined the role of tape. Tape was once used for all things backup and archiving: performing backups, performing recoveries, archiving data, and moving data offsite via trucks. When disk entered the backup scene, backup changed and so did the roles of the media. Disk is now used for recovery and local retention; replication plus disk is now used for moving data offsite; and tape is still used (albeit in a smaller role) for long-term retention and archiving.

My belief is that we’ll see something similar in the flash/disk transition. Flash won’t replace disk outright (how many years would the flash fabrication plants have to run at full tilt to produce enough storage?), but the days of disk being used for all forms of online storage (Tier 1 performance, Tier 2 capacity, Tier 3 retention) are numbered. In the future you’ll see flash redefine the role of disk: flash will deliver performance in online storage while disk delivers capacity. And just as the transition to disk forced a change in backup infrastructure and processes, flash will force major changes within primary storage architectures.
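
To make that division of labor concrete, here’s a toy sketch of the kind of placement decision a flash-for-performance / disk-for-capacity architecture implies. Everything here — the extent abstraction, the access-rate heuristic, the threshold — is a hypothetical illustration, not any particular vendor’s algorithm:

```python
# Toy sketch of a flash-for-performance / disk-for-capacity placement policy.
# The extent model, threshold, and heuristic are hypothetical illustrations.

from dataclasses import dataclass

@dataclass
class Extent:
    extent_id: int
    size_gb: float
    reads_per_hour: int  # recent random-read rate observed for this extent

HOT_THRESHOLD = 100  # assumed cutoff: extents read this often stay on flash

def place(extent: Extent) -> str:
    """Pick the media class that should serve an extent."""
    if extent.reads_per_hour >= HOT_THRESHOLD:
        return "flash"   # performance tier: serve hot, random I/O
    return "disk"        # capacity tier: hold cold or bulk data

workload = [
    Extent(extent_id=1, size_gb=0.5, reads_per_hour=900),  # busy database extent
    Extent(extent_id=2, size_gb=4.0, reads_per_hour=3),    # cold archive extent
]
for e in workload:
    print(f"extent {e.extent_id} ({e.size_gb} GB) -> {place(e)}")
```

The point isn’t the policy itself (real systems also weigh writes, sequentiality, and cost); it’s that once flash and disk have distinct roles, the storage controller has to make this kind of decision continuously instead of treating all online storage as one pool.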

So yes, we have seen this sort of transition before, and we should expect intense debate and experimentation to determine which storage architectures deliver the most bang for the buck for which application workloads. Is an incremental approach that preserves industry investment in mechanical disk going to work best? Or will it take an architectural rethink to deliver on the promise of flash within data center storage?