The storage industry has been touting flash as the “cure-all pill” for application performance problems, but folks who have thrown flash blindly at their applications have found mixed results at best. Why? It turns out that some applications are very well suited to flash, while others see little performance benefit from it. Some applications respond well to a flash caching strategy, while for others an “all flash” approach is required to realize the benefit. It all comes down to understanding the I/O profile of your application, something we’ve taken to calling an application’s “I/O fingerprint.” Applications vary greatly in their I/O fingerprints, and a fingerprint can change over time as the workload changes or the architecture evolves. For example, virtualizing an application can greatly alter its I/O fingerprint, because its I/O gets intermixed with that of other applications on the virtualized server to create an entirely new fingerprint. Let’s explore the I/O characteristics, or “fingerprint,” that you should understand before diving into storage performance optimization for a given application:
I/O Load and Growth
The first thing to understand is how much I/O your application is doing. How many IOPS (I/Os per second) does it drive, and how consistent is that load? Is it constant or bursty? When it spikes, are the spikes brief or sustained? Is there hourly, daily, weekly, monthly, quarterly, or seasonal variance? What is the annual growth rate of IOPS? Does it track capacity growth, or is performance growth outpacing capacity growth? With these questions you are trying to get a rough idea of how much performance your storage architecture needs to deliver for your application, and how that requirement is changing over time. Will you run out of performance next quarter or next year?
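To make those questions concrete, here is a minimal sketch of how you might summarize a series of IOPS samples and project when growth would exceed what your storage can deliver. The function names, the sample values, the 10% quarterly growth rate, and the 8,000 IOPS ceiling are all made up for illustration, and the projection assumes simple compound growth.

```python
from statistics import mean


def summarize_iops(samples):
    """Summarize IOPS samples taken at a fixed interval."""
    avg = mean(samples)
    peak = max(samples)
    return {
        "mean": avg,
        "peak": peak,
        # Close to 1.0 means a steady load; much higher means a bursty one.
        "peak_to_mean": peak / avg,
    }


def periods_until_exhausted(current_iops, growth_rate, ceiling_iops):
    """Count growth periods until the load exceeds the ceiling.

    Assumes compound growth of `growth_rate` per period (0.10 = 10%).
    """
    if growth_rate <= 0:
        raise ValueError("growth_rate must be positive")
    periods = 0
    load = current_iops
    while load <= ceiling_iops:
        load *= 1 + growth_rate
        periods += 1
    return periods


# Hypothetical samples: a mostly steady load with one large spike.
samples = [1000, 1200, 900, 5000, 1100, 1000]
profile = summarize_iops(samples)

# Hypothetical planning inputs: peak load grows 10% per quarter and the
# array is assumed to top out at 8,000 IOPS.
quarters = periods_until_exhausted(profile["peak"], 0.10, 8000)

print(profile)
print("Quarters until peak load exceeds the ceiling:", quarters)
```

Sizing against the peak rather than the mean is a judgment call. If your spikes are brief and your application can tolerate them, the mean may be the more useful number.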
I/O Size
The second thing to understand is I/O size. Some applications do small-block I/O (4K or 8K), while others do large streaming I/Os where megabytes of data are transferred at a time. I/O size is important because it colors how you think about many of the other performance metrics. An application doing 1,000 IOPS with a 4K I/O size is relatively modest in terms of bandwidth (4 MB/s), while the same application with a 32K I/O size will drive 8x the bandwidth, at 32 MB/s. I/O size has a similar impact on latency: a large I/O may tie up a storage port for milliseconds at a time simply transferring the data, which can make round-trip latency look quite bad even if the storage device is performing at top speed. The bottom line is that understanding the I/O block size is key to interpreting whether given measures of IOPS, latency, and bandwidth are good or bad, and when comparing two different applications you must normalize these measures to a common I/O size.
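The arithmetic behind the bandwidth and transfer-time points can be sketched in a few lines. The helper names are ours, the units are decimal (1 MB = 1,000 KB, matching the rounded figures above), and the 800 MB/s link speed is only an assumed value for illustration. The normalization shown is one simple approach (equal bandwidth at a reference I/O size), not the only one.

```python
def bandwidth_mb_per_s(iops, io_size_kb):
    """Throughput implied by an IOPS rate and an I/O size."""
    return iops * io_size_kb / 1000


def wire_time_ms(io_size_kb, link_mb_per_s):
    """Time spent purely moving one I/O's payload over a link.

    Ignores protocol overhead and queuing. KB divided by MB/s gives
    milliseconds directly when 1 MB = 1,000 KB.
    """
    return io_size_kb / link_mb_per_s


def normalized_iops(iops, io_size_kb, reference_kb):
    """Express a workload as the IOPS that would move the same data
    at a common reference I/O size."""
    return iops * io_size_kb / reference_kb


small = bandwidth_mb_per_s(1000, 4)    # 1,000 IOPS at 4K
large = bandwidth_mb_per_s(1000, 32)   # 1,000 IOPS at 32K
print(small, large, large / small)

# Assumed link speed of 800 MB/s, purely for illustration.
print(wire_time_ms(4, 800))      # a small-block I/O
print(wire_time_ms(1000, 800))   # a 1 MB streaming I/O

# 1,000 IOPS at 32K compared on a 4K basis.
print(normalized_iops(1000, 32, 4))
```

On that assumed link, the 1 MB transfer spends more than a millisecond just on the wire, which is why a large-block workload can show "bad" latency on a perfectly healthy device.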
Read/Write Mix
The third thing to understand is your application’s mix of reads and writes. Different storage architectures behave very differently in how they handle read workloads, write workloads, and mixed read/write workloads, and much of the configuration tweaking one does on a storage architecture involves optimizing the device for these access patterns. Storage arrays typically have read caches, so reads are either served out of DRAM or flash cache if the data is stored there (sub-millisecond access times), or served from back-end disk if not (many-millisecond access times). Writes, on the other hand, are generally persisted to some form of non-volatile DRAM cache (sub-millisecond acknowledgement times), and then staged to back-end disk (many-millisecond flush times). What’s important to understand, however, is that both of these caching layers have inherent limits: they can only handle so much data so fast, and they carry large performance penalties when those thresholds are exceeded. Understanding the read/write mix of your application can help determine whether caching will help your application, and if so, what a suitable cache size would be.
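One way to reason about the write-cache threshold is a simple fill-and-drain model: the cache absorbs writes at the incoming rate while flushing to disk at a slower one. The sketch below is a deliberately crude model, not how any particular array behaves, and every number in it (cache size, flush rate, IOPS, write fraction, I/O size) is hypothetical.

```python
import math


def write_mb_per_s(iops, write_fraction, io_size_kb):
    """Write bandwidth implied by total IOPS, the share of I/Os that are
    writes, and the I/O size (decimal units: 1 MB = 1,000 KB)."""
    return iops * write_fraction * io_size_kb / 1000


def seconds_until_write_cache_full(cache_mb, ingest_mb_s, flush_mb_s):
    """How long a write cache can absorb a sustained write rate.

    Returns infinity when the back end flushes at least as fast as
    writes arrive, since the cache then never fills.
    """
    if ingest_mb_s <= flush_mb_s:
        return math.inf
    return cache_mb / (ingest_mb_s - flush_mb_s)


# Hypothetical array: 8,000 MB of write cache, 300 MB/s flush rate.
CACHE_MB = 8000
FLUSH_MB_S = 300

# Hypothetical workload: 50% writes at a 16K I/O size.
steady = write_mb_per_s(10_000, 0.5, 16)   # normal load
burst = write_mb_per_s(50_000, 0.5, 16)    # spike

print(seconds_until_write_cache_full(CACHE_MB, steady, FLUSH_MB_S))
print(seconds_until_write_cache_full(CACHE_MB, burst, FLUSH_MB_S))
```

In this toy model the steady load never fills the cache, while the spike can be absorbed for a bit over a minute before writes start waiting on disk. That is the kind of threshold to look for when you ask whether your spikes are brief or sustained.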
Locality of Access
They say the devil is in the details, and if there is a devil in understanding access patterns, it is locality of access. As discussed above, read caching is key to delivering sub-millisecond (solid-state) rather than many-millisecond (disk) read service times. Fundamentally, however, caching is a probability game: how often do you “hit” cache for a read, versus “miss” cache and service the read from disk? Although caching algorithms have gotten smarter over the years, they basically involve watching access patterns to understand which blocks are accessed frequently, then taking educated guesses to “read ahead” and pre-fetch blocks that are likely to be accessed next. This is a vast oversimplification, but these schemes rely on the fact that some applications have predictable access patterns. With many applications, on the other hand, access patterns are simply not predictable. They are random, and our belief is that random I/O is on the rise. Understanding how random your application’s I/O is will help you understand how helpful, or useless, DRAM/flash caching strategies can be.
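To see the probability game in action, the following sketch replays two synthetic access patterns through a small least-recently-used (LRU) cache and compares hit ratios. LRU is our stand-in here, chosen for simplicity. Real arrays use more sophisticated algorithms, and this model does no read-ahead at all. The block counts, the 90/10 hot-set split, and the 0.2 ms and 8 ms service times are assumptions for illustration.

```python
import random
from collections import OrderedDict


def lru_hit_ratio(accesses, cache_blocks):
    """Replay block accesses through an LRU cache and return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for block in accesses:
        total += 1
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used
    return hits / total if total else 0.0


def average_read_latency_ms(hit_ratio, hit_ms, miss_ms):
    """Expected read latency given a hit ratio and hit/miss service times."""
    return hit_ratio * hit_ms + (1 - hit_ratio) * miss_ms


rng = random.Random(42)  # fixed seed so runs are repeatable
TOTAL_BLOCKS, HOT_BLOCKS, CACHE_BLOCKS, N = 10_000, 500, 1_000, 50_000

# Fully random: every block is equally likely to be read.
uniform = [rng.randrange(TOTAL_BLOCKS) for _ in range(N)]

# Strong locality: 90% of reads go to a small hot set of blocks.
skewed = [
    rng.randrange(HOT_BLOCKS) if rng.random() < 0.9 else rng.randrange(TOTAL_BLOCKS)
    for _ in range(N)
]

uniform_hits = lru_hit_ratio(uniform, CACHE_BLOCKS)
skewed_hits = lru_hit_ratio(skewed, CACHE_BLOCKS)

for name, ratio in (("random", uniform_hits), ("hot set", skewed_hits)):
    latency = average_read_latency_ms(ratio, hit_ms=0.2, miss_ms=8.0)
    print(f"{name}: hit ratio {ratio:.2f}, average read latency {latency:.2f} ms")
```

With fully random access, the hit ratio ends up close to the fraction of the data set that fits in cache (about 10% here), so most reads still pay the disk penalty. With a hot set that fits in cache, the same cache serves the large majority of reads.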
Latency Sensitivity
Finally, you should understand the latency sensitivity of your application. Simply put, some applications will benefit greatly from faster I/O response times, and others will not. If your application is gated by complex application logic and storage I/O times aren’t the bottleneck, then improving storage performance won’t help much. On the other hand, if storage I/O dominates your transaction completion time, then improving it will have a dramatic effect on transaction rate and scalability. The other thing to understand is that for most applications, I/Os aren’t independent. At the application tier there is a transaction being executed, which may result in many individual I/Os. Sometimes that transaction is serial: its I/Os have to complete in a given order, and a single slow I/O can stall the entire transaction. All of these factors determine how improvements in storage performance translate into application performance. If an application is not storage-bound, a 2x improvement in storage performance may have little effect on application performance. If it is storage-bound, and each application transaction consists of many storage I/Os, a doubling of storage performance can have a highly compounding effect on application performance.
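A back-of-the-envelope model helps here. The sketch below treats a transaction as a fixed amount of application logic plus a number of I/Os completed strictly one after another, then asks what a 2x storage speedup buys. The serial assumption and all of the timings are ours, chosen for illustration. Real transactions often overlap I/Os, so treat this as a rough guide only.

```python
def transaction_ms(app_logic_ms, ios_per_txn, io_latency_ms):
    """Serial transaction time: application logic plus each I/O in turn."""
    return app_logic_ms + ios_per_txn * io_latency_ms


def transaction_speedup(app_logic_ms, ios_per_txn, io_latency_ms, storage_speedup):
    """How much faster a transaction completes when only storage gets faster."""
    before = transaction_ms(app_logic_ms, ios_per_txn, io_latency_ms)
    after = transaction_ms(app_logic_ms, ios_per_txn, io_latency_ms / storage_speedup)
    return before / after


# Hypothetical application gated by its own logic: 50 ms of compute, 2 I/Os.
logic_bound = transaction_speedup(50, 2, 5, storage_speedup=2)

# Hypothetical storage-bound application: 2 ms of compute, 20 serial I/Os.
storage_bound = transaction_speedup(2, 20, 5, storage_speedup=2)

print(f"logic-bound:   {logic_bound:.2f}x faster transactions")
print(f"storage-bound: {storage_bound:.2f}x faster transactions")
```

In this model the logic-bound transaction barely improves, while the storage-bound one gets almost the full 2x. Shorter transactions also tend to free up application resources sooner, which is where further gains in transaction rate and scalability can come from.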
Hopefully this post has started to get your wheels turning on which questions to ask when profiling your application and understanding its I/O fingerprint. In future posts we’ll further dissect the I/O fingerprint, and discuss which application I/O fingerprints are good candidates for flash and which are not.