What Is DirectFlash and How Does It Work?

DirectFlash® is Pure's pioneering flash management solution comprising our Purity software and DirectFlash Modules—both components that can be independently and non-disruptively upgraded.

Here’s how it works, why it’s different, and why you need it.

Flash Storage Overview

Invented by Toshiba in 1980, flash memory, also known as flash storage, is a type of non-volatile memory (meaning it retains data without a continuous power supply) that can be electrically erased and reprogrammed.

There are two main types of flash memory, NOR and NAND, which differ at the circuit level in how their memory cells are wired together; each is named for the logic gate its cell arrangement resembles. Currently, NAND flash represents more than 95% of the flash memory market and is used in almost all non-embedded flash devices.

Within the NAND category, there are various types of memory, classified based on the number of bits stored per memory cell, including:

  • SLC: One (single) bit per cell
  • MLC: Two (or multiple) bits per cell
  • TLC: Three bits per cell
  • QLC: Four (quad) bits per cell
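To make the classification above concrete: storing n bits in one cell requires the cell to distinguish 2^n charge levels. The short sketch below only tabulates that relationship; the dictionary and function names are illustrative.

```python
# Bits stored per cell for each NAND type listed above.
NAND_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def charge_levels(bits_per_cell):
    """Number of distinct charge levels a cell must distinguish."""
    return 2 ** bits_per_cell


for name, bits in NAND_TYPES.items():
    print(f"{name}: {bits} bit(s) per cell, {charge_levels(bits)} charge levels")
```

Each additional bit doubles the number of levels that must fit in the same cell, which is one reason denser NAND is slower to program and less tolerant of wear.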

DirectFlash is Pure Storage’s holistic approach to building all-flash systems. We leverage “raw” flash to build our DirectFlash Modules, rather than rely on buying commodity solid-state drives (SSDs). By doing this, we get our flash at a different point in the supply chain from other solid-state array vendors. But the benefits of DirectFlash are much more than just better supply chain economics.

How DirectFlash Is Different

Other all-flash or hybrid arrays that use commodity, off-the-shelf SSDs talk to their flash drives in essentially the same way they would talk to a legacy hard drive: as if each drive were one contiguous set of identical blocks.

Hard drives had tracks and sectors, and laying all those sectors end to end was how you got one long list of blocks. SSDs present this same linear layout by placing a complex layer, called a flash translation layer (FTL), between the host system and the flash.

DirectFlash uses a different approach that talks to flash memory directly, which maximizes the capabilities of flash and provides better performance, power utilization, and efficiency.

Specifically, DirectFlash offers:

  • System-level media management, as opposed to drive-level, which means the drives work in concert with the system itself, allowing the system to:
    • Make smarter data placement decisions based on broader context.
    • Understand the activity of the system from the block, file, or object level all the way down to an individual flash cell.    
    • Maximize efficiency by laying out data in ways optimized for the media, avoiding write amplification and increasing endurance.
    • Avoid duplicate work by centralizing functions like garbage collection, sparing, and wear leveling.
  • Reduction of overall media costs by eliminating duplicate efforts and processes that happen across every drive in a traditional system. Petabyte-scale systems that leverage SSDs can have terabytes of DRAM in the drives themselves—not even including system memory—to maintain their individual FTL mappings and metadata. Each drive also contains its own overprovisioned spare space that’s necessary for media management by the FTL. Each one of these components is an added cost that as drive sizes increase will make up a larger and larger portion of the overall media cost. The cost-per-bit of DRAM hasn’t improved in the last several years, so efficient use of DRAM will become more and more critical.
  • Increased module reliability: DirectFlash Modules fail at a much lower rate (3-4x lower) than SSDs, primarily because they run simpler firmware.
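The "terabytes of DRAM" figure in the list above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes a page-level FTL map that keeps one 4-byte entry for every 4 KiB of flash. Both parameters are illustrative assumptions, not figures from Pure or from any specific drive, and real drives vary.

```python
PIB = 2 ** 50  # one pebibyte, in bytes
TIB = 2 ** 40  # one tebibyte, in bytes


def ftl_dram_bytes(capacity_bytes, map_unit_bytes=4096, entry_bytes=4):
    """DRAM needed for a flat logical-to-physical map (assumed layout)."""
    entries = capacity_bytes // map_unit_bytes
    return entries * entry_bytes


dram = ftl_dram_bytes(PIB)
print(f"{dram / TIB} TiB of drive DRAM per PiB of flash")
```

Under these assumptions the mapping tables alone need about one thousandth of the flash capacity in DRAM, so a petabyte-scale system built from SSDs lands in the terabyte range before counting system memory.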

How Solid-State Drives Work

An SSD is composed of NAND flash chips, also known as NAND flash dies, with each die being broken down into smaller elements called blocks, which are made up of pages.

However, flash pages can’t be overwritten in place. Once a page has been written, the entire block containing it must be erased before that page can be written again. At the same time, every SSD is built to support a backwards-compatible disk sector interface, which allows any sector to be overwritten at any time.

This contradiction is resolved by having something in firmware known as a “flash translation layer,” or FTL, which implements a virtual disk sector interface that allows you to write data to different flash pages no matter which logical block the data was intended for. The FTL keeps track of all this mapping metadata in its own memory and metadata storage.
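The mapping idea can be sketched in a few lines. This is a toy model under simplifying assumptions (append-only page allocation, no erase or garbage collection, everything held in Python dictionaries); the class and its names are hypothetical and do not reflect any real drive firmware.

```python
class TinyFTL:
    """Toy flash translation layer: logical sector -> (block, page)."""

    def __init__(self, num_blocks, pages_per_block):
        self.num_blocks = num_blocks
        self.pages_per_block = pages_per_block
        self.mapping = {}    # logical sector -> physical (block, page)
        self.flash = {}      # physical (block, page) -> stored data
        self.stale = set()   # physical pages holding superseded data
        self.next_free = 0   # index of the next never-written page

    def write(self, sector, data):
        if self.next_free >= self.num_blocks * self.pages_per_block:
            raise RuntimeError("no free pages; garbage collection needed")
        # Out-of-place write: the old page is marked stale, not overwritten.
        old = self.mapping.get(sector)
        if old is not None:
            self.stale.add(old)
        location = divmod(self.next_free, self.pages_per_block)
        self.next_free += 1
        self.flash[location] = data
        self.mapping[sector] = location
        return location

    def read(self, sector):
        return self.flash[self.mapping[sector]]


ftl = TinyFTL(num_blocks=2, pages_per_block=4)
ftl.write(7, b"v1")   # sector 7 lands in block 0, page 0
ftl.write(7, b"v2")   # the "overwrite" lands in block 0, page 1
print(ftl.read(7), ftl.stale)
```

The host sees sector 7 updated in place, while physically the new version went to a fresh page and the old page became stale.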

But, because you’re now writing new versions of data into different flash pages, eventually you accumulate data in those blocks that could be considered “garbage” because the data has either been overwritten or logically deleted.

To reclaim this physical capacity, a “garbage collector” process in the drive firmware takes the data that is still valid and moves it to a new location, so that it can then erase the entire block containing the “tombstoned” data. For this garbage collector to work, each drive needs extra flash memory, known as “overprovisioned space,” and every block the garbage collector erases uses up one of that block’s finite number of program/erase cycles. The ratio of data physically written to the flash to data logically written by the host is known as “write amplification.”
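A small worked example shows how relocation drives write amplification. It is a simplified model with hypothetical function names: a block is represented as a list of flags marking which pages still hold valid data, and write amplification is computed as physical page writes divided by host page writes.

```python
def garbage_collect(valid_flags):
    """Return (pages_relocated, pages_reclaimed) for erasing one block."""
    relocated = sum(1 for valid in valid_flags if valid)
    reclaimed = len(valid_flags) - relocated
    return relocated, reclaimed


def write_amplification(host_pages, relocated_pages):
    """Physical page writes divided by host (logical) page writes."""
    if host_pages <= 0:
        raise ValueError("host_pages must be positive")
    return (host_pages + relocated_pages) / host_pages


# An 8-page block in which 6 pages are still valid and 2 are stale.
block = [True] * 6 + [False] * 2
relocated, reclaimed = garbage_collect(block)

# Erasing the block frees room for only 2 new host pages,
# at the price of copying 6 valid pages elsewhere first.
print(relocated, reclaimed, write_amplification(reclaimed, relocated))
```

In this case the drive performs 8 physical page writes to absorb 2 pages of host data, a write amplification of 4. The fewer valid pages a block holds when it is collected, the closer the ratio gets to 1.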

Write amplification leads to premature wear and a shortened life span for the SSD, while overprovisioning adds flash that the host can’t use. There are also performance impacts from this design because whenever a flash die is busy with garbage collection, it can’t service reads or writes. Therefore, performance of the SSD fluctuates unpredictably as the garbage collector becomes more or less active.

What makes this even more challenging is that SSDs have no way to communicate this garbage collection activity to the system that’s accessing them. Rather, each SSD has to maintain the illusion that it’s just like a hard drive. As the number of bits per cell in NAND flash increases, these performance inconsistencies only get worse, because program/erase cycles take longer and longer, leaving data inaccessible for longer periods.

How DirectFlash Works

DirectFlash takes a different approach to flash media management. Rather than deputizing every SSD to perform its own wear leveling, garbage collection, and overprovisioning, the Purity operating system performs these functions in software at the array level. This means each DirectFlash Module is simpler than a traditional solid-state drive, as it only has to provide access to the media itself and handle low-level data and signaling tasks.

The benefits that this provides are numerous:

  • Instead of each SSD making decisions about data placement and media management in a vacuum, Purity has visibility into all ongoing and scheduled system activity, including current IO, data reduction operations, and pending garbage collection cycles, as well as overall array workload and health. This allows Purity to make much smarter placement and scheduling decisions than a single drive could make on its own.
  • By making smarter data placement decisions, data of similar expected life spans can be co-located on the same blocks to minimize instances where some pages in a block are “tombstoned” while others are still valid. Purity knows if certain pages are all part of the same file or object or come from the same host system. By grouping those pages into the same blocks, the entire block can be freed at once when that file or object is deleted, without rewriting other live data and causing write amplification.
  • Because the modules perform no garbage collection of their own and add no drive-level write amplification, DirectFlash Modules outperform and outlast their commodity counterparts. Fewer writes mean less wear and thus longer drive life spans. Fewer writes also mean more IO cycles are available to service “real” client IO. And because Purity knows about current IO activity and has visibility into the entire system, it’s never surprised by one of these program/erase cycles blocking access to data. In the worst case, Purity can just reconstruct that data from parity rather than waiting for a program/erase cycle to finish. This significantly reduces the worst-case latency of our systems, even when using QLC flash.
  • Because we perform all these media management tasks in software, we can improve this software over time. All Pure Storage® systems connected to the internet securely send telemetry data back to Pure, and since we have deep insight into the health and activity of the underlying flash memory, we aggregate and analyze this data to improve how our software works in the real world. This means our systems’ reliability and performance can improve over time with regular software updates.
  • And lastly, because we perform all these activities at the array level in software, our DirectFlash Modules don’t need complex controllers and large amounts of RAM to do all this work on their own. Thus, our modules are simpler and therefore more reliable, in addition to being more efficient. We can also scale the size of our drives with advances in NAND flash fabrication technology, without needing to increase drive complexity or cost.
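The effect of co-locating data with similar life spans, described in the list above, can be illustrated with a toy placement model. This is only a sketch under simple assumptions (two objects, fixed-size blocks, hypothetical function names); it is not how Purity actually lays out data.

```python
def fill_blocks(pages, pages_per_block):
    """Split a stream of page owners into fixed-size blocks."""
    return [pages[i:i + pages_per_block]
            for i in range(0, len(pages), pages_per_block)]


def relocations_after_delete(blocks, deleted_owner):
    """Live pages that must be copied to erase every block the delete touched."""
    moved = 0
    for block in blocks:
        if deleted_owner in block:
            moved += sum(1 for owner in block if owner != deleted_owner)
    return moved


# Two objects, A and B, of four pages each; each block holds four pages.
interleaved = fill_blocks(["A", "B"] * 4, pages_per_block=4)
grouped = fill_blocks(["A"] * 4 + ["B"] * 4, pages_per_block=4)

print(relocations_after_delete(interleaved, "A"))  # pages of B mixed into A's blocks
print(relocations_after_delete(grouped, "A"))      # A's block holds nothing else
```

With interleaved placement, deleting object A leaves four live pages of B that must be rewritten before the blocks can be erased. With grouped placement, A’s block can be erased immediately with no rewrites.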

What this means for customers is systems that deliver higher performance, more consistently, with greater reliability and longevity than other all-flash or hybrid systems designed around SSDs.

Pure was founded around the belief that the future of the data center was all flash—and we’ve built our DirectFlash technology around making this vision a reality. We believe the best way to build all-flash systems is to build the system from the ground up for all flash. That means eliminating the parts of the system designed around legacy interfaces and paradigms and letting the technology truly shine.
