What Is MTBF and How Do You Calculate It?

Mean time between failures, or MTBF, is the average time between repairable failures of a product or system. It’s a key metric for determining the frequency of system failures and providing an overview of system reliability.

MTBF can be used to determine how successful your team is at preventing or reducing potential incidents. The longer the time between failures, the more reliable the system is.

What Does MTBF Measure? Reliability vs. Availability

MTBF plays a role in tracking both the reliability and availability of a component or system.

Reliability is the probability that a system or component will perform as designed over a specific period without failure. MTBF is a basic measure of a system’s reliability—the higher the MTBF, the higher the reliability of the product. Using MTBF with other failure metrics and maintenance strategies makes it easier to predict asset failures, as teams can better determine how and when to implement preventative measures before a failure occurs.

Availability is the ability of a system or component to operate as designed when needed. MTBF combined with mean time to restore (MTTR) can be used to estimate the fraction of time a system is operational. The availability of a system can be calculated by dividing the MTBF by the sum of MTTR and MTBF.

Availability = MTBF / (MTBF + MTTR)
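
The availability formula above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library API: the function name `availability` and the sample values (a failure every 7 hours on average, 1 hour to restore) are assumptions chosen for the example.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return the fraction of time the system is expected to be operational."""
    if mtbf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTBF and MTTR must be non-negative")
    total = mtbf_hours + mttr_hours
    if total == 0:
        raise ValueError("MTBF and MTTR cannot both be zero")
    return mtbf_hours / total


# Hypothetical values: one failure every 7 hours on average, 1 hour to restore.
print(f"{availability(7, 1):.1%}")  # prints 87.5%
```

Because MTTR sits in the denominator, shortening restore times raises availability even when MTBF stays the same.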

How to Calculate MTBF: Step-by-Step Formula

MTBF is calculated by dividing the total operational time for a specific period by the number of failures during the same period. To gather both inputs, monitor the system for a defined period of time:

The total operational time is the time the system was actually running during that period, that is, the length of the period minus any downtime.
The total number of failures is the number of times the system failed within the same period.

As an example, let’s say that during a 24-hour time frame, a system experiences three hours of downtime that occur during three separate incidents.

Total uptime = (24 - 3) = 21 hours
Total number of incidents = 3
MTBF = total uptime / number of incidents
MTBF = 21/3 = 7 hours
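
The same arithmetic can be expressed as a small Python function. This is a sketch under stated assumptions: the name `mtbf` and its parameters are hypothetical, and the function treats MTBF as undefined when no failures were observed in the period.

```python
def mtbf(period_hours: float, downtime_hours: float, failures: int) -> float:
    """Mean time between failures: total uptime divided by number of failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined when no failures were observed")
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must be between 0 and the observation period")
    uptime_hours = period_hours - downtime_hours
    return uptime_hours / failures


# The 24-hour example: 3 hours of downtime across 3 incidents.
print(mtbf(period_hours=24, downtime_hours=3, failures=3))  # prints 7.0
```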

How to Calculate MTBF from Failure Rate

As described above, MTBF can be calculated by dividing total uptime by the number of failures recorded. Failure rate, on the other hand, is the inverse of MTBF and is calculated by dividing the number of failures by the total uptime.

MTBF can be calculated from the failure rate as follows: MTBF = 1 / failure rate

For instance:

Failure rate = 25 failures / 1,000 hours of uptime
Failure rate = 0.025 failures per hour
MTBF = 1 / 0.025
MTBF = 40 hours
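
As a quick check of the inverse relationship, here is a short Python sketch. The function names `failure_rate` and `mtbf_from_failure_rate` are illustrative, not part of any library; the numbers are the ones from the example above.

```python
def failure_rate(failures: int, uptime_hours: float) -> float:
    """Failures per hour of uptime."""
    if uptime_hours <= 0:
        raise ValueError("uptime must be positive")
    return failures / uptime_hours


def mtbf_from_failure_rate(rate_per_hour: float) -> float:
    """MTBF in hours is the reciprocal of the failure rate per hour."""
    if rate_per_hour <= 0:
        raise ValueError("failure rate must be positive")
    return 1 / rate_per_hour


rate = failure_rate(failures=25, uptime_hours=1000)
print(f"failure rate: {rate:.3f} failures per hour")
print(f"MTBF: {mtbf_from_failure_rate(rate):.1f} hours")
```

The units matter: a failure rate expressed per hour yields an MTBF in hours, so both figures should always be reported with their units.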

What Is a Good MTBF?

Since the time between failures for a system or component can depend on factors such as configurations, operating conditions, age, and other external factors, there isn’t one “good” MTBF value. Instead, MTBF should be calculated for your specific assets and will become more accurate as you collect more data on them.

What does a high MTBF mean?

Of course, while there may not be a universally accepted target MTBF, it’s still true that the higher the MTBF, the better. A high MTBF shows that your system or component is highly reliable and will have fewer problems over its lifetime—and having fewer incidents tends to translate to reduced downtime and lower costs.

What does a low MTBF mean?

A low MTBF means that your system is likely to fail more frequently and the reliability of your system needs to be reviewed. A good preventative maintenance plan and the implementation of tools to monitor MTBF and other failure metrics can help improve system reliability.

MTBF Calculation Examples

Next, let’s consider some examples of low, average, and high MTBF related to a production system operating over the course of 30 days.

Low MTBF

Let’s say the system goes down six times within 30 days (720 hours) for four hours each time, for a total disruption time of 24 hours.

Total uptime = (720 - 24) = 696 hours
Total number of incidents = 6
MTBF = total uptime / number of incidents
MTBF = 696 / 6 = 116 hours (approximately 5 days)

An outage every five days indicates an extremely unreliable system that will frequently impact business operations and customers.

Average MTBF

Now, imagine that the system only goes down two times within the same 30 days (720 hours) for two hours each time, for a total disruption time of four hours.

Total uptime = (720 - 4) = 716 hours
Total number of incidents = 2
MTBF = total uptime / number of incidents
MTBF = 716 / 2 = 358 hours (approximately 15 days)

While this might not be an extremely high MTBF, one failure every 15 days can be acceptable for some business use cases.

High MTBF

Finally, consider a system that only goes down once within 30 days (720 hours) for two hours.

Total uptime = (720 - 2) = 718 hours
Total number of incidents = 1
MTBF = total uptime / number of incidents
MTBF = 718 / 1 = 718 hours (approximately 30 days)

Compared to the other scenarios described here, one failure every 30 days can be considered a high MTBF, indicating that the system is highly reliable.
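
To compare the three scenarios side by side, the calculation can be run over a small table of inputs. The sketch below is illustrative only: the `Scenario` class and its field names are assumptions made for this example, and the figures are the ones used in the low, average, and high cases above.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    period_hours: float
    failures: int
    hours_down_per_failure: float

    def mtbf_hours(self) -> float:
        """Uptime over the period divided by the number of failures."""
        downtime = self.failures * self.hours_down_per_failure
        return (self.period_hours - downtime) / self.failures


SCENARIOS = [
    Scenario("low", period_hours=720, failures=6, hours_down_per_failure=4),
    Scenario("average", period_hours=720, failures=2, hours_down_per_failure=2),
    Scenario("high", period_hours=720, failures=1, hours_down_per_failure=2),
]

for s in SCENARIOS:
    hours = s.mtbf_hours()
    print(f"{s.name:>7}: {hours:.0f} hours (about {hours / 24:.1f} days)")
```

Running it reports 116, 358, and 718 hours respectively, matching the hand calculations.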

How to Calculate MTBF: Three Scenarios

MTBF is a useful reliability metric in several areas of technology. Let’s consider some scenarios for cybersecurity, incident response, and DevOps.

Calculating MTBF in Cybersecurity

In cybersecurity, a declining MTBF can indicate that a system is nearing the end of its life and that the risk of a critical outage is increasing.

For example, imagine that a cybersecurity system is observed over a 48-hour period. During that time, the system fails five times for a total downtime of eight hours or a total operational time of 40 hours.

MTBF = 40 / 5 = 8 hours

The following month, the system is again observed over 48 hours. This time, there are eight failures for a total downtime of 12 hours or a total operational time of 36 hours. The system’s MTBF is now 4.5 hours.

MTBF = 36 / 8 = 4.5 hours

If MTBF continues to fall during subsequent observations, this could suggest that an area in the system—or the entire system itself—needs to be replaced or hardened.
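
One way to watch for this kind of decline is to compute MTBF for each observation window and check whether the values keep dropping. The following Python sketch assumes each observation is recorded as a tuple of (period hours, downtime hours, failures); the helper names `mtbf` and `is_declining` are hypothetical, and a strictly decreasing series is only one possible definition of a downward trend.

```python
def mtbf(period_hours: float, downtime_hours: float, failures: int) -> float:
    """Uptime in the observation window divided by the number of failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined when no failures were observed")
    return (period_hours - downtime_hours) / failures


def is_declining(values: list[float]) -> bool:
    """True if every value is strictly lower than the one before it."""
    return len(values) >= 2 and all(b < a for a, b in zip(values, values[1:]))


# The two 48-hour observations from the example above.
observations = [(48, 8, 5), (48, 12, 8)]
history = [mtbf(*obs) for obs in observations]

print(history)                # prints [8.0, 4.5]
print(is_declining(history))  # prints True
```

With only two data points the signal is weak; in practice a team would likely want several observation windows before drawing conclusions.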

Calculating MTBF in Incident Response

MTBF can also help determine how effective your incident response team is at minimizing and preventing incidents. If MTBF is too low or trending downward, the team should analyze incident data to discover recurring outages and concerning trends.

Calculating MTBF in DevOps

MTBF in DevOps is a measure of the frequency of failures for a feature or single component, allowing teams to predict the reliability and availability levels of a service. In this way, it can highlight weaknesses in a component’s design or the testing and maintenance process.

By monitoring MTBF, DevOps teams can discover and eliminate inefficiencies and bottlenecks that could lead to failure by improving processes and system infrastructure. As teams make improvements, MTBF increases, indicating a more reliable system.

For instance, consider an example where a code integration pipeline ran for a total of 100 hours over five days, and four failures occurred during that time.

Total operation time = 100 hours
Total number of failures = 4
MTBF = total operation time / number of failures
MTBF = 100 / 4 = 25 hours

What Tools Do You Need to Monitor MTBF?

With the right tools, you can track MTBF and other maintenance metrics and act on what they show. These tools include infrastructure monitoring tools, service monitoring tools, visualization tools, application performance monitoring tools, cross-platform and data aggregation tools, and project management tools.

All of these tools depend on high-performance storage that can handle massive amounts of data without slowing down. With Pure Storage® FlashBlade®, you can create a robust, high-performance storage solution to support the advanced monitoring and observability tools needed to help you improve your MTBF metrics.

What Is the Next Metric after MTBF?

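MTBF and mean time to failure (MTTF) both use time to evaluate the reliability of a system or component, but they’re applied differently: MTBF is used for repairable systems that return to service after a failure, while MTTF is used for non-repairable components that are replaced when they fail.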
