Mean time to detect, or MTTD, is the average time it takes a DevOps team to detect a problem, such as a software bug or hardware failure, within an organization.
MTTD is one of the key performance indicators of incident management. The sooner an organization discovers a problem, the better: incidents can lead to system downtime, which, according to Gartner, costs an average of $5,600 per minute.
Although MTTD isn't the only metric available to DevOps teams, it's one of the easiest to track and measure, and it’s an essential metric for any organization that wants to avoid problems like system outages.
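To calculate MTTD, add up the time between the start of each incident and its detection over a given period, then divide that total by the number of incidents in the period.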
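Expressed as a formula, with n incidents in the period and t_start,i and t_detect,i standing for the start and detection times of incident i (notation introduced here for illustration, not taken from any standard):

```latex
\text{MTTD} = \frac{1}{n} \sum_{i=1}^{n} \left( t_{\text{detect},i} - t_{\text{start},i} \right)
```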
For example, let’s say the 24x7 operations support team for a large auto parts manufacturer tracks weekly MTTD for the entire facility. During the week of February 7-11, 2022, there were four incidents. Using system logs, the team determined the start time and detection time of each incident and recorded them in a table as follows:
The mean time to detect is calculated as:
MTTD = (118 + 53 + 148 + 85) / 4 = 404 / 4 = 101 minutes
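The same calculation can be automated once start and detection timestamps are exported from logs. The sketch below is a minimal illustration: the clock times are hypothetical and were chosen only so that the per-incident gaps match the 118, 53, 148, and 85 minutes in the example; the helper names are also made up for this sketch.

```python
from datetime import datetime

# Hypothetical (start, detected) timestamps. Only the gaps between them
# (118, 53, 148 and 85 minutes) come from the example; the clock times
# themselves are invented for illustration.
incidents = [
    ("2022-02-07 08:00", "2022-02-07 09:58"),
    ("2022-02-08 13:15", "2022-02-08 14:08"),
    ("2022-02-09 22:30", "2022-02-10 00:58"),  # crosses midnight
    ("2022-02-11 03:05", "2022-02-11 04:30"),
]

FMT = "%Y-%m-%d %H:%M"


def minutes_to_detect(start: str, detected: str) -> float:
    """Return the detection delay for one incident, in minutes."""
    delta = datetime.strptime(detected, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60


def mttd(records) -> float:
    """Average the per-incident detection delays."""
    delays = [minutes_to_detect(start, detected) for start, detected in records]
    return sum(delays) / len(delays)


print(mttd(incidents))  # 101.0
```

Working from full timestamps rather than hand-computed durations avoids mistakes when an incident spans midnight, as the third one does here.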
The auto parts manufacturer could then use this number to compare MTTD from this particular week to other weeks or to the same week in the previous year. If they’d calculated MTTD for a certain team, they could use this result to gauge the team’s performance over time. Some companies choose to remove outliers from the table, and many will also tier incidents by severity to see if MTTD varies according to the seriousness of the problem.
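Tiering by severity and removing outliers can both be expressed as small transformations over the incident list. This is a sketch under stated assumptions: the severities and detection times are hypothetical, and the "more than three times the median" cutoff is just one simple outlier rule (IQR or z-score rules are common alternatives), not a method prescribed by the text.

```python
from statistics import median

# Hypothetical incidents: (severity, minutes to detect). Values are
# illustrative only; 1580 is a deliberately extreme outlier.
incidents = [
    ("high", 20), ("medium", 60), ("low", 120), ("high", 30),
    ("low", 100), ("medium", 80), ("low", 1580),
]


def mttd_by_severity(records):
    """Group detection times by severity and average each group."""
    groups = {}
    for severity, minutes in records:
        groups.setdefault(severity, []).append(minutes)
    return {sev: sum(vals) / len(vals) for sev, vals in groups.items()}


def drop_outliers(records, factor=3.0):
    """Drop incidents whose detection time exceeds `factor` x the median."""
    cutoff = factor * median(minutes for _, minutes in records)
    return [(sev, minutes) for sev, minutes in records if minutes <= cutoff]


print(mttd_by_severity(incidents))
# {'high': 25.0, 'medium': 70.0, 'low': 600.0}
print(mttd_by_severity(drop_outliers(incidents)))
# {'high': 25.0, 'medium': 70.0, 'low': 110.0}
```

A single extreme incident can dominate a small group's average, which is why some teams report both the raw and the trimmed figures.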
Monitoring MTTD mainly involves keeping track of anything that qualifies as an event or an issue, which can vary greatly from organization to organization.
The primary tools you need to monitor MTTD include:
Logs: Logs are automatically generated, time-stamped records of events relevant to a particular computer system or software application. For example, a web server’s access log lists every file that visitors request from a website, including HTML files and any associated files that get transmitted. Another example is a database log, which records all activity in the database, including all changes to records.
Help desks: Help desks are centralized support centers for product users who need assistance with anything related to the product, especially IT issues. They can be physical or online call centers, or ticketing systems that run as SaaS applications. Help desks typically maintain a knowledge base that records customer issues, including what each issue was, when it was identified, and how it was resolved.
Intrusion detection systems: An intrusion detection system (IDS) is a system that monitors network traffic for suspicious activity and produces alerts when such activity is discovered. The primary functions of an IDS are reporting and anomaly detection, but some intrusion detection systems can take action when they detect malicious activity, including blocking traffic sent from suspicious IP addresses.
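These sources can be combined to measure a single incident: a log often shows when a problem began, while a help desk ticket or alert shows when someone noticed it. The sketch below assumes a hypothetical log format (ISO timestamp, level, message), treats the first ERROR entry as the incident start, and uses a hypothetical ticket-creation time as the detection time. Real log formats and detection signals vary by system.

```python
from datetime import datetime

# Hypothetical application log excerpt; the format is an assumption.
LOG = """\
2022-02-07T07:58:12 INFO  request served
2022-02-07T08:00:05 ERROR db connection refused
2022-02-07T08:00:09 ERROR db connection refused
"""

# Hypothetical time the first help desk ticket about this issue was opened.
TICKET_OPENED = datetime.fromisoformat("2022-02-07T09:58:05")


def first_error_time(log_text: str) -> datetime:
    """Return the timestamp of the first ERROR entry (taken as incident start)."""
    for line in log_text.splitlines():
        timestamp, level, _message = line.split(maxsplit=2)
        if level == "ERROR":
            return datetime.fromisoformat(timestamp)
    raise ValueError("no ERROR entries found")


start = first_error_time(LOG)
delay_minutes = (TICKET_OPENED - start).total_seconds() / 60
print(delay_minutes)  # 118.0
```

Treating the first error line as the true start is a simplification; the underlying fault may predate the first message it produces.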
What constitutes a “good” MTTD will vary greatly depending on the company, its product, the industry, and the particular threat or intrusion the company wants to prevent or intercept. Obviously, the best possible MTTD is zero, meaning you catch the threat actor before they even have a chance to cause damage.
A zero MTTD is, of course, very hard to achieve. According to the Ponemon Institute, which provides the industry-standard benchmark for MTTD, the average time to identify and contain a data breach was 280 days in 2020 and 279 days in 2019. These figures cover both identification and containment, so the detection portion alone is shorter.
To figure out what a good MTTD is for your company, look not only at the overall average across all companies but also at how other companies in your sector perform. You also need to calculate the cost of an average data breach for your company and how much your company can afford to lose per breach without suffering serious financial hardship.
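One rough way to turn an affordable loss into a time target is to divide it by a per-minute cost. The sketch below makes strong simplifying assumptions: losses accrue linearly with downtime at the Gartner average of $5,600 per minute cited earlier, the affordable loss of $560,000 is hypothetical, and a hypothetical 40-minute response time is subtracted because an incident keeps costing money after it is detected.

```python
# Assumed average downtime cost per minute (Gartner figure cited earlier).
COST_PER_MINUTE = 5_600
# Hypothetical maximum loss the business can absorb per incident.
AFFORDABLE_LOSS = 560_000
# Hypothetical typical time to respond once an incident is detected.
TYPICAL_RESPONSE_MINUTES = 40


def max_tolerable_minutes(affordable_loss: float, cost_per_minute: float) -> float:
    """Minutes of downtime the budget covers, assuming a linear cost model."""
    return affordable_loss / cost_per_minute


total_budget = max_tolerable_minutes(AFFORDABLE_LOSS, COST_PER_MINUTE)
detection_budget = total_budget - TYPICAL_RESPONSE_MINUTES
print(total_budget)      # 100.0
print(detection_budget)  # 60.0
```

Real breach costs are rarely linear in time, so a figure like this is a starting point for discussion rather than a definitive target.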
There are various steps you can take to lower MTTD:
Other things that can help organizations lower their MTTD include security orchestration, automation and response (SOAR) technologies, and incident response plans.
Any company with systems or networks that need to stay up and running and secure can benefit from regularly measuring MTTD.
MTTD should be measured only during the periods when an incident would actually cause damage. For example, for a manufacturing facility that operates only at night, you would measure MTTD only for incidents that occur at night; including daytime data wouldn’t make sense.
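Restricting the calculation to operating hours amounts to filtering incidents by start time before averaging. In this minimal sketch the 22:00 to 06:00 shift and the incident data are hypothetical; the window check handles a shift that wraps past midnight, which is the tricky part for a night-only facility.

```python
from datetime import datetime, time

# Hypothetical overnight operating window for a night-only facility.
SHIFT_START = time(22, 0)
SHIFT_END = time(6, 0)


def in_operating_window(ts: datetime) -> bool:
    """True if ts falls inside the overnight window, which wraps past midnight."""
    t = ts.time()
    return t >= SHIFT_START or t < SHIFT_END


# Hypothetical incidents: (start time, minutes to detect).
incidents = [
    (datetime(2022, 2, 7, 23, 15), 40),
    (datetime(2022, 2, 8, 2, 30), 20),
    (datetime(2022, 2, 8, 14, 0), 300),  # daytime, facility idle: excluded
]

relevant = [minutes for start, minutes in incidents if in_operating_window(start)]
print(sum(relevant) / len(relevant))  # 30.0
```

Including the daytime incident would have pushed the average to 120 minutes, overstating the problem during the hours that matter.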
MTTD reflects the amount of time it takes your team to discover a potential security incident. But the next step after detection is response.
Mean time to respond, or MTTR, is the average time it takes to control, remediate, and/or eradicate a threat once it has been discovered.