Blog
/
Proactive Security
/
June 25, 2024

Let the Dominos Fall! SOC and IR Metrics for ROI

Vendors are scrambling to compare MTTD metrics laid out in the latest MITRE Engenuity ATT&CK® Evaluations. But this analysis is reductive, ignoring the fact that in cybersecurity, there are far more metrics that matter.
Inside the SOC
Darktrace cyber analysts are world-class experts in threat intelligence, threat hunting and incident response, and provide 24/7 SOC support to thousands of Darktrace customers around the globe. Inside the SOC is exclusively authored by these experts, providing analysis of cyber incidents and threat trends, based on real-world experience in the field.
Written by
John Bradshaw
Sr. Director, Technical Marketing
Default blog imageDefault blog imageDefault blog imageDefault blog imageDefault blog imageDefault blog image
25
Jun 2024

One of the most enjoyable discussions (and debates) I engage in is the topic of Security Operations Center (SOC) and Incident Response (IR) metrics to measure and validate an organization’s Return on Investment (ROI). The debate part comes in when I hear vendor experts talking about “the only” SOC metrics that matter, and only list the two most well-known, while completely ignoring metrics that have a direct causal relationship.

In this blog, I will discuss what I believe are the SOC/IR metrics that matter, how each one has a direct impact on the others, and why organizations should ensure they are working towards the goal of why these metrics are measured in the first place: Reduction of Risk and Costs.

Reduction of Risk and Costs

Every security solution and process an organization puts in place should reduce the organization’s risk of a breach, exposure by an insider threat, or loss of productivity. How an organization realizes net benefits can be in several ways:

  • Improved efficiencies can result in SOC/IR staff focusing on other areas such as advanced threat hunting rather than churning through alerts on their security consoles. It may also help organizations dealing with the lack of skilled security staff by using Artificial Intelligence (AI) and automated processes.
  • A well-oiled SOC/IR team that has greatly reduced or even eliminated mundane tasks attracts, motivates, and retains talent resulting in reduced hiring and training costs.
  • The direct impact of a breach such as a ransomware attack can be devastating. According to the 2024 Data Breach Investigations Report by Verizon, MGM Resorts International reported the ALPHV ransomware cost the company approximately $100 million[1].
  • Failure to take appropriate steps to protect the organization can result in regulatory fines; and if an organization has, or is considering, purchasing Cyber Insurance, can result in declined coverage or increased premiums.

How does an organization demonstrate they are taking proactive measures to prevent breaches? That is where it's important to understand the nine (yes, nine) key metrics, and how each one directly influences the others, play their roles.

Metrics in the Incident Response Timeline

Let’s start with a review of the key steps in the Incident Response Timeline:

Seven of the nine key metrics are in the IR timeline, while two of the metrics occur before you ever have an incident. They occur in the Pre-Detection Stage.

Pre-Detection stage metrics are:

  • Preventions Per Intrusion Attempt (PPIA)
  • False Positive Reduction Rate (FPRR)

Next is the Detect and Investigate stage, there are three metrics to consider:

  • Mean Time to Detection (MTTD)
  • Mean Time to Triage (MTTT)
  • Mean Time to Understanding (MTTU)

This is followed by the Remediation stage, there are two metrics here:

  • Mean Time to Containment (MTTC)
  • Mean Time to Remediation / Recovery (MTTR)

Finally, there is the Risk Reduction stage, there are two metrics:

  • Mean Time to Advice (MTTA)
  • Mean Time to Implementation (MTTI)

Pre-Detection Stage

Preventions Per Intrusion Attempt

PPIA is defined as stopping any intrusion attempt at the earliest possible stage. Your network Intrusion Prevention System (IPS) blocks vulnerability exploits, your e-mail security solution intercepts and removes messages with malicious attachments or links, your egress firewall blocks unauthorized login attempts, etc. The adversary doesn’t get beyond Step 1 in the attack life cycle.

This metric is the first domino. Every organization should strive to improve on this metric every day. Why? For every intrusion attempt you stop right out of the gate, you eliminate the actions for every other metric. There is no incident to detect, triage, investigate, remediate, or analyze post-incident for ways to improve your security posture.

When I think about PPIA, I always remember back to a discussion with a former mentor, Tim Crothers, who discussed the benefits of focusing on Prevention Failure Detection.

The concept is that as you layer your security defenses, your PPIA moves ever closer to 100% (no one has ever reached 100%). This narrows the field of fire for adversaries to breach into your organization. This is where novel, unknown, and permuted threats live and breathe. This is where solutions utilizing Unsupervised Machine Learning excel in raising anomalous alerts – indications of potential compromise involving one of these threats. Unsupervised ML also raises alerts on anomalous activity generated by known threats and can raise detections before many signature-based solutions. Most organizations struggle to find strong permutations of known threats, insider threats, supply chain attacks, attacks utilizing n-day and 0-day exploits. Moving PPIA ever closer to 100% also frees your team up for conducting threat hunting activities – utilizing components of your SOC that collect and store telemetry to query for potential compromises based on hypothesis the team raises. It also significantly reduces the alerts your team must triage and investigate – solving many of the issues outlined at the start of this paper.

False Positive Reduction Rate

Before we discuss FPRR, I should clarify how I define False Positives (FPs). Many define FPs as an alert that is in error (i.e.: your EDR alerts on malware that turns out to be AV signature files). While that is a FP, I extend the definition to include any alert that did not require triage / investigation and distracts the SOC/IR team (meaning they conducted some level of triage / investigation).

This metric is the second domino. Why is this metric important? Every alert your team exerts time and effort on that is a non-issue distracts them from alerts that matter. One of the major issues that has resonated in the security industry for decades is that SOCs are inundated with alerts and cannot clear the backlog. When it comes to PPIA + FPRR, I have seen analysts spend time investigating alerts that were blocked out of the gate while their screen continued to fill up with more. You must focus on Prevention Failure Detection to get ahead of the backlog.

Detect and Investigate Stages

Mean Time to Detection

MTTD, or “Dwell Time”, has decreased dramatically over the past 12 years. From well over a year to 16 days in 2023[2]. MTTD is measured from the earliest possible point you could detect the intrusion to the moment you actually detect it.

This third domino is important because the longer an adversary remains undetected, the more the odds increase they will complete their mission objective. It also makes the tasks of triage and investigation more difficult as analysts must piece together more activity and adversaries may be erasing evidence along the way – or your storage retention does not cover the breach timeline.

Many solutions focusing solely on MTTD can actually create the very problem SOCs are looking to solve.  That is, they generate so much alerting that they flood the console, email, or text messaging app causing an unmanageable queue of alerts (this is the problem XDR solutions were designed to resolve by focusing on incidents rather than alerts).

Mean Time to Triage

MTTT involves SOCs that utilize Level 1 (aka Triage) analysts to render an “escalate / do not escalate” alert verdict accurately. Accuracy is important because Triage Analysts typically are staff new to cyber security (recent grad / certification) and may over escalate (afraid to miss something important) or under escalate (not recognize signs of a successful breach). Because of this, a small MTTT does not always equate to successful handling of incidents.

This metric is important because keeping your senior staff focused on progressing incidents in a timely manner (and not expending time on false positives) should reduce stress and required headcount.

Mean Time to Understanding

MTTU deals with understanding the complete nature of the incident being investigated. This is different than MTTT which only deals with whether the issue merits escalation to senior analysts. It is then up to the senior analysts to determine the scope of the incident, and if you are a follower of my UPSET Investigation Framework, you know understanding the full scope involves:

U = All compromised accounts

P = Persistence Mechanisms used

S = All systems involved (organization, adversary, and intermediaries)

E = Endgame (or mission objective)

T = Techniques, Tactics, Procedures (TTPs) utilized by the adversary

MTTU is important because this information is critical before any containment or remediation actions are taken. Leave a stone unturned, and you alert the adversary that you are onto them and possibly fail to close an avenue of access.

Remediation Stages

Mean Time to Containment

MTTC deals with neutralizing the threat. You may not have kicked the adversary out, but you have halted their progress to their mission objective and ability to inflict further damage. This may be through use of isolation capabilities, termination of malicious processes, or firewall blocks.

MTTC is important, especially with ransomware attacks where every second counts. Faster containment responses can result in reduced / eliminated disruption to business operations or loss of data.

Mean Time to Remediation / Recovery

The full scope of the incident is understood, the adversary has been halted in their tracks, no malicious processes are running on any systems in your organization. Now is the time to put things back to right. MTTR deals with the time involved in restoring business operations to pre-incident stage. It means all remnants of changes made by the adversary (persistence, account alterations, programs installed, etc.) are removed; all disrupted systems are restored to operations (i.e.: ransomware encrypted systems are recovered from backups / snapshots), compromised user accounts are reset, etc.

MTTR is important because it informs senior management of how fast the organization can recover from an incident. Disaster Recovery and Business Continuity plans play a major role in improving this score.

Risk Reduction Stages

Mean Time to Advice

After the dust has settled from the incident, the job is not done. MTTA deals with identifying and assessing the specific areas (vulnerabilities, misconfigurations, lack of security controls) that permitted the adversary to advance to the point where detection occurred (and any actions beyond). The SOC and IR teams should then compile a list of recommendations to present to management to improve the security posture of the organization so the same attack path cannot be used.

Mean Time to Implement

Once recommendations are delivered to management, how long does it take to implement them? MTTI tracks this timeline because none of it matters if you don’t fix the holes that led to the breach.

Nine Dominos

There are the nine dominos of SOC / IR metrics I recommend helping organizations know if they are on the right track to reduce risk, costs and improve morale / retention of the security teams. You may not wish to track all nine, but understanding how each metric impacts the others can provide visibility into why you are not seeing expected improvements when you implement a new security solution or change processes.

Improving prevention and reducing false positives can make huge positive impacts on your incident response timeline. Utilizing solutions that get you to resolution quicker allows the team to focus on recommendations and risk reduction strategies.

Whichever metrics you choose to track, just be sure the dominos fall in your favor.

References

[1] 2024 Verizon Data Breach Investigations Report, p83

[2] Mandiant M-Trends 2023

Inside the SOC
Darktrace cyber analysts are world-class experts in threat intelligence, threat hunting and incident response, and provide 24/7 SOC support to thousands of Darktrace customers around the globe. Inside the SOC is exclusively authored by these experts, providing analysis of cyber incidents and threat trends, based on real-world experience in the field.
Written by
John Bradshaw
Sr. Director, Technical Marketing

More in this series

No items found.

Blog

/

AI

/

July 16, 2025

Introducing the AI Maturity Model for Cybersecurity

AI maturity model for cybersecurityDefault blog imageDefault blog image

AI adoption in cybersecurity: Beyond the hype

Security operations today face a paradox. On one hand, artificial intelligence (AI) promises sweeping transformation from automating routine tasks to augmenting threat detection and response. On the other hand, security leaders are under immense pressure to separate meaningful innovation from vendor hype.

To help CISOs and security teams navigate this landscape, we’ve developed the most in-depth and actionable AI Maturity Model in the industry. Built in collaboration with AI and cybersecurity experts, this framework provides a structured path to understanding, measuring, and advancing AI adoption across the security lifecycle.

Overview of AI maturity levels in cybersecurity

Why a maturity model? And why now?

In our conversations and research with security leaders, a recurring theme has emerged:

There’s no shortage of AI solutions, but there is a shortage of clarity and understanding of AI uses cases.

In fact, Gartner estimates that “by 2027, over 40% of Agentic AI projects will be canceled due to escalating costs, unclear business value, or inadequate risk controls. Teams are experimenting, but many aren’t seeing meaningful outcomes. The need for a standardized way to evaluate progress and make informed investments has never been greater.

That’s why we created the AI Security Maturity Model, a strategic framework that:

  • Defines five clear levels of AI maturity, from manual processes (L0) to full AI Delegation (L4)
  • Delineating the outcomes derived between Agentic GenAI and Specialized AI Agent Systems
  • Applies across core functions such as risk management, threat detection, alert triage, and incident response
  • Links AI maturity to real-world outcomes like reduced risk, improved efficiency, and scalable operations

[related-resource]

How is maturity assessed in this model?

The AI Maturity Model for Cybersecurity is grounded in operational insights from nearly 10,000 global deployments of Darktrace's Self-Learning AI and Cyber AI Analyst. Rather than relying on abstract theory or vendor benchmarks, the model reflects what security teams are actually doing, where AI is being adopted, how it's being used, and what outcomes it’s delivering.

This real-world foundation allows the model to offer a practical, experience-based view of AI maturity. It helps teams assess their current state and identify realistic next steps based on how organizations like theirs are evolving.

Why Darktrace?

AI has been central to Darktrace’s mission since its inception in 2013, not just as a feature, but the foundation. With over a decade of experience building and deploying AI in real-world security environments, we’ve learned where it works, where it doesn’t, and how to get the most value from it. This model reflects that insight, helping security leaders find the right path forward for their people, processes, and tools

Security teams today are asking big, important questions:

  • What should we actually use AI for?
  • How are other teams using it — and what’s working?
  • What are vendors offering, and what’s just hype?
  • Will AI ever replace people in the SOC?

These questions are valid, and they’re not always easy to answer. That’s why we created this model: to help security leaders move past buzzwords and build a clear, realistic plan for applying AI across the SOC.

The structure: From experimentation to autonomy

The model outlines five levels of maturity :

L0 – Manual Operations: Processes are mostly manual with limited automation of some tasks.

L1 – Automation Rules: Manually maintained or externally-sourced automation rules and logic are used wherever possible.

L2 – AI Assistance: AI assists research but is not trusted to make good decisions. This includes GenAI agents requiring manual oversight for errors.

L3 – AI Collaboration: Specialized cybersecurity AI agent systems  with business technology context are trusted with specific tasks and decisions. GenAI has limited uses where errors are acceptable.

L4 – AI Delegation: Specialized AI agent systems with far wider business operations and impact context perform most cybersecurity tasks and decisions independently, with only high-level oversight needed.

Each level reflects a shift, not only in technology, but in people and processes. As AI matures, analysts evolve from executors to strategic overseers.

Strategic benefits for security leaders

The maturity model isn’t just about technology adoption it’s about aligning AI investments with measurable operational outcomes. Here’s what it enables:

SOC fatigue is real, and AI can help

Most teams still struggle with alert volume, investigation delays, and reactive processes. AI adoption is inconsistent and often siloed. When integrated well, AI can make a meaningful difference in making security teams more effective

GenAI is error prone, requiring strong human oversight

While there is a lot of hype around GenAI agentic systems, teams will need to account for inaccuracy and hallucination in Agentic GenAI systems.

AI’s real value lies in progression

The biggest gains don’t come from isolated use cases, but from integrating AI across the lifecycle, from preparation through detection to containment and recovery.

Trust and oversight are key initially but evolves in later levels

Early-stage adoption keeps humans fully in control. By L3 and L4, AI systems act independently within defined bounds, freeing humans for strategic oversight.

People’s roles shift meaningfully

As AI matures, analyst roles consolidate and elevate from labor intensive task execution to high-value decision-making, focusing on critical, high business impact activities, improving processes and AI governance.

Outcome, not hype, defines maturity

AI maturity isn’t about tech presence, it’s about measurable impact on risk reduction, response time, and operational resilience.

[related-resource]

Outcomes across the AI Security Maturity Model

The Security Organization experiences an evolution of cybersecurity outcomes as teams progress from manual operations to AI delegation. Each level represents a step-change in efficiency, accuracy, and strategic value.

L0 – Manual Operations

At this stage, analysts manually handle triage, investigation, patching, and reporting manually using basic, non-automated tools. The result is reactive, labor-intensive operations where most alerts go uninvestigated and risk management remains inconsistent.

L1 – Automation Rules

At this stage, analysts manage rule-based automation tools like SOAR and XDR, which offer some efficiency gains but still require constant tuning. Operations remain constrained by human bandwidth and predefined workflows.

L2 – AI Assistance

At this stage, AI assists with research, summarization, and triage, reducing analyst workload but requiring close oversight due to potential errors. Detection improves, but trust in autonomous decision-making remains limited.

L3 – AI Collaboration

At this stage, AI performs full investigations and recommends actions, while analysts focus on high-risk decisions and refining detection strategies. Purpose-built agentic AI systems with business context are trusted with specific tasks, improving precision and prioritization.

L4 – AI Delegation

At this stage, Specialized AI Agent Systems performs most security tasks independently at machine speed, while human teams provide high-level strategic oversight. This means the highest time and effort commitment activities by the human security team is focused on proactive activities while AI handles routine cybersecurity tasks

Specialized AI Agent Systems operate with deep business context including impact context to drive fast, effective decisions.

Join the webinar

Get a look at the minds shaping this model by joining our upcoming webinar using this link. We’ll walk through real use cases, share lessons learned from the field, and show how security teams are navigating the path to operational AI safely, strategically, and successfully.

Continue reading
About the author
Ashanka Iddya
Senior Director, Product Marketing

Blog

/

Cloud

/

July 16, 2025

Forensics or Fauxrensics: Five Core Capabilities for Cloud Forensics and Incident Response

people working and walking in officeDefault blog imageDefault blog image

The speed and scale at which new cloud resources can be spun up has resulted in uncontrolled deployments, misconfigurations, and security risks. It has had security teams racing to secure their business’ rapid migration from traditional on-premises environments to the cloud.

While many organizations have successfully extended their prevention and detection capabilities to the cloud, they are now experiencing another major gap: forensics and incident response.

Once something bad has been identified, understanding its true scope and impact is nearly impossible at times. The proliferation of cloud resources across a multitude of cloud providers, and the addition of container and serverless capabilities all add to the complexities. It’s clear that organizations need a better way to manage cloud incident response.

Security teams are looking to move past their homegrown solutions and open-source tools to incorporate real cloud forensics capabilities. However, with the increased buzz around cloud forensics, it can be challenging to decipher what is real cloud forensics, and what is “fauxrensics.”

This blog covers the five core capabilities that security teams should consider when evaluating a cloud forensics and incident response solution.

[related-resource]

1. Depth of data

There have been many conversations among the security community about whether cloud forensics is just log analysis. The reality, however, is that cloud forensics necessitates access to a robust dataset that extends far beyond traditional log data sources.

While logs provide valuable insights, a forensics investigation demands a deeper understanding derived from multiple data sources, including disk, network, and memory, within the cloud infrastructure. Full disk analysis complements log analysis, offering crucial context for identifying the root cause and scope of an incident.

For instance, when investigating an incident involving a Kubernetes cluster running on an EC2 instance, access to bash history can provide insights into the commands executed by attackers on the affected instance, which would not be available through cloud logs alone.

Having all of the evidence in one place is also a capability that can significantly streamline investigations, unifying your evidence be it disk images, memory captures or cloud logs, into a single timeline allowing security teams to reconstruct an attacks origin, path and impact far more easily. Multi–cloud environments also require platforms that can support aggregating data from many providers and services into one place. Doing this enables more holistic investigations and reduces security blind spots.

There is also the importance of collecting data from ephemeral resources in modern cloud and containerized environments. Critical evidence can be lost in seconds as resources are constantly spinning up and down, so having the ability to capture this data before its gone can be a huge advantage to security teams, rather than having to figure out what happened after the affected service is long gone.

darktrace / cloud, cado, cloud logs, ost, and memory information. value of cloud combined analysis

2. Chain of custody

Chain of custody is extremely critical in the context of legal proceedings and is an essential component of forensics and incident response. However, chain of custody in the cloud can be extremely complex with the number of people who have access and the rise of multi-cloud environments.

In the cloud, maintaining a reliable chain of custody becomes even more complex than it already is, due to having to account for multiple access points, service providers and third parties. Having automated evidence tracking is a must. It means that all actions are logged, from collection to storage to access. Automation also minimizes the chance of human error, reducing the risk of mistakes or gaps in evidence handling, especially in high pressure fast moving investigations.

The ability to preserve unaltered copies of forensic evidence in a secure manner is required to ensure integrity throughout an investigation. It is not just a technical concern, its a legal one, ensuring that your evidence handling is documented and time stamped allows it to stand up to court or regulatory review.

Real cloud forensics platforms should autonomously handle chain of custody in the background, recording and safeguarding evidence without human intervention.

3. Automated collection and isolation

When malicious activity is detected, the speed at which security teams can determine root cause and scope is essential to reducing Mean Time to Response (MTTR).

Automated forensic data collection and system isolation ensures that evidence is collected and compromised resources are isolated at the first sign of malicious activity. This can often be before an attacker has had the change to move latterly or cover their tracks. This enables security teams to prevent potential damage and spread while a deeper-dive forensics investigation takes place. This method also ensures critical incident evidence residing in ephemeral environments is preserved in the event it is needed for an investigation. This evidence may only exist for minutes, leaving no time for a human analyst to capture it.

Cloud forensics and incident response platforms should offer the ability to natively integrate with incident detection and alerting systems and/or built-in product automation rules to trigger evidence capture and resource isolation.

4. Ease of use

Security teams shouldn’t require deep cloud or incident response knowledge to perform forensic investigations of cloud resources. They already have enough on their plates.

While traditional forensics tools and approaches have made investigation and response extremely tedious and complex, modern forensics platforms prioritize usability at their core, and leverage automation to drastically simplify the end-to-end incident response process, even when an incident spans multiple Cloud Service Providers (CSPs).

Useability is a core requirement for any modern forensics platform. Security teams should not need to have indepth knowledge of every system and resource in a given estate. Workflows, automation and guidance should make it possible for an analyst to investigate whatever resource they need to.

Unifying the workflow across multiple clouds can also save security teams a huge amount of time and resources. Investigations can often span multiple CSP’s. A good security platform should provide a single place to search, correlate and analyze evidence across all environments.

Offering features such as cross cloud support, data enrichment, a single timeline view, saved search, and faceted search can help advanced analysts achieve greater efficiency, and novice analysts are able to participate in more complex investigations.

5. Incident preparedness

Incident response shouldn't just be reactive. Modern security teams need to regularly test their ability to acquire new evidence, triage assets and respond to threats across both new and existing resources, ensuring readiness even in the rapidly changing environments of the cloud.  Having the ability to continuously assess your incident response and forensics workflows enables you to rapidly improve your processes and identify and mitigate any gaps identified that could prevent the organization from being able to effectively respond to potential threats.

Real forensics platforms deliver features that enable security teams to prepare extensively and understand their shortcomings before they are in the heat of an incident. For example, cloud forensics platforms can provide the ability to:

  • Run readiness checks and see readiness trends over time
  • Identify and mitigate issues that could prevent rapid investigation and response
  • Ensure the correct logging, management agents, and other cloud-native tools are appropriately configured and operational
  • Ensure that data gathered during an investigation can be decrypted
  • Verify that permissions are aligned with best practices and are capable of supporting incident response efforts

Cloud forensics with Darktrace

Darktrace delivers a proactive approach to cyber resilience in a single cybersecurity platform, including cloud coverage. Darktrace / CLOUD is a real time Cloud Detection and Response (CDR) solution built with advanced AI to make cloud security accessible to all security teams and SOCs. By using multiple machine learning techniques, Darktrace brings unprecedented visibility, threat detection, investigation, and incident response to hybrid and multi-cloud environments.

Darktrace’s cloud offerings have been bolstered with the acquisition of Cado Security Ltd., which enables security teams to gain immediate access to forensic-level data in multi-cloud, container, serverless, SaaS, and on-premises environments.

[related-resource]

Continue reading
About the author
Calum Hall
Technical Content Researcher
Your data. Our AI.
Elevate your network security with Darktrace AI