March 4, 2019

The VR Goldilocks Problem and Value of Continued Recognition

Security Operations teams face two fundamental challenges when it comes to visibility and recognition. Learn how we approach solving both problems.
Inside the SOC
Darktrace cyber analysts are world-class experts in threat intelligence, threat hunting and incident response, and provide 24/7 SOC support to thousands of Darktrace customers around the globe. Inside the SOC is exclusively authored by these experts, providing analysis of cyber incidents and threat trends, based on real-world experience in the field.
Written by
Max Heinemeyer
Global Field CISO

First, some context about VR

Security Operations teams face two fundamental challenges when it comes to 'finding bad'.

The first is gaining and maintaining appropriate visibility into what is happening in our environments. Visibility is provided through data (e.g. telemetry, logs). The trinity of data sources for visibility concerns accounts/credentials, devices, and network traffic.

The second challenge is getting good recognition within the scope of what is visible. Recognition is fundamentally about what alerting and workflows you can implement and automate in response to activity that is suspicious or malicious.

Visibility and Recognition each come with their own associated issues.

Visibility is a problem of what data is or can be generated and then either read as telemetry, logged and stored locally, or shipped to a central platform. The timeliness and completeness of the visibility you have can depend on factors such as how much data you can or can't store locally on the devices that generate it - and for how long; what your data pipeline and data platform look like (e.g. if you are trying to centralise data for analysis); or the capability of the host software agents you have to process certain information locally.

The constraints on visibility set the bar for factors like the coverage, timeliness and completeness of the recognition you can achieve. Without visibility, we cannot recognize at all. With limited visibility, what we can recognize may not have much value. With the right visibility, we can still fail to recognize the right things. And with too much recognition, we can quickly overload our senses.

A good example of a technology that offers the opportunity to solve these challenges at the network layer is Darktrace. Their technology provides visibility, from a network traffic perspective, into data that concerns devices and the accounts/credentials associated with them. They then provide recognition on top of this by using Machine Learning (ML) models for anomaly detection. Their models alert on a wide range of activities that may be indicative of threat activity (e.g. malware execution and command and control, a technical exploit, data exfiltration and so on).

The major advantage they provide, compared to traditional Intrusion Detection Systems (IDS) and other vendors who also use ML for network anomaly detection, is that you can a) adjust the sensitivity of their algorithms and b) build your own recognition for particular patterns of interest. For example, if you want to monitor what connections are made to one or two servers, you can set up alerts for any change to expected patterns. This means you can create and adjust custom recognition based on your enterprise context and tune it easily in response to how context changes over time.
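
To make the idea of custom recognition concrete, here is a minimal sketch (our own illustration, not Darktrace's API) of baselining which hosts are expected to connect to one or two monitored servers and alerting on any deviation. The record schema and addresses are assumptions.

```python
from collections import Counter

# Illustrative addresses for the one or two servers you care about.
MONITORED_SERVERS = {"10.0.0.5", "10.0.0.6"}

def build_baseline(history):
    """history: iterable of connection records like {"src": "10.0.1.23", "dst": "10.0.0.5"}."""
    seen = Counter()
    for conn in history:
        if conn["dst"] in MONITORED_SERVERS:
            seen[(conn["src"], conn["dst"])] += 1
    return set(seen)

def recognize(conn, baseline):
    """Return an alert for any connection to a monitored server outside the learned baseline."""
    if conn["dst"] in MONITORED_SERVERS and (conn["src"], conn["dst"]) not in baseline:
        return {"type": "unexpected_connection", "src": conn["src"], "dst": conn["dst"]}
    return None
```

Tuning is then just a matter of widening the baseline window, or adding conditions (ports, volumes, time of day) as the enterprise context changes.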

The Goldilocks VR Matrix

Below is what we call the VR Goldilocks Matrix at PBX Group Security. We use it to assess technology, measure our own capability and processes, and ask ourselves hard questions about where we need to focus to get the most value from our budget, (or make cuts / shift investment) if we need to.

In the squares are some examples of what you (maybe) should think about doing if you find yourself there.

Important questions to ask about VR

One of the things about Visibility and Recognition is that it’s not a given they are ‘always on’. Sometimes there are failure modes for visibility (causing a downstream issue with recognition). And sometimes there are failure modes or conditions under which you WANT to pause recognition.

The key questions you must have answers to about this include:

  • Under what conditions might I lose visibility?
  • How would I know if I have? (see the sketch after this list)
  • Is that loss a blind spot (i.e. data is lost for a given time period)…
  • …or is it a 'temporal delay' (e.g. a connection fails and data is batched for moving from A to B, but that doesn't happen for a few hours)?
  • What are the recognitions that might be impacted by either of the above?
  • What is my expectation for the SLA on those recognitions from ‘cause of alert’ to ‘response workflow’?
  • Under what conditions would I be willing to pause recognition, change the workflow for what happens upon recognition, or stop it altogether?
  • What is the stack-ranked list of 'must, should, could' for all recognition, and why?
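
As a minimal sketch of the 'how would I know if I have lost visibility' question (the thresholds and field below are assumptions, not any particular product's behavior), a simple heartbeat check per data source can distinguish a temporal delay from a blind spot:

```python
import time

# Assumed thresholds - tune per data source and per the SLA of the recognitions that depend on it.
DELAY_THRESHOLD_S = 15 * 60        # older than this: treat as a temporal delay
BLINDSPOT_THRESHOLD_S = 4 * 3600   # older than this: treat as a blind spot

def visibility_status(last_event_ts, now=None):
    """last_event_ts: unix timestamp of the newest record received from a source."""
    age = (now or time.time()) - last_event_ts
    if age > BLINDSPOT_THRESHOLD_S:
        return "blind_spot"      # data is likely lost for this period; dependent recognitions are impacted
    if age > DELAY_THRESHOLD_S:
        return "temporal_delay"  # data is batching or queuing; recognitions will lag their SLA
    return "ok"
```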

Alerts. Alerts everywhere.

More often than not, Security Operations teams suffer the cost of wasted time due to noisy alerts from certain data sources. As a consequence, it's more difficult for them to distinguish genuinely malicious behavior from the merely suspicious or benign. The number of alerts generated by out-of-the-box SIEM platform configurations for sources like Web Proxies and Domain Controllers is often excessive, and the cost to tune those rules can also be unpalatable. Therefore, rather than trying to tune alerts, teams might make a call to switch them off until someone can get around to figuring out a better way. There's no use having hypothetical recognition but no workflow to act on what is generated (other than compliance).

This is where technologies that use ML can help. There are two basic approaches...

One is to avoid alerting until multiple conditions are met that indicate a high probability of threat activity. In this scenario, rather than alerting on the 1st, 2nd, 3rd and 4th 'suspicious activities', you wait until you have a critical mass of indicators, and then you generate one high-fidelity alert that has a much greater weighting to be malicious. This requires both a high level of precision and accuracy in alerting, and naturally some trade-off in the time that can pass before an alert for malicious activity is generated.

The other is to alert on 'suspicious activities 1-4' and let an analyst or automated process decide if this merits further investigation. This approach sacrifices accuracy for precision, but provides rapid context on whether one, or multiple, conditions are met that push the machine(s) up the priority list in the triage queue. To solve for the lower level of accuracy, this approach can make decisions about how long to sustain alerting. For example, if a host triggers multiple anomaly detection models, rather than continue to send alerts (and risk the SOC deciding to turn them off), the technology can pause alerts after a certain threshold. If a machine has not been quarantined or taken off the network after 10 highly suspicious behaviors are flagged, there is a reasonable assumption that the analyst will have dug into these and found the activity is legitimate.
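
A toy version of that pause-after-a-threshold behavior, using the figure of 10 from above (the structure is our own illustration, not the vendor's implementation):

```python
from collections import defaultdict

PAUSE_AFTER = 10  # highly suspicious behaviors per host before alerting pauses

class HostAlerter:
    def __init__(self):
        self.counts = defaultdict(int)
        self.paused = set()

    def handle(self, host, detection):
        """Return an alert to send, or None once alerting has been paused for this host."""
        if host in self.paused:
            return None  # assumption: the SOC has already triaged this machine
        self.counts[host] += 1
        if self.counts[host] >= PAUSE_AFTER:
            self.paused.add(host)  # pause rather than risk the SOC muting the source entirely
        return {"host": host, "detection": detection, "count": self.counts[host]}
```

Punchline 1 below is essentially about what gets lost when that pause kicks in.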

Punchline 1: the value of Continued Recognition even when 'not malicious'

The topic of paused detections was raised after a recent joint exercise between PBX Group Security and Darktrace in testing Darktrace’s recognition. After a machine being used by the PBX Red Team breached multiple high priority models on Darktrace, the technology stopped alerting on further activity. This was because the initial alerts would have been severe enough to trigger a SOC workflow. This approach is designed to solve the problem of alert overload on a machine that is behaving anomalously but is not in fact malicious. Rather than having the SOC turn off alerts for that machine (which could later be used maliciously), the alerts are paused.

One of the outcomes of the test was that the PBX Detect team advised they would still want those alerts to exist for context to see what else the machine does (i.e. to understand its pattern of life). Now, rather than pausing alerts, Darktrace is surfacing this to customers to show where a rule is being paused and create an option to continue seeing alerts for a machine that has breached multiple models.

Which leads us on to our next point…

Punchline 2: the need for Atomic Tests for detection

Both Darktrace and Photobox Security are big believers in Atomic Red Team testing, which involves 'unit tests' that repeatedly (or at a certain frequency) test a detection using code. Unit tests automate the work of Red Teams when they discover control strengths (which you want to monitor continuously for uptime) or control gaps (which you want to monitor for when they are closed). You could design atomic tests to launch a series of particular attacks / threat actor actions from one machine in a chained event. Or you could launch different discrete actions from different machines, each of which has no prior context for doing bad stuff. This allows you to scale the sample size for testing what recognition you have (either through ML or more traditional SIEM alerting). Doing this also means you don't have to ask Red Teams to repeat the same tests again, allowing them to focus on different threat paths to achieve objectives.
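
A sketch of what such a unit test can look like: the discovery command is a benign stand-in, and fetch_alerts is a hypothetical placeholder for whatever alert API your SIEM or detection platform exposes.

```python
import os
import subprocess
import time

def run_discovery_action():
    """Benign stand-in for a discrete discovery action (account discovery, ATT&CK T1087)."""
    cmd = ["net", "user"] if os.name == "nt" else ["id"]
    subprocess.run(cmd, capture_output=True, check=False)

def fetch_alerts(host, since):
    """Hypothetical: query your SIEM / detection platform for alerts on `host` since `since`."""
    raise NotImplementedError("wire this up to your own alerting API")

def test_account_discovery_is_detected(host="test-host-01", timeout_s=300):
    started = time.time()
    run_discovery_action()
    # Poll for the expected recognition within the SLA you expect for this detection.
    while time.time() - started < timeout_s:
        if any(a.get("technique") == "T1087" for a in fetch_alerts(host, since=started)):
            return True
        time.sleep(15)
    raise AssertionError("expected account-discovery detection did not fire within the SLA")
```

Run on a schedule, a suite of tests like this gives you a continuous 'the control is still up' signal without re-tasking the Red Team.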

MITRE ATT&CK is an invaluable framework for this. Many vendors are now aligning to ATT&CK to show what they can recognize relating to attacker TTPs (Tactics, Techniques and Procedures). This enables security teams to map which TTPs are relevant to them (e.g. by using threat intel about the campaigns of threat actor groups that are targeting them). Atomic Red Team tests can then be used to assure that expected detections are operational, or to find gaps that need closing.

If you miss detections, then you know you need to optimise the recognition you have. If you get too many recognitions outside of the atomic test conditions, you either have to accept a high false positive rate because of the nature of the network, or you can tune your detection sensitivity. The opportunities to do this with technology based on ML and anomaly detection are significant, because for new attack types you can quickly see what a unit test tells you about your current detections and confirm that the coverage you think you have is 'as expected'.

Punchline 3: collaboration for the win

Using well-structured Red Team exercises can help your organisation and your technology partners learn new things about how we can collectively find and halt evil. They can also help defenders learn more about good assumptions to build into ML models, as well as covering edge cases where alerts have 'business intelligence' value vs ‘finding bad’.

If you want to understand the categorisations of ways that your populations of machines act over time, there is no better way to do it than through anomaly detection and feeding alerts into a system that supports SOC operations as well as knowledge management (e.g. a graph database).
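
As a rough illustration of that graph idea (using networkx here as a stand-in for a real graph database; the node and edge names are assumptions):

```python
import networkx as nx

graph = nx.MultiDiGraph()

def ingest_alert(alert):
    """alert: e.g. {"host": "laptop-42", "user": "alice", "model": "Unusual External Upload"}.
    Linking alerts to hosts and users builds up a queryable pattern of life over time."""
    graph.add_edge(alert["host"], alert["model"], kind="triggered")
    graph.add_edge(alert["user"], alert["host"], kind="used")

def host_history(host):
    """Everything a host has ever triggered - useful context even when nothing was malicious."""
    return [model for _, model, data in graph.out_edges(host, data=True) if data["kind"] == "triggered"]
```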

Working like this means that we also get the most out of the visibility and recognition we have. Security solutions can be of huge help to Network and Operations teams for troubleshooting or answering questions about network architecture. Often, it's just a shift in perspective that unlocks cross-functional value from investments in security tech and process. Understanding that recognition doesn't stop with security is another great example of where technologies that let you build your own logic into recognition can make a huge difference, going beyond protecting the bottom line to adding top-line value.



September 25, 2025

Announcing Unified Real-Time CDR and Automated Investigations to Transform Cloud Security Operations


Fragmented Tools are Failing SOC Teams in the Cloud Era

The cloud has transformed how businesses operate, reshaping everything from infrastructure to application delivery. But cloud security has not kept pace. Most tools still rely on traditional models of logging, policy enforcement, and posture management: approaches that provide surface-level visibility but lack the depth to detect or investigate active attacks.

Meanwhile, attackers are exploiting vulnerabilities, delivering cloud-native exploits, and moving laterally in ways that posture management alone cannot catch fast enough. Critical evidence is often missed, and alerts lack the forensic depth SOC analysts need to separate noise from true risk. As a result, organizations remain exposed: research shows that nearly nine in ten organizations have suffered a critical cloud breach despite investing in existing security tools [1].

SOC teams are left buried in alerts without actionable context, while ephemeral workloads like containers and serverless functions vanish before evidence can be preserved. Point tools for logging or forensics only add complexity, with 82% of organizations using multiple platforms to investigate cloud incidents [2].

The result is a broken security model: posture tools surface risks but don’t connect them to active attacker behaviors, while investigation tools are too slow and fragmented to provide timely clarity. Security teams are left reactive, juggling multiple point solutions and still missing critical signals. What’s needed is a unified approach that combines real-time detection and response for active threats with automated investigation and cloud posture management in a single workflow.

Just as security teams once had to evolve beyond basic firewalls and antivirus into network and endpoint detection, response, and forensics, cloud security now requires its own next era: one that unifies detection, response, and investigation at the speed and scale of the cloud.

A Powerful Combination: Real-Time CDR + Automated Cloud Forensics

Darktrace / CLOUD now uniquely unites detection, investigation, and response into one workflow, powered by Self-Learning AI. This means every alert, from any tool in your stack, can instantly become actionable evidence and a complete investigation in minutes.

With this release, Darktrace / CLOUD delivers a more holistic approach to cloud defense, uniting real-time detection, response, and investigation with proactive risk reduction. The result is a single solution that helps security teams stay ahead of attackers while reducing complexity and blind spots.

  • Automated Cloud Forensic Investigations: Instantly capture and analyze volatile evidence from cloud assets, reducing investigation times from days to minutes and eliminating blind spots
  • Enhanced Cloud-Native Threat Detection: Detect advanced attacker behaviors such as lateral movement, privilege escalation, and command-and-control in real time
  • Enhanced Live Cloud Topology Mapping: Gain continuous insight into cloud environments, including ephemeral workloads, with live topology views that simplify investigations and expose anomalous activity
  • Agentless Scanning for Proactive Risk Reduction: Continuously monitor for misconfigurations, vulnerabilities, and risky exposures to reduce attack surface and stop threats before they escalate.

Automated Cloud Forensic Investigations

Darktrace / CLOUD now includes capabilities introduced with Darktrace / Forensic Acquisition & Investigation, triggering automated forensic acquisition the moment a threat is detected. This ensures ephemeral evidence, from disks and memory to containers and serverless workloads, can be preserved instantly and analyzed in minutes, not days. The integration unites detection, response, and forensic investigation in a way that eliminates blind spots and reduces manual effort.

Figure 1: Easily view Forensic Investigation of a cloud resource within the Darktrace / CLOUD architecture map

Enhanced Cloud-Native Threat Detection

Darktrace / CLOUD strengthens its real-time behavioral detection to expose early attacker behaviors that logs alone cannot reveal. Enhanced cloud-native detection capabilities include:

• Reconnaissance & Discovery – Detects enumeration and probing activity post-compromise.

• Privilege Escalation via Role Assumption – Identifies suspicious attempts to gain elevated access.

• Malicious Compute Resource Usage – Flags threats such as crypto mining or spam operations.

These enhancements ensure active attacks are detected earlier, before adversaries can escalate or move laterally through cloud environments.

Figure 2: Cyber AI Analyst summary of anomalous behavior for privilege escalation and establishing persistence.
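
Darktrace's models are not public, but the underlying idea of behavioral detection for privilege escalation via role assumption can be shown with a toy baseline over CloudTrail records. The event fields follow CloudTrail's schema; the baselining logic is purely illustrative.

```python
import json

def load_events(path):
    """Load CloudTrail events from a JSON export of the form {"Records": [...]}."""
    with open(path) as f:
        return json.load(f).get("Records", [])

def baseline_pairs(events):
    """Learn which (principal, role) AssumeRole pairs are normal for this account."""
    pairs = set()
    for e in events:
        if e.get("eventName") == "AssumeRole":
            principal = e.get("userIdentity", {}).get("arn", "unknown")
            role = e.get("requestParameters", {}).get("roleArn", "unknown")
            pairs.add((principal, role))
    return pairs

def flag_new_assumptions(events, baseline):
    """Yield AssumeRole calls that fall outside the learned baseline."""
    for e in events:
        if e.get("eventName") != "AssumeRole":
            continue
        pair = (e.get("userIdentity", {}).get("arn", "unknown"),
                e.get("requestParameters", {}).get("roleArn", "unknown"))
        if pair not in baseline:
            yield {"principal": pair[0], "role": pair[1], "time": e.get("eventTime")}
```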

Enhanced Live Cloud Topology Mapping

New enhancements to live topology provide real-time mapping of cloud environments, attacker movement, and anomalous behavior. This dynamic visibility helps SOC teams quickly understand complex environments, trace attack paths, and prioritize response. By integrating with Darktrace / Proactive Exposure Management (PEM), these insights extend beyond the cloud, offering a unified view of risks across networks, endpoints, SaaS, and identity — giving teams the context needed to act with confidence.

Figure 3: Enhanced live topology maps unify visibility across architectures, identities, network connections and more.

Agentless Scanning for Proactive Risk Reduction

Darktrace / CLOUD now introduces agentless scanning to uncover malware and vulnerabilities in cloud assets without impacting performance. This lightweight, non-disruptive approach provides deep visibility into cloud workloads and surfaces risks before attackers can exploit them. By continuously monitoring for misconfigurations and exposures, the solution strengthens posture management and reduces attack surface across hybrid and multi-cloud environments.

Figure 4: Agentless scanning of cloud assets reveals vulnerabilities, which are prioritized by severity.
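
Implementation details aside, the agentless pattern generally means inspecting a copy of a workload's disk rather than running anything inside it. A toy version, checking a mounted snapshot's Debian package database against a hypothetical map of vulnerable versions:

```python
VULNERABLE = {"openssl": {"1.1.1f-1ubuntu2"}}  # hypothetical advisory data: package -> affected versions

def installed_packages(mounted_root):
    """Parse dpkg's status file from a snapshot mounted read-only at mounted_root."""
    pkgs, name = {}, None
    with open(f"{mounted_root}/var/lib/dpkg/status") as f:
        for line in f:
            if line.startswith("Package: "):
                name = line.split(": ", 1)[1].strip()
            elif line.startswith("Version: ") and name:
                pkgs[name] = line.split(": ", 1)[1].strip()
    return pkgs

def findings(mounted_root):
    """Report any installed package whose version appears in the advisory data."""
    return [(pkg, ver) for pkg, ver in installed_packages(mounted_root).items()
            if ver in VULNERABLE.get(pkg, set())]
```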

Together, these capabilities move cloud security operations from reactive to proactive, empowering security teams to detect novel threats in real time, reduce exposures before they are exploited, and accelerate investigations with forensic depth. The result is faster triage, shorter MTTR, and reduced business risk — all delivered in a single, AI-native solution built for hybrid and multi-cloud environments.

Accelerating the Evolution of Cloud Security

Cloud security has long been fragmented, forcing teams to stitch together posture tools, log-based monitoring, and external forensics to get even partial coverage. With this release, Darktrace / CLOUD delivers a holistic, unified approach that covers every stage of the cloud lifecycle, from proactive posture management and risk identification to real-time detection, to automated investigation and response.

By bringing these capabilities together in a single AI-native solution, Darktrace is advancing cloud security beyond incremental change and setting a new standard for how organizations protect their hybrid and multi-cloud environments.

With Darktrace / CLOUD, security teams finally gain end-to-end visibility, response, and investigation at the speed of the cloud, transforming cloud defense from fragmented and reactive to unified and proactive.


Sources: [1], [2] Darktrace Report: Organizations Require a New Approach to Handle Investigations in the Cloud

About the author
Adam Stevens
Senior Director of Product, Cloud | Darktrace


September 25, 2025

Introducing the Industry’s First Truly Automated Cloud Forensics Solution


Why Cloud Investigations Fail Today

Cloud investigations have become one of the hardest problems in modern cybersecurity. Traditional DFIR tools were built for static, on-prem environments rather than dynamic, highly scalable cloud environments containing ephemeral workloads that disappear in minutes. SOC analysts are flooded with cloud security alerts, one-third of which lack actionable data to confirm or dismiss a threat [1], while DFIR teams waste 3-5 days requesting access and performing manual collection, or relying on external responders.

These delays leave organizations vulnerable. Research shows that nearly 90% of organizations suffer some level of damage before they can fully investigate and contain a cloud incident [2]. The result is a broken model: alerts are closed without a complete understanding of the threat due to a lack of visibility and control, investigations drag on, and attackers retain the upper hand.

For SOC teams, the challenge is scale and clarity. Analysts are inundated with alerts but lack the forensic depth to quickly distinguish real threats from noise. Manual triage wastes valuable time, creates alert fatigue, and often forces teams to escalate or dismiss incidents without confidence — leaving adversaries with room to maneuver.

For DFIR teams, the challenge is depth and speed. Traditional forensics tools were built for static, on-premises environments and cannot keep pace with ephemeral workloads that vanish in minutes. Investigators are left chasing snapshots, requesting access from cloud teams, or depending on external responders, leading to blind spots and delayed response.

That’s why we built Darktrace / Forensic Acquisition & Investigation, the first automated forensic solution designed specifically for the speed, scale, and complexities of the cloud. It addresses both sets of challenges by combining automated forensic evidence capture, attacker timeline reconstruction, and cross-cloud scale. The solution empowers SOC analysts with instant clarity and DFIR teams with forensic depth, all in minutes, not days. By leveraging the very nature of the cloud, Darktrace makes these advanced capabilities accessible to security teams of all sizes, regardless of expertise or resources.

Introducing Automated Forensics at the Speed and Scale of Cloud

Darktrace / Forensic Acquisition & Investigation transforms cloud investigations by capturing, processing, and analyzing forensic evidence of cloud workloads, instantly, even from time-restricted ephemeral resources. Triggered by a detection from any cloud security tool, the entire process is automated, providing accurate root cause analysis and deep insights into attacker behavior in minutes rather than days or weeks. SOC and DFIR teams no longer have to rely on manual processes, snapshots, or external responders, they can now leverage the scale and elasticity of the cloud to accelerate triage and investigations.

Seamless Integration with Existing Detection Tools

Darktrace / Forensic Acquisition & Investigation does not require customers to replace their detection stack. Instead, it integrates with cloud-native providers, XDR platforms, and SIEM/SOAR tools, automatically initiating forensic capture whenever an alert is raised. This means teams can continue leveraging their existing investments while gaining the forensic depth required to validate alerts, confirm root cause, and accelerate response.

Most importantly, the solution is natively integrated with Darktrace / CLOUD, turning real-time detections of novel attacker behaviors into full forensic investigations instantly. When Darktrace / CLOUD identifies suspicious activity such as lateral movement, privilege escalation, or abnormal usage of compute resources, Darktrace / Forensic Acquisition & Investigation automatically preserves the underlying forensic evidence before it disappears. This seamless workflow unites detection, response, and investigation in a way that eliminates gaps, accelerates triage, and gives teams confidence that every critical cloud alert can be investigated to completion.

Figure 1: Integration with Darktrace / CLOUD – this example is showing the ability to pivot into the forensic investigation associated with a compromised cloud asset

Automated Evidence Collection Across Hybrid and Multi-Cloud

The solution provides automated forensic acquisition across AWS, Microsoft Azure, GCP, and on-prem environments. It supports both full volume capture, creating a bit-by-bit copy of an entire storage device for the most comprehensive preservation of evidence, and triage collection, which prioritizes speed by gathering only the most essential forensic artifacts such as process data, logs, network connections, and open file contents. This flexibility allows teams to strike the right balance between speed and depth depending on the investigation at hand.

Figure 2: Ability to acquire forensic data from Cloud, SaaS and on-prem environments
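
In generic AWS terms (an illustration of the trade-off only, not of how the product performs acquisition), full volume capture and triage collection look quite different: one snapshots the block device, the other grabs a handful of volatile artifacts from the live host, here via SSM.

```python
import boto3

ec2 = boto3.client("ec2")
ssm = boto3.client("ssm")

def full_capture(volume_id):
    """Bit-for-bit preservation: snapshot the whole volume (slower, most complete)."""
    return ec2.create_snapshot(VolumeId=volume_id, Description="full-forensic-capture")

def triage_capture(instance_id):
    """Fast triage: collect volatile artifacts (processes, connections, recent logs)."""
    return ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": [
            "ps aux",
            "ss -tupan",
            "journalctl --since '1 hour ago' --no-pager | tail -n 500",
        ]},
    )
```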

Automated Investigations, Root Cause Analysis and Attacker Timelines

Once evidence is collected, Darktrace applies automation to reconstruct attacker activity into a unified timeline. This includes correlating commands, files, lateral movement, and network activity into a single investigative view enriched with custom threat intelligence such as IOCs. Detailed investigation reporting includes an investigation summary, an overview of the attacker timeline, and key events. Analysts can pivot into detailed views such as the filesystem view, traversing directories or inspecting file content, or filter and search using faceted options to quickly narrow the scope of an investigation.

Figure 3: Automated Investigation view surfacing the most significant attacker activity, which is contextualized with Alarm information
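
The core of timeline reconstruction is conceptually simple even if doing it well is not: normalize heterogeneous artifacts onto one clock and enrich them with threat intelligence. A stripped-down illustration (the record shapes and indicators are assumptions):

```python
from datetime import datetime, timezone

IOCS = {"203.0.113.54", "evil-updates.example.com"}  # hypothetical indicators of compromise

def to_event(source, record):
    """Normalize one artifact (shell history entry, file write, network connection...) to a common shape."""
    return {
        "ts": datetime.fromtimestamp(record["ts"], tz=timezone.utc),
        "source": source,  # e.g. "bash_history", "netconn", "filesystem"
        "summary": record["summary"],
        "ioc_hit": bool(IOCS & set(record.get("indicators", []))),
    }

def build_timeline(artifact_streams):
    """artifact_streams: {source_name: [record, ...]} -> one chronologically ordered view."""
    events = [to_event(src, rec) for src, records in artifact_streams.items() for rec in records]
    return sorted(events, key=lambda event: event["ts"])
```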

Forensics for Containers and Ephemeral Assets

Investigating containers and serverless workloads has historically been one of the hardest challenges for DFIR teams, as these assets often disappear before evidence can be preserved. Darktrace / Forensic Acquisition & Investigation captures forensic evidence across managed Kubernetes cloud services, AWS ECS, and other environments, even from distroless or no-shell containers, ensuring that ephemeral activity is no longer a blind spot. For hybrid organizations, this extends to on-premises Kubernetes and OpenShift deployments, bringing consistency across environments.

Figure 4: Container investigations – this example is showing the ability to capture containers from managed Kubernetes cloud services

SaaS Log Collection for Modern Investigations

Beyond infrastructure-level data, the solution collects logs from SaaS providers such as Microsoft 365, Entra ID, and Google Workspace. This enables investigations into common attack types like business email compromise (BEC), account takeover (ATO), and insider threats — giving teams visibility into both infrastructure-level and SaaS-driven compromise from a single platform.

Figure 5: Ability to import logs from SaaS providers including Microsoft 365, Entra ID, and Google Workspace
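
For instance, Entra ID sign-in events for a suspected account takeover can be pulled from the Microsoft Graph sign-in logs endpoint. The sketch below assumes you already hold an OAuth token with the AuditLog.Read.All permission and omits paging and error handling.

```python
import requests

GRAPH = "https://graph.microsoft.com/v1.0"

def fetch_signins(token, user_principal_name):
    """Pull recent Entra ID sign-in events for one user (e.g. during a BEC/ATO investigation)."""
    resp = requests.get(
        f"{GRAPH}/auditLogs/signIns",
        headers={"Authorization": f"Bearer {token}"},
        params={"$filter": f"userPrincipalName eq '{user_principal_name}'", "$top": 50},
    )
    resp.raise_for_status()
    return resp.json().get("value", [])
```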

Proactive Vulnerability and Malware Discovery

Finally, the solution surfaces risk proactively with vulnerability and malware discovery for Linux-based cloud resources. Vulnerabilities are presented in a searchable table and correlated with the attacker timeline, enabling teams to quickly understand not just which packages are exposed, but whether they have been targeted or exploited in the context of an incident.

Figure 6: Vulnerability data with pivot points into the attacker timeline

Cloud-Native Scale and Performance

Darktrace / Forensic Acquisition & Investigation uses a cloud-native parallel processing architecture that spins up compute resources on demand, ensuring that investigations run at scale without bottlenecks. Detailed reporting and summaries are automatically generated, giving teams a clear record of the investigation process and supporting compliance, litigation readiness, and executive reporting needs.

Scalable and Flexible Deployment Options

Every organization has different requirements for speed, control, and integration. Darktrace / Forensic Acquisition & Investigation is designed to meet those needs with two flexible deployment models.

  • Self-Hosted Virtual Appliance delivers deep integration and control across hybrid environments, preserving forensic data for compliance and litigation while scaling to the largest enterprise investigations.
  • SaaS-Delivered Deployment provides fast time-to-value out of the box, enabling automated forensic response without requiring deep cloud expertise or heavy setup.

Both models are built to scale across regions and accounts, ensuring organizations of any size can achieve rapid value and adapt the solution to their unique operational and compliance needs. This flexibility makes advanced cloud forensics accessible to every security team — whether they are optimizing for speed, integration depth, or regulatory alignment.

Delivering Advanced Cloud Forensics for Every Team

Until now, forensic investigations were slow, manual, and reserved for only the largest organizations with specialized DFIR expertise. Darktrace / Forensic Acquisition & Investigation changes that by leveraging the scale and elasticity of the cloud itself to automate the entire investigation process. From capturing full disk and memory at detection to reconstructing attacker timelines in minutes, the solution turns fragmented workflows into streamlined investigations available to every team.

Whether deployed as a SaaS-delivered service for fast time-to-value or as a self-hosted appliance for deep integration, Darktrace / Forensic Acquisition & Investigation provides the features that matter most: automated evidence capture, cross-cloud investigations, forensic depth for ephemeral assets, and root cause clarity without manual effort.

With Darktrace / Forensic Acquisition & Investigation, what once took days now takes minutes. Now, forensic investigations in the cloud are faster, more scalable, and finally accessible to every security team, no matter their size or expertise.


Sources: [1], [2] Darktrace Report: Organizations Require a New Approach to Handle Investigations in the Cloud

About the author
Paul Bottomley
Director of Product Management | Darktrace