November 20, 2025

Xillen Stealer Updates to Version 5 to Evade AI Detection

Xillen Stealer v4/v5 introduces advanced features to evade AI detection, steal credentials, cryptocurrency, and sensitive data across browsers, password managers, and cloud environments. With polymorphic engines, container persistence, and behavioral mimicking, this Python-based malware highlights evolving threats and future AI integration in cybercrime campaigns.
Inside the SOC
Darktrace cyber analysts are world-class experts in threat intelligence, threat hunting and incident response, and provide 24/7 SOC support to thousands of Darktrace customers around the globe. Inside the SOC is exclusively authored by these experts, providing analysis of cyber incidents and threat trends, based on real-world experience in the field.
Written by
Tara Gould
Malware Research Lead

Introduction

The Python-based information stealer “Xillen Stealer” has recently been updated to versions 4 and 5, expanding its targeting and functionality. The cross-platform infostealer, originally reported by Cyfirma in September 2025, targets sensitive data including credentials, cryptocurrency wallets, system information, and browser data, and employs anti-analysis techniques.

The update to v4/v5 includes significantly more functionality, including:

  • Persistence
  • Ability to steal credentials from password managers and social media accounts, browser data (history, cookies and passwords) from over 100 browsers, and cryptocurrency from over 70 wallets
  • Kubernetes configs and secrets
  • Docker scanning
  • Encryption
  • Polymorphism
  • System hooks
  • Peer-to-Peer (P2P) Command-and-Control (C2)
  • Single Sign-On (SSO) collector
  • Time-Based One-Time Passwords (TOTP) and biometric collection
  • EDR bypass
  • AI evasion
  • Interceptor for Two-Factor Authentication (2FA)
  • IoT scanning
  • Data exfiltration via Cloud APIs

Xillen Stealer is marketed on Telegram, with different licenses available for purchase. Users who deploy the malware have access to a professional-looking GUI that enables them to view exfiltrated data, logs, infections, configurations and subscription information.

Figure 1: Screenshot of the Xillen Stealer portal.

Technical analysis

The following technical analysis examines some of the interesting functions of Xillen Stealer v4 and v5. The main functionality of Xillen Stealer is to steal cryptocurrency, credentials, system information, and account information from a range of stores.

Xillen Stealer specifically targets the following wallets and browsers:

AITargetDetection

Figure 2: Screenshot of Xillen Stealer’s AI Target detection function.

The ‘AITargetDetection’ class is intended to use AI to detect high-value targets based on weighted indicators and relevant keywords defined in a dictionary. These indicators include “high value targets”, like cryptocurrency wallets, banking data, premium accounts, developer accounts, and business emails. Location indicators include high-value countries such as the United States, United Kingdom, Germany and Japan, along with cryptocurrency-friendly countries and financial hubs. Wealth indicators such as keywords like CEO, trader, investor and VIP have also been defined in a dictionary but are not in use at this time, pointing towards the group’s intent to develop further in the future.

While the class is named ‘AITargetDetection’ and includes placeholder functions for initializing and training a machine learning model, there is no actual implementation of machine learning. Instead, the system relies entirely on rule-based pattern matching for detection and scoring. Even though AI is not actually implemented in this code, it shows how malware developers could use AI in future malicious campaigns.

Figure 3: Screenshot of dead code function.

AI Evasion

Figure 4: Screenshot of AI evasion function to create entropy variance.

‘AIEvasionEngine’ is a module designed to help malware evade AI-based or behavior-based detection systems, such as EDRs and sandboxes. It mimics legitimate user and system behavior, injects statistical noise, randomizes execution patterns, and camouflages resource usage. Its goal is to make the malware appear benign to machine learning detectors. The techniques used to achieve this are:

  • Behavioral Mimicking: Simulates user actions (mouse movement, fake browser use, file/network activity)
  • Noise Injection: Performs random memory, CPU, file, and network operations to confuse behavioral classifiers
  • Timing Randomization: Introduces irregular delays and sleep patterns to avoid timing-based anomaly detection
  • Resource Camouflage: Adjusts CPU and memory usage to imitate normal apps (such as browsers, text editors)
  • API Call Obfuscation: Random system API calls and pattern changes to hide malicious intent
  • Memory Access Obfuscation: Alters access patterns and entropy to bypass ML models monitoring memory behavior
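
From a defender's perspective, the entropy that this engine tries to vary is commonly measured as Shannon entropy over a byte buffer. The following is a minimal sketch of that measurement and is not code taken from Xillen Stealer; the function name `shannon_entropy` is illustrative.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    # H = sum over byte values of p * log2(1 / p), where p = count / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy(b"\x00" * 64))       # a constant buffer
    print(shannon_entropy(bytes(range(256))))  # every byte value exactly once
```

A buffer of identical bytes scores 0 and a buffer in which all 256 byte values are equally likely scores 8, so a model that profiles this value can be misled if malware deliberately pads or reshapes the data it touches.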

PolymorphicEngine

The Polymorphic Engine is part of the “Rust Engine” available in Xillen Stealer. The ‘PolymorphicEngine’ struct implements a basic polymorphic transformation system designed for obfuscation and detection evasion. It uses predefined instruction substitutions, control-flow pattern replacements, and dead code injection to produce varied output. The mutate_code() method scans input bytes and replaces recognized instruction patterns with randomized alternatives, then applies control flow obfuscation and inserts non-functional code to increase variability. Additional features include string encryption via XOR and a stub-based packer.

Collectors

DevToolsCollector

Figure 5: Screenshot of Kubernetes data function.

The ‘DevToolsCollector’ is designed to collect sensitive data related to a wide range of developer tools and environments. This includes:

IDE configurations

  • VS Code, VS Code Insiders, Visual Studio
  • JetBrains: IntelliJ, PyCharm, WebStorm
  • Sublime
  • Atom
  • Notepad++
  • Eclipse

Cloud credentials and configurations

  • AWS
  • GCP
  • Azure
  • DigitalOcean
  • Heroku

SSH keys

Docker & Kubernetes configurations

Git credentials

Database connection information

  • HeidiSQL
  • Navicat
  • DBeaver
  • MySQL Workbench
  • pgAdmin

API keys from .env files

FTP configs

  • FileZilla
  • WinSCP
  • Core FTP

VPN configurations

  • OpenVPN
  • WireGuard
  • NordVPN
  • ExpressVPN
  • CyberGhost

Container persistence

Figure 6: Screenshot of Kubernetes inject function.

Biometric Collector

Figure 7: Screenshot of the ‘BiometricCollector’ function.

The ‘BiometricCollector’ attempts to collect biometric information from Windows systems by scanning the C:\Windows\System32\WinBioDatabase directory, which stores Windows Hello and other biometric configuration data. If accessible, it reads the contents of each file and encodes them in Base64, preparing them for later exfiltration. While the data here is typically encrypted by Windows, its collection indicates an attempt to extract sensitive biometric data.

Password Managers

The ‘PasswordManagerCollector’ function attempts to steal credentials stored in password managers, including OnePass, LastPass, Bitwarden, Dashlane, NordPass and KeePass. However, this function is limited to Windows systems.

SSOCollector

The ‘SSOCollector’ class is designed to collect authentication tokens related to SSO systems. It targets three main sources: Azure Active Directory tokens stored under TokenBroker\Cache, Kerberos tickets obtained through the klist command, and Google Cloud authentication data in user configuration folders. For each source, it checks known directories or commands, reads partial file contents, and stores the results in a dictionary. Once again, this function is limited to Windows systems.

TOTP Collector

The ‘TOTP Collector’ class attempts to collect TOTPs from:

  • Authy Desktop by locating and reading from Authy.db SQLite databases
  • Microsoft Authenticator by scanning known application data paths for stored binary files
  • TOTP-related Chrome extensions by searching LevelDB files for identifiable keywords like “gauth” or “authenticator”.

Each method attempts to locate relevant files, parse or partially read their contents, and store them in a dictionary under labels like authy, microsoft_auth, or chrome_extension. However, as before, this is limited to Windows, and there is no handling for encrypted tokens.

Enterprise Collector

The ‘EnterpriseCollector’ class is used to extract credentials related to an enterprise Windows system. It targets configuration and credential data from:

  • VPN clients
    • Cisco AnyConnect, OpenVPN, FortiClient, Pulse Secure
  • RDP credentials
  • Corporate certificates
  • Active Directory tokens
  • Kerberos tickets cache

Files and directories are located using standard environment variables; their contents are read in binary mode and then encoded in Base64.

Super Extended Application Collector

The ‘SuperExtendedApplication’ Collector class is designed to scan an environment for 160 different applications on a Windows system. It iterates through the paths of a wide range of software categories including messaging apps, cryptocurrency wallets, password managers, development tools, enterprise tools, gaming clients, and security products. The list includes, but is not limited to, Teams, Slack, Mattermost, Zoom, Google Meet, MS Office, Defender, Norton, McAfee, Steam, Twitch, and VMware.

Bypass

AppBoundBypass

This code outlines a framework for bypassing App-Bound protections, Google Chrome’s cookie encryption. The ‘AppBoundBypass’ class attempts several evasion techniques, including memory injection, dynamic-link library (DLL) hijacking, process hollowing, atom bombing, and process doppelgänging to impersonate or hijack browser processes. As of the time of writing, the code contains multiple placeholders, indicating that it is still in development.

Steganography

The ‘SteganographyModule’ uses steganography (hiding data within an image) to hide the stolen data, staging it for exfiltration. Multiple methods are implemented, including:

  • Image steganography: LSB-based hiding
  • NTFS Alternate Data Streams
  • Windows Registry Keys
  • Slack space: Writing into unallocated disk cluster space
  • Polyglot files: Appending archive data to images
  • Image metadata: Embedding data in EXIF tags
  • Whitespace encoding: Hiding binary in trailing spaces of text files

Exfiltration

CloudProxy

Figure 8: Screenshot of the ‘CloudProxy’ class.

The CloudProxy class is designed for exfiltrating data by routing it through cloud service domains. It encodes the input data using Base64, attaches a timestamp and SHA-256 signature, and attempts to send this payload as a JSON object via HTTP POST requests to cloud URLs including AWS, GCP, and Azure, allowing the traffic to blend in. As of the time of writing, these public facing URLs do not accept POST requests, indicating that they are placeholders meant to be replaced with attacker-controlled cloud endpoints in a finalized build.

P2PEngine

Figure 9: Screenshot of the P2PEngine.

The ‘P2PEngine’ provides multiple methods of C2, including embedding instructions within blockchain transactions (such as Bitcoin OP_RETURN, Ethereum smart contracts), exfiltrating data via anonymizing networks like Tor and I2P, and storing payloads on IPFS (a distributed file system). It also supports domain generation algorithms (DGA) to create dynamic .onion addresses for evading detection.

After a compromise, the stealer creates both HTML and TXT reports containing the stolen data. It then sends these reports to the attacker’s designated Telegram account.

Xillen Killers

Figure 10: Xillen Killers.

Xillen Stealer appears to be developed by a self-described 15-year-old “pentest specialist” “Beng/jaminButton” who creates TikTok videos showing basic exploits and open-source intelligence (OSINT) techniques. The group distributing the information stealer, known as “Xillen Killers”, claims to have 3,000 members. Additionally, the group claims to have been involved in:

  • Analysis of Project DDoSia, a tool reportedly used by the NoName057(16) group, revealing that rather than functioning as a distributed denial-of-service (DDoS) tool, it is actually a remote access trojan (RAT) and stealer, along with the identification of involved individuals.
  • Compromise of doxbin.net in October 2025.
  • Discovery of vulnerabilities on a Russian mods site and a Ukrainian news site

The group, which claims to be part of the Russian IT scene, use Telegram for logging, marketing, and support.

Conclusion

While some components of Xillen Stealer remain underdeveloped, the breadth of its intended feature set, which includes credential harvesting, cryptocurrency theft, container targeting, and anti-analysis techniques, suggests that once fully developed it could become a sophisticated stealer. The intention to use AI to improve targeting, even though not yet implemented, indicates how threat actors are likely to incorporate AI into future campaigns.

Credit to Tara Gould (Threat Research Lead)
Edited by Ryan Traill (Analyst Content Lead)

Appendices

Indicators of Compromise (IoCs)

395350d9cfbf32cef74357fd9cb66134 - confid.py

F3ce485b669e7c18b66d09418e979468 - stealer_v5_ultimate.py

3133fe7dc7b690264ee4f0fb6d867946 - xillen_v5.exe

https://github[.]com/BengaminButton/XillenStealer

https://github[.]com/BengaminButton/XillenStealer/commit/9d9f105df4a6b20613e3a7c55379dcbf4d1ef465
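
The file hashes above can be checked against local files with a short script. This is a minimal sketch under stated assumptions: the hashes are treated as MD5 because they are 32 hexadecimal characters long (the list does not name the algorithm), and the helper names `md5_of_file`, `match_ioc`, and `scan_directory` are illustrative.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional

# Hashes from the IoC list, lowercased. MD5 is an assumption based on length.
KNOWN_HASHES = {
    "395350d9cfbf32cef74357fd9cb66134": "confid.py",
    "f3ce485b669e7c18b66d09418e979468": "stealer_v5_ultimate.py",
    "3133fe7dc7b690264ee4f0fb6d867946": "xillen_v5.exe",
}


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def match_ioc(digest: str) -> Optional[str]:
    """Return the IoC file name for a digest, ignoring case, or None if unknown."""
    return KNOWN_HASHES.get(digest.strip().lower())


def scan_directory(root: Path) -> Dict[str, str]:
    """Map each file under root whose hash matches an IoC to the IoC name."""
    hits = {}
    for path in root.rglob("*"):
        if path.is_file():
            name = match_ioc(md5_of_file(path))
            if name:
                hits[str(path)] = name
    return hits


if __name__ == "__main__":
    print(scan_directory(Path(".")))
```

Hash matching only catches byte-identical samples, so given the polymorphic features described earlier it is best treated as one signal among several.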

MITRE ATT&CK

ID Technique

T1059.006 - Python

T1555 - Credentials from Password Stores

T1555.003 - Credentials from Password Stores: Credentials from Web Browsers

T1555.005 - Credentials from Password Stores: Password Managers

T1649 - Steal or Forge Authentication Certificates

T1558 - Steal or Forge Kerberos Tickets

T1539 - Steal Web Session Cookie

T1552.001 - Unsecured Credentials: Credentials In Files

T1552.004 - Unsecured Credentials: Private Keys

T1552.005 - Unsecured Credentials: Cloud Instance Metadata API

T1217 - Browser Information Discovery

T1622 - Debugger Evasion

T1082 - System Information Discovery

T1497.001 - Virtualization/Sandbox Evasion: System Checks

T1115 - Clipboard Data

T1001.002 - Data Obfuscation: Steganography

T1567 - Exfiltration Over Web Service

T1657 - Financial Theft

April 9, 2026

Bringing Together SOC and IR teams with Automated Threat Investigations for the Hybrid World


The investigation gap: Why incident response is slow, fragmented and reactive

Modern investigations often fall apart the moment analysts move beyond an initial alert. Whether detections originate in cloud or on-prem environments, SOC and Incident Response (IR) teams are frequently hindered by fragmented tools and data sources, closed ecosystems, and slow, manual evidence collection just to access the forensic context they need. SOC analysts receive alerts without the depth required to confidently confirm or dismiss a threat, while IR teams struggle with inconsistent visibility across cloud, on‑premises, and contained endpoints, creating delays, blind spots, and incomplete attack timelines.

This gap between SOC and Digital Forensics and Incident Response (DFIR) slows response and forces teams into reactive and inefficient investigation patterns. Security teams struggle to collect high‑fidelity forensic data during active incidents, particularly from cloud workloads, on‑prem systems, and XDR‑contained endpoints where traditional tools cannot operate without deploying new agents or disrupting containment. The result is a fragmented response process where investigations slow down, context gets lost, and critical attacker activity can slip through the cracks.

What’s new at Darktrace

Helping teams move from detection to root cause faster, more efficiently, and with greater confidence

The latest update to Darktrace / Forensic Acquisition & Investigation eliminates the traditional handoff between the SOC and IR teams, enabling analysts to seamlessly pivot from alert into forensic investigation. It also brings on-demand and automated data capture through Darktrace / ENDPOINT as well as third-party detection platforms, where investigators can safely collect critical forensic data from network contained endpoints, preserving containment while accelerating investigation and response.  

Together, this solidifies / Forensic Acquisition & Investigation as an investigation-first platform beyond the cloud, fit for any organization that has adopted a multi-technology infrastructure. In practice, when these various detection sources and host‑level forensics are combined, investigations move from limited insight to complete understanding quickly, giving security teams the clarity and deep context required to drive confident remediation and response based on the exact tactics, techniques and procedures employed.

Integrated forensic context inside every incident workflow

SOC analysts now have seamless access to forensic evidence at the exact moment they need it. There is a new dedicated Forensics tab inside Cyber AI Analyst™ incidents, allowing users to move instantly from detection to rich forensic context in a single click, without the need to export data or get other teams involved.

For investigations that previously required multiple tools, credentials, or intervention by a dedicated team, this change represents a shift toward truly embedded incident‑driven forensics – accelerating both decision‑making and response quality at the point of detection.

Figure 1: The forensic investigation associated with the Cyber AI Analyst™ incident appears in a dedicated ‘Forensics’ tab, with the ability to pivot into the / Forensic Acquisition & Investigation UI for full context and deep analysis workflows.

Reliable automated and manual hybrid evidence capture across any environment

Across cloud, on‑premises, and hybrid environments, analysts can now automate or request on‑demand forensic evidence collection the moment a threat is detected via Darktrace / ENDPOINT. This allows investigators to quickly capture high-fidelity forensic data from endpoints already under protection, accelerating investigations without additional tooling or disrupting systems. Especially in larger environments where the ability to scale is critical, automated data capture across hybrid environments significantly reduces response time and enables consistent, repeatable investigations.

Unlike EDR‑only solutions, which capture only a narrow slice of activity, these workflows provide high‑quality, cross‑environment forensic depth, even on third‑party XDR‑contained devices that many vendor ecosystems cannot reach.

The result is a single, unified process for capturing the forensic context analysts need no matter where the threat originates, even in third-party vendor protected areas.

Figure 2: The ability to acquire, process, and investigate devices with the Darktrace / ENDPOINT agent installed using the ‘Darktrace Endpoint’ import provider
Figure 3: A Linux device that has the Darktrace / ENDPOINT agent installed has been acquired and processed by / Forensic Acquisition & Investigation

Investigation‑first design flexible for hybrid organizations

Automated forensic data capture of non-cloud assets is not limited to organizations that use only Darktrace / ENDPOINT. This functionality is also available where CrowdStrike, Microsoft Defender for Endpoint, or SentinelOne agents are deployed. In the case of CrowdStrike, Darktrace / Forensic Acquisition & Investigation can also perform a triage capture of a device that has been contained using CrowdStrike’s network containment capability. What’s critical here is that investigators can safely acquire additional forensic evidence without breaking or altering containment. That significantly improves investigation and response time without adding more risk factors.

Figure 4: ‘cado.xdr.test2’ has been contained using CrowdStrike’s network containment capability
Figure 5: Successful triage capture of contained endpoint ‘cado.xdr.test2’ using / Forensic Acquisition & Investigation

The benefits of extending forensics to on‑premises and endpoint environments

Despite Darktrace / Forensic Acquisition & Investigation originating as a cloud‑first solution, the challenges of incident response are not limited to the cloud. Many investigations span on‑premises servers, unmanaged endpoints, legacy systems, or devices locked inside third‑party ecosystems.  

By extending automated investigation capabilities into on‑premises environments and endpoints, Darktrace delivers several critical benefits:

  • Unified investigations across hybrid infrastructure and a heterogeneous security stack
  • Consistent forensic depth regardless of asset type
  • Faster and more accurate root-cause analysis
  • Stronger incident response readiness

Figure 6: Unified alerts from cloud and on-prem environments, grouped into incident-centric investigations with forensic depth

Simplifying deep investigations across hybrid environments

These enhancements move Darktrace / Forensic Acquisition & Investigation closer to a vision out of reach for most security teams: seamless, integrated, high‑fidelity forensics across cloud, on‑prem, and endpoint environments where other solutions usually stop at detection. Automated forensics as a whole is fueling faster outcomes with complete clarity throughout the end-to-end investigation process, which now takes teams from alert to understanding in minutes rather than days or even weeks, all without added agents, disruptions, or specialized teams. The result is an incident response lifecycle that finally matches the reality of modern infrastructure.

Ready to see Darktrace / Forensic Acquisition & Investigation in your environment? Request a demo.

About the author
Paul Bottomley
Director of Product Management | Darktrace

April 10, 2026

How to Secure AI and Find the Gaps in Your Security Operations


What “securing AI” actually means (and doesn’t)

Security teams are under growing pressure to “secure AI” at the same pace at which businesses are adopting it. But in many organizations, adoption is outpacing the ability to govern, monitor, and control it. When that gap widens, decision-making shifts from deliberate design to immediate coverage. The priority becomes getting something in place, whether that’s a point solution, a governance layer, or an extension of an existing platform, rather than ensuring those choices work together.

At the same time, AI governance is lagging adoption. 37% of organizations still lack AI adoption policies, shadow AI usage across SaaS has surged, and there are notable spikes in anomalous data uploads to generative AI services.  

First and foremost, it’s important to recognize the dual nature of AI risk. Much of the industry has focused on how attackers will use AI to move faster, scale campaigns, and evade detection. But what’s becoming just as significant is the risk introduced by AI inside the organization itself. Enterprises are rapidly embedding AI into workflows, SaaS platforms, and decision-making processes, creating new pathways for data exposure, privilege misuse, and unintended access across an already interconnected environment.

Because the introduction of complex AI systems into modern, hybrid environments is reshaping attacker behavior and exposing gaps between security functions, the challenge is no longer just having the right capabilities in place but effectively coordinating prevention, detection, investigation, response, and remediation together. As threats accelerate and systems become more interconnected, security depends on coordinated execution, not isolated tools, which is why lifecycle-based approaches to governance, visibility, behavioral oversight, and real-time control are gaining traction.

From cloud consolidation to AI systems: what we can learn

We have seen a version of AI adoption before in cloud security. In the early days, tooling fragmented into posture, workload/runtime, identity, data, and more. Gradually, cloud security collapsed into broader cloud platforms. The lesson was clear: posture without runtime misses active threats; runtime without posture ignores root causes. Strong programs ran both in parallel and stitched the findings together in operations.  

Today’s AI wave stretches that lesson across every domain. Adversaries are compressing “time‑to‑tooling” using LLM‑assisted development (“vibecoding”) and recycling public PoCs at unprecedented speed. That makes it difficult to secure through siloed controls, because the risk is not confined to one layer. It emerges through interactions across layers.

Keep in mind, most modern attacks don’t succeed by defeating a single control. They succeed by moving through the gaps between systems faster than teams can connect what they are seeing. Recent exploitation waves like React2Shell show how quickly opportunistic actors operationalize fresh disclosures and chain misconfigurations to monetize at scale.

In the React2Shell window, defenders observed rapid, opportunistic exploitation and iterative payload diversity across a broad infrastructure footprint, strains that outpace signature‑first thinking.  

You can stay up to date on attacker behavior by signing up for our newsletter where Darktrace’s threat research team and analyst community regularly dive deep into threat finds.

Ultimately, speed met scale in the cloud era; AI adds interconnectedness and orchestration. Simple questions — What happened? Who did it? Why? How? Where else? — now cut across identities, SaaS agents, model/service endpoints, data egress, and automated actions. The longer it takes to answer, the worse the blast radius becomes.

The case for a platform approach in the age of AI

Think of security fusion as the connective tissue that lets you prevent, detect, investigate, and remediate in parallel, not in sequence. In practice, that looks like:

  1. Unified telemetry with behavioral context across identities, SaaS, cloud, network, endpoints, and email, so an anomalous action in one plane automatically informs expectations in others. (Inside‑the‑SOC investigations show this pays off when attacks hop fast between domains.)
  2. Pre‑CVE and “in‑the‑wild” awareness feeding controls before signatures, reducing dwell time in fast exploitation windows.
  3. Automated, bounded response that can contain likely‑malicious actions at machine speed without breaking workflows, buying analysts time to investigate with full context. (Rapid CVE coverage and exploit‑wave posts illustrate how critical those first minutes are.)
  4. Investigation workflows that assume AI is in the loop, for both defenders and attackers. As adversaries adopt “agentic” patterns, investigations need graph‑aware, sequence‑aware reasoning to prioritize what matters early.

This isn’t theoretical. It’s reflected in the Darktrace posts that consistently draw readership: timely threat intel with proprietary visibility and executive frameworks that transform field findings into operating guidance.  

The five questions that matter (and the one that matters more)

When alerted to malicious or risky AI use, you’ll ask:

  1. What happened?
  2. Who did it?
  3. Why did they do it?
  4. How did they do it?
  5. Where else can this happen?

The sixth, more important question is: How much worse does it get while you answer the first five? The answer depends on whether your controls operate in sequence (slow) or in fused parallel (fast).

What to watch next: How the AI security market will likely evolve

Security markets tend to follow a familiar pattern. New technologies drive an initial wave of specialized tools (posture, governance, observability) each focused on a specific part of the problem. Over time, those capabilities consolidate as organizations realize the new challenge is coordination.

AI is accelerating the shift of focus to coordination because AI-powered attackers can move faster and operate across more systems at once. Recent exploitation waves show exactly this. Adversaries can operationalize new techniques and move across domains, turning small gaps into full attack paths.

Anticipate a continued move toward more integrated security models because fragmented approaches can’t keep up with the speed and interconnected nature of modern attacks.

Building the Groundwork for Secure AI: How to Test Your Stack’s True Maturity

AI doesn’t create new surfaces as much as it exposes the fragility of the seams that already exist.  

Darktrace’s own public investigations consistently show that modern attacks, from LinkedIn‑originated phishing that pivots into corporate SaaS to multi‑stage exploitation waves like BeyondTrust CVE‑2026‑1731 and React2Shell, succeed not because a single control failed, but because no control saw the whole sequence, or no system was able to respond at the speed of escalation.  

Before thinking about “AI security,” customers should ensure they’ve built a security foundation where visibility, signals, and responses can pass cleanly between domains. That requires pressure‑testing the seams.

Below are the key integration questions and stack‑maturity tests every organization should run.

1. Do your controls see the same event the same way?

Integration questions

  • When an identity behaves strangely (impossible travel, atypical OAuth grants), does that signal automatically inform your email, SaaS, cloud, and endpoint tools?
  • Do your tools normalize events in a way that lets you correlate identity → app → data → network without human stitching?

Why it matters

Darktrace’s public SOC investigations repeatedly show attackers starting in an unmonitored domain, then pivoting into monitored ones, such as phishing on LinkedIn that bypassed email controls but later appeared as anomalous SaaS behavior.

If tools can’t share or interpret each other's context, AI‑era attacks will outrun every control.

Tests you can run

  1. Shadow Identity Test
    • Create a temporary identity with no history.
    • Perform a small but unusual action: unusual browser, untrusted IP, odd OAuth request.
    • Expected maturity signal: other tools (email/SaaS/network) should immediately score the identity as high‑risk.
  2. Context Propagation Test
    • Trigger an alert in one system (e.g., endpoint anomaly) and check if other systems automatically adjust thresholds or sensitivity.
    • Low maturity signal: nothing changes unless an analyst manually intervenes.
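
To make the idea of correlating identity → app → data → network without human stitching concrete, the sketch below normalizes events from different tools into one shape and flags identities that appear in several domains within a short window. It is an illustrative assumption, not a description of any Darktrace product; the `Event` fields, the one-hour window, and the function name `cross_domain_identities` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since epoch
    domain: str       # e.g. "identity", "saas", "network", "endpoint"
    identity: str     # normalized user or service identity
    action: str       # free-text description of what happened


def cross_domain_identities(
    events: Iterable[Event],
    window_seconds: float = 3600.0,
    min_domains: int = 2,
) -> Dict[str, List[str]]:
    """Return identities seen in at least min_domains domains within one window.

    The value is the sorted list of domains in the first qualifying window.
    """
    by_identity = defaultdict(list)
    for event in events:
        by_identity[event.identity].append(event)

    flagged = {}
    for identity, evs in by_identity.items():
        evs.sort(key=lambda e: e.timestamp)
        start = 0
        for end in range(len(evs)):
            # Shrink the window from the left until it spans <= window_seconds
            while evs[end].timestamp - evs[start].timestamp > window_seconds:
                start += 1
            domains = {e.domain for e in evs[start:end + 1]}
            if len(domains) >= min_domains:
                flagged[identity] = sorted(domains)
                break
    return flagged


if __name__ == "__main__":
    sample = [
        Event(0, "identity", "alice", "atypical OAuth grant"),
        Event(600, "saas", "alice", "bulk file download"),
        Event(0, "endpoint", "bob", "new process"),
        Event(10000, "saas", "bob", "login"),
    ]
    print(cross_domain_identities(sample))
```

The design choice worth noting is the shared `identity` key: if each tool names the same user differently, no amount of windowing will join the events, which is the seam the tests above are meant to expose.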

2. Does detection trigger coordinated action, or does everything act alone?

Integration questions

  • When one system blocks or contains something, do other systems automatically tighten, isolate, or rate‑limit?
  • Does your stack support bounded autonomy — automated micro‑containment without broad business disruption?

Why it matters

In public cases like BeyondTrust CVE‑2026‑1731 exploitation, Darktrace observed rapid C2 beaconing, unusual downloads, and tunneling attempts across multiple systems. Containment windows were measured in minutes, not hours.  

Tests you can run

  1. Chain Reaction Test
  • Simulate a primitive threat (e.g., access from TOR exit node).
  • Your identity provider should challenge → email should tighten → SaaS tokens should re‑authenticate.
  • Weak seam indicator: only one tool reacts.
  2. Autonomous Boundary Test
  • Induce a low‑grade anomaly (credential spray simulation).
  • Evaluate whether automated containment rules activate without breaking legitimate workflows.
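The Chain Reaction Test can be scored with a very small script once you have recorded when each tool first reacted. This is a rough sketch under stated assumptions: the tool names in `EXPECTED_TOOLS` and the five-minute threshold are placeholders, and the reaction timestamps would have to come from your own logs.

```python
from datetime import datetime, timedelta

# Hypothetical set of controls expected to react to the simulated threat.
EXPECTED_TOOLS = {"identity_provider", "email", "saas"}


def chain_reaction_report(trigger_time, reactions,
                          max_delay=timedelta(minutes=5)):
    """Summarize which expected tools reacted in time.

    reactions maps a tool name to the datetime of its first automated
    reaction, or None if it never reacted.
    """
    reacted = set()
    for tool, reacted_at in reactions.items():
        if tool not in EXPECTED_TOOLS or reacted_at is None:
            continue
        delay = reacted_at - trigger_time
        if timedelta(0) <= delay <= max_delay:
            reacted.add(tool)

    return {
        "reacted": sorted(reacted),
        "missing": sorted(EXPECTED_TOOLS - reacted),
        # Mirrors the weak seam indicator: at most one tool reacted.
        "weak_seam": len(reacted) <= 1,
    }
```

A late reaction is counted as missing here on purpose, since the point of the test is response at the speed of escalation rather than eventual response.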

3. Can your team investigate a cross‑domain incident without swivel‑chairing?

Integration questions

  • Can analysts pivot from identity → SaaS → cloud → endpoint in one narrative, not five consoles?
  • Does your investigation tooling use graphs or sequence-based reasoning, or is it list‑based?

Why it matters

Darktrace’s Cyber AI Analyst and DIGEST research highlights why investigations must interpret structure and progression, not just standalone alerts. Attackers now move between systems faster than human triage cycles.  

Tests you can run

  1. One‑Hour Timeline Build Test
  • Pick any detection.
  • Give an analyst one hour to produce a full sequence: entry → privilege → movement → egress.
  • Weak seam indicator: they spend >50% of the hour stitching exports.
  2. Multi‑Hop Replay Test
  • Simulate an incident that crosses domains (phish → SaaS token → data access).
  • Evaluate whether the investigative platform auto‑reconstructs the chain.
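To make the timeline tests concrete, the sketch below merges exports from several tools into one ordered sequence and reports which stages of entry → privilege → movement → egress have not been confirmed in order. The record layout (`time`, `stage`, `source`) is an assumption for illustration; real exports would first need mapping into it.

```python
from datetime import datetime

STAGES = ["entry", "privilege", "movement", "egress"]


def build_timeline(*sources):
    """Merge per-tool event lists into one chronologically ordered list."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda e: datetime.fromisoformat(e["time"]))


def unconfirmed_stages(timeline):
    """Return the stages not yet observed in the expected order."""
    next_stage = 0
    for event in timeline:
        if next_stage < len(STAGES) and event["stage"] == STAGES[next_stage]:
            next_stage += 1
    return STAGES[next_stage:]
```

An empty result means the full sequence was reconstructed. A non-empty result shows where the narrative breaks, which is usually where the analyst ends up stitching exports by hand.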

4. Do you detect intent or only outcomes?

Integration questions

  • Can your stack detect the setup behaviors before an attack becomes irreversible?
  • Are you catching pre‑CVE anomalies or post‑compromise symptoms?

Why it matters

Darktrace publicly documents multiple examples of pre‑CVE detection, where anomalous behavior was flagged days before vulnerability disclosure. AI‑assisted attackers will hide behind benign‑looking flows until the very last moment.

Tests you can run

  1. Intent‑Before‑Impact Test
  • Simulate reconnaissance-like behavior (DNS anomalies, odd browsing to unknown SaaS, atypical file listing).
  • Mature systems will flag intent even without an exploit.
  2. CVE‑Window Test
  • During a real CVE patch cycle, measure detection lag vs. public PoC release.
  • Weak seam indicator: your detection rises only after mass exploitation begins.
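Detection lag in the CVE‑Window Test is simple date arithmetic. The sketch below assumes you can supply three timestamps yourself: your first relevant detection, the public PoC release, and the point at which mass exploitation was reported. The labels it returns are illustrative, not an established scale.

```python
from datetime import datetime


def detection_lag_hours(first_detection, poc_release):
    """Hours between PoC release and first detection (negative = earlier)."""
    return (first_detection - poc_release).total_seconds() / 3600


def classify_lag(first_detection, poc_release, mass_exploitation):
    """Place a detection relative to the PoC and mass-exploitation dates."""
    if first_detection <= poc_release:
        return "detected before public PoC"
    if first_detection < mass_exploitation:
        return "detected before mass exploitation"
    return "weak seam: detected only after mass exploitation began"
```

Tracking this number across several patch cycles is more informative than a single measurement, because one early detection can be luck.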

5. Are response and remediation two separate universes?

Integration questions

  • When you contain something, does that trigger root-cause remediation workflows in identity, cloud config, or SaaS posture?
  • Does fixing a misconfiguration automatically update correlated controls?

Why it matters

Darktrace’s cloud investigations (e.g., cloud compromise analysis) emphasize that remediation must close both runtime and posture gaps in parallel.

Tests you can run

  1. Closed‑Loop Remediation Test
  • Introduce a small misconfiguration (over‑permissioned identity).
  • Trigger an anomaly.
  • Mature stacks will: detect → contain → recommend or automate posture repair.
  2. Drift‑Regression Test
  • After remediation, intentionally re‑introduce drift.
  • The system should immediately recognize deviation from known‑good baseline.
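At its simplest, recognizing deviation from a known‑good baseline is a diff between two snapshots. The following is a minimal sketch, assuming permissions are captured as a mapping from identity name to a set of granted permissions. The snapshot format and the names are hypothetical.

```python
def find_drift(baseline: dict, current: dict) -> dict:
    """Return {identity: (baseline_perms, current_perms)} for every change.

    An identity missing from one snapshot is reported with None on
    that side.
    """
    drift = {}
    for identity in baseline.keys() | current.keys():
        before = baseline.get(identity)
        after = current.get(identity)
        if before != after:
            drift[identity] = (before, after)
    return drift


def added_permissions(drift: dict) -> dict:
    """Only the permissions gained relative to the baseline."""
    gained = {}
    for identity, (before, after) in drift.items():
        extra = (after or set()) - (before or set())
        if extra:
            gained[identity] = extra
    return gained
```

Running this after the deliberate re‑introduction of drift gives you the expected answer to compare against whatever your tooling reports.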

6. Do SaaS, cloud, email, and identity all agree on “normal”?

Integration questions

  • Is “normal behavior” defined in one place or many?
  • Do baselines update globally or per-tool?

Why it matters

Attackers (including AI‑assisted ones) increasingly exploit misaligned baselines, behaving “normal” to one system and anomalous to another.

Tests you can run

  1. Baseline Drift Test
  • Change the behavior of a service account for 24 hours.
  • Mature platforms will flag the deviation early and propagate updated expectations.
  2. Cross‑Domain Baseline Consistency Test
  • For the same entity, compare the risk score from your identity tooling against the scores from cloud and SaaS tooling.
  • Weak seam indicator: risk scores don’t align.
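The consistency check can be approximated by measuring how far apart the per‑domain scores are for each entity. This sketch assumes every tool's score has already been rescaled to a common 0 to 1 range, which is itself a non‑trivial step. The 0.3 tolerance is an arbitrary placeholder.

```python
def score_spread(scores: dict) -> float:
    """Gap between the highest and lowest score across domains."""
    return max(scores.values()) - min(scores.values())


def misaligned(scores_by_entity: dict, tolerance: float = 0.3) -> dict:
    """Entities whose per-domain risk scores disagree beyond the tolerance."""
    flagged = {}
    for entity, scores in scores_by_entity.items():
        spread = score_spread(scores)
        if spread > tolerance:
            flagged[entity] = round(spread, 2)
    return flagged
```

An entity that looks low‑risk to one tool and high‑risk to another is exactly the kind of misaligned baseline an attacker can hide in.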

Final takeaway

Security teams should focus on how their stack operates as one system before AI amplifies pressure on every seam.

Only once an organization can reliably detect, correlate, and respond across domains can it safely begin to secure AI models, agents, and workflows.

About the author
Nabil Zoldjalali
VP, Field CISO