AIM BLOG

Latest Insights.

Read the latest insights on AI security technologies, industry trends, and prompt engineering from the AIM Intelligence research and engineering teams.

AI Security Digest — May 2026

May 2026 was a watershed month for AI security. From the first AI-authored zero-day exploit confirmed in the wild, to a self-propagating npm worm that reached OpenAI's code-signing pipeline, to prompt injection flaws enabling full RCE in Microsoft's Semantic Kernel — the attack surface around AI systems expanded on every front. This digest covers seven stories that defined the month, including the release of XL-SafetyBench from AIM Intelligence and collaborators.

May 2026 marked a turning point. For the first time, AI was confirmed as the author of a zero-day exploit used in a real attack campaign — not just a tool for automation, but an autonomous vulnerability discoverer and weaponizer. Simultaneously, the infrastructure surrounding AI (SDKs, agent frameworks, npm packages, CI/CD pipelines) proved to be just as dangerous as the models themselves.


Executive Summary


Here are seven stories that defined the month.


1. The First AI-Authored Zero-Day: End-to-End Autonomous Exploit, No Human Required

Date: May 11, 2026 · Severity: Critical · Threat type: Offensive AI

Google's Threat Intelligence Group (GTIG) disclosed a landmark finding: a threat actor used an AI model to discover and write a working exploit for a previously unknown two-factor authentication bypass in a widely-used open-source sysadmin tool.

The exploit — a Python script — carried unmistakable LLM fingerprints: detailed educational docstrings, a hallucinated CVSS score, and unusually structured "textbook Pythonic" formatting. GTIG assessed with high confidence that an AI model autonomously identified the semantic logic flaw (a hard-coded trust assumption in the login flow) and produced the weaponized proof-of-concept without human authorship.

Why it matters: Every prior AI-in-attacks incident involved AI assisting humans — scanning, generating phishing lures, automating repetitive steps. This is the first confirmed case of AI originating the vulnerability discovery and exploit development pipeline end-to-end. The campaign was disrupted before mass exploitation, and the vendor patched the flaw on May 11.

Business impact: Organizations whose incident response plans assume human-paced attack development now have a documented gap. Threat models that don't account for AI-authored exploits are calibrated to a threat landscape that no longer exists.

How industry responded: GTIG disrupted the campaign before mass exploitation and shared indicators of compromise with CERTs across 12 countries. The vendor patched on the same day as disclosure.

AI autonomous exploit generation pipeline — Source Code → LLM/AI Model → Vuln Found → Exploit Script → Target System, with
The AI-Authored Zero-Day Pipeline: fully autonomous from code scan to weaponized exploit


2. Mini Shai-Hulud: 84 Malicious Packages in 6 Minutes — 518M Downloads at Risk

Date: May 11, 2026 · Severity: High · Threat type: Supply Chain

In six minutes between 19:20–19:26 UTC, the extortion group TeamPCP published 84 malicious package artifacts across 42 @tanstack/* npm packages. The attack chained a pull_request_target trigger misconfiguration with Actions cache poisoning and OIDC token extraction — producing validly-attested malicious packages with SLSA Build Level 3 provenance, bypassing supply-chain integrity checks that most organizations rely on.

The worm propagated downstream to OpenAI, Mistral AI, and Guardrails AI. At OpenAI, two employee devices were compromised with limited access to internal source code repositories and credentials. OpenAI rotated its macOS code-signing certificates; all macOS OpenAI apps (ChatGPT Desktop, Codex, Atlas) required forced updates.

Over 170 packages across npm and PyPI were ultimately affected, with a combined 518 million cumulative downloads.

Why it matters: This is the first documented npm worm that produced supply-chain-integrity-compliant malicious packages. SLSA Build Level 3 provenance is widely regarded as the gold standard for build attestation — this attack demonstrates it is not sufficient when the build environment itself is compromised. AI vendors' downstream dependency exposure is now a documented attack vector.

Business impact: Any organization shipping software built on the 170 affected packages carried residual trust risk during the patch window. OpenAI's emergency cert rotation — forcing updates across all macOS apps — demonstrates that when an AI vendor's CI/CD pipeline is a supply chain node, the blast radius reaches end users. Third-party risk programs should now include AI vendors' top dependencies.

How industry responded: OpenAI completed emergency code-signing certificate rotation within 72 hours. GitHub announced hardened pull_request_target defaults for public repositories as a direct response.

npm supply chain worm attack: TeamPCP → npm registry (84 malicious packages in 6 min) → OpenAI, Mistral AI, Guardrails AI, all inside a
Mini Shai-Hulud: Supply Chain Worm Reaching AI Vendors via @tanstack/* Packages


3. Prompt Injection → RCE: CVSS 9.8 + 10.0 in Microsoft Semantic Kernel

Date: May 7, 2026 · Severity: Critical · Threat type: Prompt Injection / Agent Security

Microsoft's own Defender Security Research Team disclosed two vulnerabilities in the Semantic Kernel AI agent framework — both demonstrating how prompt injection becomes arbitrary code execution:

CVE-2026-26030 (CVSS 9.8, Python SDK < 1.39.4): The InMemoryVectorStore filter function passed attacker-controlled input directly into Python's eval(). An attacker could inject os.system() calls via crafted vector store entries. Blacklist-based validators were bypassable via class hierarchy traversal.

CVE-2026-25592 (CVSS 10.0, .NET SDK < 1.71.0): An internal DownloadFileAsync helper was accidentally exposed as a [KernelFunction], making it callable by the LLM with zero path validation. From an Azure Container App, this enabled sandbox escape and file writes to the Windows Startup folder — achieving persistent execution.

Both vulnerabilities follow the same pattern: tool or data inputs that are never sanitized before reaching privileged execution contexts, with the LLM serving as an unwitting relay.

Attacker-controlled input
        ↓
  LLM processes it
        ↓
  Tool call / eval()
        ↓
  Arbitrary code execution on host

Why it matters: Semantic Kernel is one of the most widely deployed enterprise AI agent frameworks. These are not theoretical bugs — they demonstrate a class of vulnerability (prompt injection → tool misuse → RCE) that will recur across every agent framework that passes untrusted input into privileged execution. Patches are available in Python SDK ≥ 1.39.4 and .NET SDK ≥ 1.71.0.

Business impact: CVE-2026-25592 (CVSS 10.0) enables sandbox escape from Azure Container Apps — breach-severity, not a configuration footnote. Every enterprise running .NET-based AI agents on cloud infrastructure should treat this as a P0 patch. The fact that Microsoft's own security team found it first is the best-case scenario; a motivated external attacker would have had the same path.

How industry responded: Microsoft patched both CVEs within 8 days of internal disclosure — faster than the 90-day industry norm — and published detailed compensating controls for organizations that cannot patch immediately.

Prompt injection to RCE vertical flow: Attacker Input → LLM (Semantic Kernel) → Tool Call/eval() → Arbitrary Code Execution, each arrow labeled
Semantic Kernel CVE-2026-26030 & CVE-2026-25592: Prompt Injection to RCE Attack Chain


4. Verizon DBIR 2026: Exploitation Overtakes Credentials — 31% vs 13%, First Time in 19 Years

Date: May 19–20, 2026 · Severity: Industry-wide · Threat type: Industry Intel

The 2026 Verizon Data Breach Investigations Report — covering 22,000+ confirmed breaches across 145 countries — delivered a historic finding: vulnerability exploitation surpassed stolen credentials as the number-one breach entry point for the first time since the report's inception in 2008.

31% of breaches began with a vulnerability exploit, versus 13% via credential theft.

The primary driver: AI. Threat actors now deploy generative AI across a median of 15 documented attack techniques, collapsing exploitation timelines from weeks to hours. Key findings:

Why it matters: The combination of AI-accelerated exploitation and chronically under-resourced patching has fundamentally broken the traditional "patch within 30 days" guidance. The window between disclosure and exploitation is now shorter than most organizations' patch review cycles.

Business impact: These numbers are board-ready: vulnerability exploitation is now the primary breach entry point, third-party involvement has hit 48%, and the 30-day patch SLA is de facto broken for critical CVEs. For security budget owners, this report provides the clearest data-driven justification for expanding both vulnerability management programs and third-party risk assessment scope.

Verizon DBIR 2026: bar chart showing Vulnerability Exploit 31% vs Credential Theft 13% with
Verizon DBIR 2026: Vulnerability Exploitation Overtakes Credentials, Exploitation Windows Collapse to Hours


5. Claude Mythos Finds 23,019 Zero-Days Across 1,000+ Projects — 99% Unpatched

Date: May 26, 2026 · Severity: Critical (systemic) · Threat type: Autonomous AI / AI Red-Teaming

Anthropic's Project Glasswing update confirmed that Claude Mythos Preview — its autonomous vulnerability research model — has identified 23,019 vulnerabilities across 1,000+ open-source projects since April, of which 6,202 are classified high or critical.

Among the confirmed findings: a critical flaw in wolfSSL, the open-source cryptography library used by billions of embedded and IoT devices, enabling certificate forgery to impersonate financial institutions. Six independent security firms validated Mythos findings at a 90%+ true positive rate.

A separate security incident compounded the disclosure: an Anthropic CMS misconfiguration leaked internal documents describing Mythos's capabilities, forcing premature public announcement before coordinated remediation was complete. As of the update, more than 99% of discovered vulnerabilities remain unpatched by open-source maintainers.

Anthropic responded with a $100M defensive initiative partnering with Amazon, Apple, Microsoft, Google, and seven other major technology companies to coordinate patching before hostile actors develop equivalent capabilities.

Why it matters: This is the clearest evidence yet that autonomous AI vulnerability research has crossed a capability threshold that outpaces the open-source ecosystem's ability to respond. The unpatched 99% represents a significant window during which adversarial actors — state-sponsored or otherwise — could develop parallel offensive tooling against the same targets.

Business impact: The confirmed wolfSSL vulnerability affects billions of embedded and IoT devices with no vendor patch currently available. Organizations with OT/IoT exposure should inventory wolfSSL usage now. More broadly, if adversarial actors develop equivalent AI discovery capability, the 23,000 unpatched findings become an offensive catalog — this is a systemic risk to open-source-dependent supply chains.

How industry responded: Amazon, Apple, Microsoft, and Google joined Anthropic's $100M defensive patching initiative. The Linux Foundation created an emergency OSS maintainer response fund specifically for AI-discovered vulnerabilities.

Claude Mythos AI at center with arrows radiating to dozens of open-source project boxes (wolfSSL highlighted with
Claude Mythos / Project Glasswing: 23,019 Vulnerabilities Found, 99% Unpatched


6. MCP Supply Chain: 10+ CVEs, 200,000 Vulnerable Instances — The Log4Shell Pattern Repeats

Date: April–May 2026 (ongoing) · Severity: Critical · Threat type: Supply Chain / Prompt Injection

OX Security's extended research into Anthropic's Model Context Protocol (MCP) ecosystem — published in April but with broad industry impact through May — documented a systemic architectural vulnerability. The STDIO-based MCP execution model allows user input to flow into command execution without sanitization, enabling prompt injection to become RCE across any MCP-connected tool.

Ten CVEs were issued across production platforms. Representative examples:

CVE Product Impact
CVE-2026-30623 LiteLLM Authenticated RCE via JSON
CVE-2026-30624 Agent Zero Unauthenticated UI injection
CVE-2026-30615 Windsurf Zero-click prompt injection
CVE-2025-65720 GPT Researcher Reverse shell via UI
CVE-2026-30625 Upsonic Allowlist bypass

OX researchers identified approximately 200,000 vulnerable instances across 7,000+ publicly exposed MCP servers and 150M+ affected library downloads. Claude Code, Cursor, VS Code, Windsurf, and Gemini CLI were all confirmed vulnerable to MCP-routed prompt injection.

Anthropic declined to modify the protocol architecture, citing the STDIO behavior as "by design." Individual vendors issued partial patches.

Why it matters: MCP prompt injection represents the same threat model as SQL injection in the 2000s — an architectural pattern where a new technology (AI tool-use) creates a new class of injection vulnerability, and the ecosystem is years behind in its defensive posture. With MCP adoption accelerating, this class of vulnerability will be exploited at scale before mitigations become standard.

Business impact: 200,000 exposed MCP instances with no centralized patch mechanism mirrors the early Log4Shell window — the vulnerability is structural, not per-vendor, making coordinated remediation harder than a single CVE. Organizations adopting MCP-based AI tooling (Cursor, VS Code plugins, custom MCP servers) should gate approvals through the same process as third-party software and treat all MCP inputs as untrusted.

How industry responded: Cursor and Windsurf issued partial patches. VS Code added a user-confirmation prompt for MCP tool calls. Anthropic maintained the STDIO architecture is by design; no protocol-level fix was issued.

MCP architecture vulnerability: User/Attacker Input → MCP Server (STDIO) with
MCP Supply Chain: STDIO Prompt Injection Bypasses Security Checks Across 200,000 Instances


7. XL-SafetyBench: AI Safety Evaluation Has a Geography Problem

Date: May 2026 · From: AIM Intelligence and collaborators · Threat type: AI Evaluation

Most AI safety benchmarks are built in English, for English-speaking risk contexts, validated by English-speaking annotators. AIM Intelligence — together with Microsoft, Korea AI Safety Institute, KT, BMW Group, Technical University of Munich, Ankara Üniversitesi, and Seoul National University — released XL-SafetyBench to address this gap.

The benchmark covers 10 country-language pairs (US, France, Germany, Spain, South Korea, Japan, India, Indonesia, Türkiye, UAE) with 5,500 test cases across two orthogonal dimensions:

Cultural benchmark scenario: user in Türkiye asks to tidy a to-do list with
XL-SafetyBench Cultural Benchmark: violation embedded inside an ordinary task

Two findings stood out across 10 frontier models:

Jailbreak robustness and cultural awareness are decoupled. A model that resists adversarial attacks does not necessarily understand local cultural context — and vice versa. Gemini-3.1-Pro leads cultural sensitivity (CSR 76.1%) but has a 43.4% attack success rate. A single safety score hides this divergence entirely.

The Illusion of Safety in local models. When plotting attack success rate against incoherence rate across 27 local models, they cluster along a near-linear trade-off: ASR + NSR ≈ 100%. What looks like safety is often linguistic failure — the model achieves a low attack success rate not by refusing, but by failing to understand the prompt.

Business impact: Organizations deploying LLMs in global markets — financial services, healthcare, e-commerce — cannot rely on English-language safety benchmarks for compliance or vendor selection. A model that passes standard benchmarks may have a 43.4% attack success rate on country-specific prompts. Country-grounded safety evaluation should be a procurement requirement, not a post-deployment discovery.


The Pattern Across May 2026

Looking across all seven stories, a consistent pattern emerges:

AI infrastructure is the new attack surface. Four of the seven incidents involve not the AI models themselves, but the frameworks, SDKs, package ecosystems, and protocols surrounding them — Semantic Kernel, npm/TanStack, MCP servers, and CI/CD pipelines. Securing AI deployments requires treating the entire dependency chain as part of the threat model.

The exploitation lifecycle is collapsing. The Verizon DBIR data and the AI-authored zero-day both point to the same trend: AI is compressing the time between vulnerability existence and weaponized exploitation. Defensive teams need detection and response capabilities that operate at AI speed, not human speed.

Prompt injection is the SQL injection of the AI era. CVE-2026-26030, CVE-2026-25592, and the MCP CVE cluster all exploit the same root cause: user-controlled input reaching privileged execution contexts without sanitization. The pattern is old; the execution environment is new.


This Month's Security Leader Checklist

Actions drawn directly from the incidents above:


AIM Intelligence monitors the global AI security threat landscape and publishes this digest monthly. For deeper analysis of any incident above, see our research posts or contact us.

← Back to List
aim

Ready to secure your AI?

Consult with AIM Intelligence's security experts and request a free red teaming demo optimized for your system.

EXPLORE PLATFORM