AI Security

When Prompt Injection Becomes RCE: Inside CVE-2026-26030

A single prompt. One eval() call. Host RCE. Inside CVE-2026-26030 — the Semantic Kernel bypass that turned an AI agent into a remote code execution primitive.

Most prompt injection demos end with a chatbot saying something embarrassing. Microsoft's May 7 writeup ends with calc.exe popping on the host. The agent did exactly what it was designed to do — parse natural language into a tool call. The bad beat happened one layer below, where the framework treated AI-controlled parameters as trusted data and fed them into eval(). Anyone shipping AI agents in 2026 is sitting at the same table.

TL;DR

Microsoft published detailed research on May 7, 2026, on two RCE-class vulnerabilities in Semantic Kernel: CVE-2026-26030 (In-Memory Vector Store) and CVE-2026-25592 (SessionsPythonPlugin) [1]. The CVEs themselves were originally published in GitHub Advisory and NVD in February 2026; the May 7 post added the detailed exploit narrative and CTF.
A single crafted prompt was enough to launch arbitrary code on the host — no malicious attachment, no memory corruption, no browser exploit. This was a vendor PoC, not a confirmed in-the-wild campaign.
The bug pattern is the "lethal trifecta" Simon Willison defined in 2025: private data + untrusted tokens + an exfiltration vector [2].
Patched in semantic-kernel Python ≥ 1.39.4 and .NET SDK ≥ 1.71.0. Microsoft published a CTF challenge to practice the exploit safely.
Pattern repeats across frameworks. Same trifecta drove EchoLeak (Microsoft 365 Copilot, CVE-2025-32711) and the Slack AI exfiltration bug [3][4].

There's a moment in every breach narrative where the defender is sure they're holding pocket aces. AST validator? Built. System prompt? Locked. Guardrails? Stacked. Then the river card flips, and the model that was supposed to be the smart one quietly hands the attacker a shell. CVE-2026-26030 is that hand played out twice — once for Python, once for .NET — and the lesson is the same lesson we wrote up two weeks ago when an authenticated Cacti bug chained into a full Docker Desktop escape: single CVEs are the cards you're dealt; the chain is how you lose.

How the Lethal Trifecta Turns a Lambda Into a Shell

Simon Willison coined the term "lethal trifecta" in June 2025 to describe the structural condition under which any AI agent becomes exploitable: access to private data, exposure to untrusted content, and the ability to communicate externally [2]. CVE-2026-26030 is what happens when a popular framework ships all three by default and adds a fourth ingredient — an eval() sink in the filter path.

Microsoft's writeup is unambiguous about where the fault line sits: the AI model itself isn't the issue. It behaves exactly as designed, parsing language into tool schemas. The vulnerability lives in how the framework and tools trust the parsed data [1]. The model is doing its job. The framework is the one slow-rolling pocket aces.

Semantic Kernel is not a niche project. With over 27,000 stars on GitHub, it provides the abstractions for orchestrating AI models, managing plugins, and chaining workflows [1]. When a foundational layer has a systemic flaw, every downstream agent inherits it — the same dynamic that made the WordPress plugin ecosystem the dominant attack surface in 2025, where core was hardened but the stack on top of it was where the breaches happened.

The vulnerable path

Microsoft's demo built a "hotel finder" agent. The user asks "Find hotels in Paris." The model calls search_hotels(city="Paris"). The plugin runs a deterministic filter before vector similarity. That filter is built like this [1]:

new_filter = f"lambda x: x.{param.name} == '{kwargs[param.name]}'"
eval(new_filter)

The kwargs[param.name] value comes from the model — which means it ultimately comes from the user prompt. Close the quote, append Python, and the lambda body is now whatever you want it to be.

CVE-2026-26030: The AST Bypass Step by Step

Semantic Kernel's authors saw the RCE risk coming and built a validator. Before eval() runs, the filter string is parsed into an AST and checked against a blocklist. It only allows lambda expressions, scans every element for dangerous identifiers and attributes that could enable arbitrary code execution (strings like eval, exec, open, __import__), and executes the resulting lambda in a restricted environment with Python's built-in functions deliberately removed [1].

Sounds reasonable. It's also a textbook example of why blocklists are the wrong primitive in Python.

Conceptually, the bypass looks like this:

lambda x: x.city == ''.__class__.__mro__[1].__subclasses__()[N].__init__.__globals__[...]

That payload never types the words eval, exec, or os. It walks Python's class hierarchy at runtime — through __class__, __mro__, __subclasses__, __init__, __globals__ — until it reaches something dangerous. Every step is a method or attribute access the validator's name-based blocklist never thought to cover.

The Microsoft team's bypass succeeded for four reasons [1]:

Missing dangerous names. The payload reached BuiltinImporter via __name__ traversal and called load_module and system — none on the blocklist.
Structural check passed. The malicious code was wrapped in a valid lambda, so isinstance(tree.body, ast.Lambda) returned True. The body was poison; the wrapper was clean.
Empty __builtins__ was irrelevant. The exploit started with tuple(), which exists regardless of the builtins environment, then crawled the type hierarchy from there.
No ast.Subscript checking. Even if the validator had blocked a name, bracket notation (obj['__class__']) creates a different AST node type the validator ignored entirely.

The result: a prompt that looked like a hotel query traversed Python's class hierarchy at runtime, located BuiltinImporter, dynamically imported os, and called system(). A single prompt was enough to launch calc.exe on the device running the AI agent, with no browser exploit, malicious attachment, or memory corruption bug needed [1].

The fix

Microsoft's mitigation is what the original validator should have been: an allowlist, not a blocklist. The fix uses four layers — an AST node-type allowlist permitting only safe constructs (comparisons, boolean logic, arithmetic, literals); a function call allowlist; a dangerous attributes blocklist that blocks class hierarchy traversal; and a name node restriction that allows only the lambda parameter as a bare identifier [1]. Patched in Python semantic-kernel ≥ 1.39.4.

CVE-2026-25592: An Arbitrary File Write That Leads to RCE

The companion CVE in the same disclosure is structurally different but worth knowing. In the .NET SDK, DownloadFileAsync was accidentally marked with a [KernelFunction] attribute, which officially advertised it to the AI model as a callable tool, complete with its parameter schema [1]. The localFilePath parameter — which controls where File.WriteAllBytes() saves data on the host — was now AI-controlled, with no path validation.

The primitive itself is an arbitrary file write, not RCE directly. Chained with the existing ExecuteCode plugin, it becomes a sandbox escape that leads to full RCE. Step one: prompt the agent to write a malicious script inside the Azure Container Apps sandbox. Step two: prompt the agent to "download" that file via DownloadFileAsync to Windows\Start Menu\Programs\Startup. Step three: wait for the next user sign-in. The container boundary that was supposed to be the security model — gone, in two prompts [1].

This is the same shape as the Cacti-into-Docker-Desktop chain we covered last month: each individual primitive is "medium severity, patch when convenient," but the composition is full host compromise. The bad beat is rarely the zero-day nobody saw coming — it's the two patchable bugs that line up.

The lesson is the same as the first CVE, just on a different layer: any tool parameter the model can influence must be treated as attacker-controlled input. Decorating a function with [KernelFunction] is not metadata — it's an attack surface decision. Patched in Semantic Kernel .NET SDK ≥ 1.71.0 [1].

The model isn't your security boundary. Tape that line to the monitor.

Why This Pattern Keeps Repeating

CVE-2026-26030 is not a one-off. Several major AI security cases over the last twelve months have followed the same structural pattern:

EchoLeak (CVE-2025-32711) — researchers at Aim Labs described a zero-click attack on Microsoft 365 Copilot. A specially crafted email could coerce Copilot into accessing internal files and transmitting their contents to an attacker-controlled server, with no user interaction [3]. Trifecta complete: Copilot had access to internal documents (private data), processed the attacker's email (untrusted content), and rendered Markdown links (exfiltration channel). Microsoft assigned the CVE and patched it.
Slack AI — researchers described how hidden instructions in a Slack message could trick the AI into inserting a malicious link, sending data from a private channel to an attacker's server when a user clicked [4].
Semantic Kernel — now this one. Same pattern, this time the exfiltration vector is RCE itself.

OWASP ranks prompt injection as the #1 threat to LLM applications [5][6], and OpenAI has publicly acknowledged it as a hard, unsolved problem in their own systems [7]. The reason is architectural: LLMs process instructions and data in the same channel. Until the language barrier between trusted and untrusted tokens is solved at the model level — and CaMeL-style systems are still research-grade — the only durable defense is to break the trifecta at the framework boundary.

Why This Matters If You're Not Using Semantic Kernel

Most developers reading this aren't shipping Semantic Kernel agents. They're using LangChain, CrewAI, AutoGen, the OpenAI Agents SDK, or stitching tools together via MCP servers. The framework name doesn't matter. The pattern does.

Every one of those frameworks builds on the same architecture: natural language input → tool schema selection → parameters passed into code. If any tool parameter ends up in a string interpolation, an eval-like construct, a subprocess call, a file path, or a SQL query without sanitization, the same class of vulnerability exists. Microsoft has explicitly stated this is part of an ongoing research series and that similar issues exist in frameworks beyond their ecosystem [1]. Expect more CVEs in this shape over the next twelve months.

If you maintain an agent stack, the audit question is simple: list every tool you expose, and for each one, ask where the model-controlled parameters end up. Anything that touches an interpreter, a shell, a filesystem, or a database is a potential CVE-2026-26030 in disguise.

Try It Yourself: Microsoft's CTF Release

Microsoft published the vulnerable hotel-finder as a self-contained CTF challenge alongside the disclosure. The challenge requires crafting a prompt injection that bypasses the agent's natural language defenses and smuggles a Python AST-traversal payload through the vulnerable eval() sink [1]. The repo: github.com/amiteliahu/AIAgentCTF/tree/main/CVE-2026-26030.

This is a rare gift from a vendor — a real-world CVE packaged for hands-on practice. If you've been working through HTB's AI tracks or the Cyber Apocalypse 2025 prompt injection challenges, this is the natural next step: same skillset, but the target is a recently patched CVE rather than a contrived flag chamber.

For ongoing practice, HackTheBox and TryHackMe have both expanded their AI/LLM tracks significantly through 2025–2026; the techniques transfer directly. The MITRE ATLAS framework's AML.T0051 (LLM Prompt Injection) is the right taxonomy entry to map findings against in writeups [8].

Affected Versions

CVE-2026-26030 — Python semantic-kernel < 1.39.4. Patched in 1.39.4. Trigger condition: agent uses In-Memory Vector Store with default filter functionality [1].
CVE-2026-25592 — Semantic Kernel .NET SDK < 1.71.0. Patched in 1.71.0. Trigger condition: agent uses SessionsPythonPlugin [1].

Key Takeaways

Patch immediately. Upgrade semantic-kernel Python to ≥ 1.39.4 and .NET SDK to ≥ 1.71.0. No architectural rewrite needed.
Hunt retrospectively. Microsoft published Defender advanced hunting queries to detect post-exploitation child processes (cmd.exe, powershell.exe, etc.) spawned from Semantic Kernel hosts during the vulnerable window [1].
Treat tool parameters as untrusted input. Any value the model can influence is attacker-controlled. Path validation, parameter sanitization, and allowlists belong at the tool boundary, not the prompt boundary.
Allowlists beat blocklists. Python's flexibility means restricted operations can almost always be reintroduced through alternate syntax. If you must validate an AST, allow specific node types — don't blocklist names.
Audit the trifecta. If your agent has private data access + untrusted token exposure + an outbound channel (tool calls, links, file writes), it is structurally vulnerable. Break at least one leg. Defense-in-depth wins the same way it wins for container escapes: patch the primitive, then alert on the post-exploitation behaviour.
The model isn't your security boundary. Guardrails reduce attack surface; they don't eliminate it.

FAQ

Is CVE-2026-26030 being exploited in the wild?

Microsoft has not confirmed in-the-wild exploitation. The PoC was developed by Microsoft's own research team. They recommend treating any suspicious activity from Semantic Kernel hosts during the vulnerable window as potential compromise [1]. Hunt with the published KQL queries.

My agent uses LangChain or CrewAI, not Semantic Kernel. Am I safe?

No. Microsoft has stated this is part of a research series and similar vulnerabilities exist in frameworks beyond the Microsoft ecosystem [1]. The lethal trifecta is structural, not vendor-specific.

Can this happen through MCP servers?

Yes, structurally. MCP is a protocol for exposing tools to LLMs, and any MCP-exposed tool whose parameters end up in a sensitive sink (eval, subprocess, file path, SQL) carries the same risk pattern. The Model Context Protocol expands the attack surface rather than reducing it, because untrusted MCP server outputs become part of the model's context. Audit MCP-connected tools the same way you'd audit Semantic Kernel plugins.

Can a system prompt or input filter prevent this kind of attack?

Not reliably. Industry research and Microsoft's own classifiers (XPIA) have been bypassed in production. The defense lives at the tool boundary, not the prompt boundary.

What's the difference between prompt injection and jailbreaking?

Willison, who coined the term, is explicit: jailbreaking attacks involve attackers directly tricking an LLM into doing something embarrassing — those are a different issue than prompt injection, which is about mixing trusted and untrusted content in the same context [2]. CVE-2026-26030 is prompt injection. The model wasn't tricked; the framework was.

Where should I practice this skillset?

Microsoft's CVE-2026-26030 CTF repo is the highest-signal target right now. Beyond that: HackTheBox AI tracks, TryHackMe LLM rooms, alexdevassy's Machine_Learning_CTF_Challenges repo (covering OWASP LLM Top 10 mappings) [9], and the Promptfoo red-teaming framework for testing your own deployments [10].

Sources

Microsoft Security Blog — "When prompts become shells: RCE vulnerabilities in AI agent frameworks" (May 2026)
Simon Willison — "The lethal trifecta for AI agents: private data, untrusted content, and external communication" (June 2025)
arXiv — "EchoLeak: A Zero-Click Prompt Injection Exploit in Microsoft 365 Copilot" (2025)
Sombra Inc. — "LLM Security Risks in 2026: Prompt Injection, RAG, and Shadow AI" (February 2026)
OWASP — "OWASP Top 10 for Large Language Model Applications"
Kunal Ganglani — "Prompt Injection in 2026: Still OWASP's Number One LLM Vulnerability" (March 2026)
Airia — "AI Security in 2026: Prompt Injection, the Lethal Trifecta, and How to Defend" (January 2026)
MITRE ATLAS — "AML.T0051 — LLM Prompt Injection"
Alex Devassy — "Machine_Learning_CTF_Challenges" (GitHub)
Promptfoo — "Testing AI's 'Lethal Trifecta' with Promptfoo" (September 2025)