Microsoft’s latest Semantic Kernel research makes the AI agent security problem concrete: prompt injection is dangerous because agents turn language into tool calls. Once a model can influence file paths, filters, shell commands, or plugin parameters, the prompt is no longer “just text.” It is untrusted input sitting next to real execution.
That is the right mental model for anyone running a personal or self-hosted AI agent. The safest setup is not the one with the cleverest prompt. It is the one where every tool has a narrow permission boundary, observable behavior, and a small blast radius.
What Microsoft found
On May 7, Microsoft published research on two Semantic Kernel vulnerabilities that were triggered through AI-controlled tool inputs:
- CVE-2026-26030 affected the Python In-Memory Vector Store filter functionality. Microsoft describes it as a path from prompt injection to remote code execution when an agent used the Search Plugin with the vulnerable default configuration.
- CVE-2026-25592 affected the .NET SessionsPythonPlugin. The issue allowed arbitrary host file writes that could lead to sandbox escape and remote code execution.
Both issues were fixed, so this is not a reason to panic about Semantic Kernel specifically. The larger lesson matters more: frameworks that connect models to tools inherit traditional software security problems, but the attacker may enter through natural language instead of a web form.
Microsoft’s blunt version is worth keeping: “Your LLM is not a security boundary. The tools you expose define your attacker’s affected scope.”
Why prompt injection becomes execution
A normal prompt injection tells a model to ignore instructions, leak data, or make a bad decision. That is already a problem.
An agent framework changes the stakes. The model is not only producing text. It is choosing tools and filling in arguments:
| Agent capability | What the model may influence | What can go wrong |
|---|---|---|
| Search or retrieval | Filters, query strings, metadata fields | Injection into parser logic, unsafe evaluation, data leakage |
| File operations | Paths, filenames, content | Arbitrary file write, overwrite, path traversal |
| Code execution | Commands, arguments, scripts | Shell injection or execution outside intended scope |
| SaaS/API tools | Record IDs, actions, payloads | State changes, data deletion, unauthorized updates |
The weak point is not that the model is “evil.” The weak point is that the model is asked to convert messy human text into structured actions, and downstream tools may treat those actions as trusted.
That is an old web security lesson wearing new clothes. User-controlled input should not flow into eval(), shell commands, SQL, filesystem paths, or privileged APIs without validation. Agent frameworks make that flow easier to create by accident.
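To see the difference concretely, here is a minimal Python sketch (hypothetical code, not Semantic Kernel's actual implementation): a model-shaped filter string reaching eval() versus a structured filter where the model can only fill a field, an allowlisted operator, and a value.

```python
# Hypothetical illustration of the unsafe pattern, not Semantic Kernel's code.
# A model-produced filter string flows into eval(): prompt injection becomes execution.
def unsafe_filter(records, filter_expr: str):
    # DANGEROUS: filter_expr came from model output, which an attacker can shape.
    # A "filter" like "__import__('os').system(...)" runs code, not a comparison.
    return [r for r in records if eval(filter_expr, {}, {"r": r})]

# Safer: the model fills a tight structure; only known operators ever execute.
ALLOWED_OPS = {"eq": lambda a, b: a == b, "lt": lambda a, b: a < b}

def safe_filter(records, field: str, op: str, value):
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op}")
    return [r for r in records if ALLOWED_OPS[op](r.get(field), value)]

records = [{"name": "a", "score": 3}, {"name": "b", "score": 9}]
print(safe_filter(records, "score", "lt", 5))  # [{'name': 'a', 'score': 3}]
```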
The same pattern is showing up around MCP and skills
Microsoft’s post landed during a noisy month for agent security. GitHub added secret scanning and dependency scanning for its MCP Server, because AI coding agents increasingly operate inside repositories and development environments. Help Net Security summarized Noma Security research arguing that enterprises are governing visible MCP calls while missing the softer risk of skills that alter model reasoning. VentureBeat covered OX Security’s claim that MCP STDIO defaults create command execution exposure when implementers do not add their own boundaries.
Those stories are not identical. Some involve framework bugs. Some involve risky defaults. Some involve third-party skills. But they rhyme:
- Agents get connected to more tools.
- Tools receive arguments shaped by model output.
- The agent inherits credentials, filesystem access, network reach, or workspace state.
- A prompt, document, web page, issue, email, or skill nudges the agent toward the wrong action.
This is why personal AI agents need operational security, not just model safety. If you are building with OpenClaw, start with how OpenClaw works and treat every integration as a permission decision. The convenience of a connected agent is real. So is the risk of giving it a loaded keyring.
What this means for OpenClaw users
OpenClaw’s useful property is control. You choose where the assistant runs, which channels it listens to, which skills it loads, and which credentials it can reach. That does not make it automatically safe. It gives you the place to make safety decisions deliberately.
A safer self-hosted OpenClaw setup follows four rules.
1. Treat tool parameters as attacker-controlled
If an agent reads web pages, GitHub issues, Slack messages, email, PDFs, or community content, assume hostile text can influence the next tool call. Validate tool inputs the same way you would validate HTTP request data.
Practical examples:
- Allowlist expected actions instead of accepting arbitrary command strings.
- Restrict file tools to specific directories.
- Reject paths containing traversal patterns or unexpected absolute paths.
- Avoid tool designs where the model can pass raw code into an interpreter.
- Prefer structured parameters with tight schemas over free-form command blobs.
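As a sketch of what that validation can look like for file tools (the workspace location and helper name are hypothetical; adapt them to your own tool layer):

```python
from pathlib import Path

# Hypothetical workspace root; in practice this comes from your agent config.
WORKSPACE = Path("/home/agent/workspace").resolve()

def resolve_safe_path(model_supplied: str) -> Path:
    """Treat the model-supplied path as attacker-controlled: resolve it fully,
    then verify it still sits inside the allowed workspace."""
    candidate = (WORKSPACE / model_supplied).resolve()
    # resolve() collapses ../ traversal and follows symlinks before the check.
    if not candidate.is_relative_to(WORKSPACE):
        raise PermissionError(f"path escapes workspace: {model_supplied}")
    return candidate

print(resolve_safe_path("notes/todo.txt"))   # allowed: stays in the workspace
try:
    resolve_safe_path("../../etc/passwd")    # traversal attempt
except PermissionError as e:
    print("blocked:", e)
```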
This is especially important for custom skills. If you are creating one, use the patterns in how to create a custom OpenClaw skill but add a security review step before giving the skill write access, shell access, or external API credentials.
2. Separate read tools from write tools
Read-only tools are not risk-free, but they are easier to reason about. A tool that fetches a calendar event has a different blast radius from a tool that deletes events, sends messages, or transfers money.
For personal agents, the cleanest split is:
- Give broad read access only where the data is not highly sensitive.
- Give write access to a smaller set of actions.
- Require confirmation for irreversible actions.
- Use separate credentials for automation instead of reusing your primary admin tokens.
This matters more than prompt wording. A prompt that says “never delete files without asking” is useful. A filesystem permission boundary that prevents deletion outside one workspace is better.
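One way to encode that boundary in code rather than in the prompt is a default-deny dispatcher. A minimal sketch with invented tool names and a stand-in approval mechanism, not an OpenClaw API:

```python
# Minimal sketch of a read/write tool split with a human gate on writes.
READ_TOOLS = {"get_event", "list_files", "search_mail"}
WRITE_TOOLS = {"delete_event", "send_message"}  # state-changing, gated below

def confirm(action: str) -> bool:
    # Stand-in for whatever approval channel you use (CLI prompt, chat reply).
    return input(f"Allow {action}? [y/N] ").strip().lower() == "y"

def dispatch(tool: str, args: dict):
    if tool in READ_TOOLS:
        return run_tool(tool, args)           # low blast radius: execute directly
    if tool in WRITE_TOOLS:
        if not confirm(f"{tool}({args})"):    # irreversible: require approval
            return {"status": "denied"}
        return run_tool(tool, args)
    raise ValueError(f"unknown tool: {tool}")  # default-deny anything else

def run_tool(tool: str, args: dict):
    ...  # your actual tool implementations
```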
3. Pin and inspect the agent ecosystem you install
The skill ecosystem is growing fast. ClawHub already lists tens of thousands of skills, including security-focused vetting and agent hardening skills. That is good for capability discovery, but it also means installation becomes a supply-chain decision.
Before installing a skill or connecting an MCP server, check:
- Does it request shell, filesystem, network, browser, or credential access?
- Does it fetch code from untrusted domains at runtime?
- Does it ask to read broad home-directory paths like ~/.ssh, .env files, or browser profiles?
- Does it use dynamic execution patterns such as eval, raw shell interpolation, or unpinned remote scripts?
- Is the source maintained and reviewed, or is it a one-off upload with unclear provenance?
For a user-friendly entry point, see top OpenClaw skills for beginners. For security-sensitive work, boring is good: fewer skills, narrower scopes, pinned versions, and explicit review.
4. Log actions, not just conversations
Conversation logs tell you what the model said. Security investigations need to know what the agent did.
At minimum, keep records of:
- Which tools were invoked.
- What arguments were passed.
- Which files, APIs, or external systems were touched.
- Whether the action was read-only or state-changing.
- Whether a human approval happened before execution.
This is the difference between “the agent behaved strangely” and “the agent called delete_event with event ID X after reading message Y.” The second version can be debugged.
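A structured, append-only record is what makes that second version possible. A minimal sketch using JSON lines, with illustrative field names:

```python
import json
import time

def log_action(tool: str, args: dict, mutating: bool, approved: bool, result: str):
    """Append one structured record per tool call. JSON lines are grep-friendly
    and cheap to parse later; the field names here are illustrative."""
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
        "tool": tool,
        "args": args,            # what the model actually passed
        "mutating": mutating,    # read-only vs state-changing
        "approved": approved,    # did a human sign off first?
        "result": result,
    }
    with open("agent_actions.jsonl", "a") as f:
        f.write(json.dumps(entry) + "\n")

log_action("delete_event", {"event_id": "X"}, mutating=True,
           approved=True, result="deleted")
```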
OpenClaw users who care about this should read the OpenClaw security guide and "Is OpenClaw safe?" before adding high-impact integrations.
A quick hardening checklist
Use this before connecting a new tool, MCP server, or skill to a personal agent:
- Define the job. What exact task does this tool need to perform?
- Remove unused power. If it only needs read access, do not give write access.
- Narrow credentials. Create a dedicated token with limited scope.
- Validate inputs. Treat model-supplied arguments as untrusted.
- Avoid raw execution. Do not let the model pass arbitrary shell, Python, SQL, or JavaScript unless the environment is intentionally sandboxed.
- Add confirmation. Require approval for sends, deletes, purchases, deployments, and public posts.
- Log the action. Record the tool call and result in a way you can review later.
- Review periodically. Remove stale skills and integrations you no longer use.
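For the raw-execution item specifically, the safer shape is an allowlist of verbs mapped to fixed argument vectors, run without a shell. A sketch with an illustrative command set:

```python
import subprocess

# Instead of letting the model write a shell string, it picks a verb and
# fills typed arguments; the verbs and fixed argv prefixes live here.
COMMANDS = {
    "git_status": ["git", "status", "--short"],
    "list_dir": ["ls", "-la"],
}

def run_command(verb: str, extra_args=()):
    if verb not in COMMANDS:
        raise PermissionError(f"command not allowlisted: {verb}")
    # shell=False (the default for a list argv) means arguments are never
    # interpolated into a shell, so "; rm -rf /" stays a literal string.
    argv = COMMANDS[verb] + [str(a) for a in extra_args]
    return subprocess.run(argv, capture_output=True, text=True, timeout=30)

print(run_command("list_dir").stdout)
```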
The point is not to make agents harmless. A harmless agent is usually useless. The point is to make power legible and bounded.
The bottom line
The Semantic Kernel RCE research is a reminder that agent security lives below the prompt. You can improve system instructions, but you cannot prompt your way out of unsafe tool design.
For OpenClaw, the practical advantage is ownership. A self-hosted assistant gives you a real place to inspect skills, limit credentials, isolate risky actions, and decide which workflows deserve autonomy. Use that control. If a tool can touch your files, accounts, or money, design it like an attacker will eventually influence one of its inputs.
Sources: Microsoft Security Blog, The New Stack on GitHub MCP security scanning, Help Net Security on MCP and Skills blind spots, VentureBeat on MCP STDIO command execution exposure, ClawHub skills directory