Three CVEs hit LangChain and LangGraph this week. CVE-2026-34070 (CVSS 7.5) is a path traversal in the prompt-loading API that exposes arbitrary filesystem files. CVE-2025-68664 (CVSS 9.3) is a serialization injection that leaks API keys and environment secrets. CVE-2025-67644 (CVSS 7.3) is an SQL injection in LangGraph's SQLite checkpoint that exposes conversation history. Meanwhile, a separate Langflow RCE flaw (CVE-2026-33017) was exploited in the wild within 20 hours of disclosure, with CISA adding it to the Known Exploited Vulnerabilities catalog.
These aren't exotic attacks. They're path traversals, deserialization bugs, and SQL injections that have existed in web security for decades, now showing up in AI framework plumbing that gets 52 million weekly PyPI downloads.
If you're running agentic LLM systems in production, this guide is the security audit you should have done last quarter. No preamble on what LLMs are. We're going straight to attack surfaces, audit checklists, defense patterns, and the tradeoffs nobody talks about.
The actual attack surface
Most security reviews of LLM pipelines focus on prompt injection and stop there. The real attack surface is wider. Here are the vectors that matter, organized by where they hit your stack.
Prompt injection (direct and indirect). An attacker crafts input that overrides your system prompt. Direct injection comes through user-facing inputs. Indirect injection comes through data the agent retrieves: web pages, emails, database records, API responses. OWASP ranks this LLM01 in its Top 10 for LLM Applications for a reason. It is the most common and least mitigated vector.
Tool-call exploitation. Your agent has tools. Those tools have parameters. If the LLM can be convinced to call a tool with attacker-controlled arguments, you have a classic injection-through-indirection problem. The LangChain CVEs are textbook examples: the prompt-loading API (CVE-2026-34070) accepted arbitrary file paths because nobody validated that a prompt template path shouldn't point to /etc/passwd. The serialization bug (CVE-2025-68664) worked because the framework trusted LLM-influenced metadata as already-serialized objects.
Context poisoning. Agents with persistent memory or RAG pipelines are vulnerable to poisoned context. An attacker plants malicious instructions in documents that will later be retrieved and injected into the agent's context window. MITRE ATLAS v5.4.0 (released February 2026) added "Publish Poisoned AI Agent Tool" as a formal technique, recognizing that tool registries and shared knowledge bases are attack vectors.
Output smuggling and exfiltration. The agent reads sensitive data, then an attacker uses prompt injection to make it write that data somewhere accessible: a tool call that sends an email, an API request with secrets encoded in parameters, or a response that contains hidden data. Cyera researcher Vladimir Tokarev demonstrated that the three LangChain/LangGraph flaws each expose a different class of enterprise data: filesystem files, environment secrets, and conversation history.
Escape to host. MITRE ATLAS v5.4.0 also added "Escape to Host" as a technique. If your agent runs tools via code execution (Python exec, shell commands, container calls), insufficient sandboxing lets an attacker pivot from LLM context to operating system access. The Langflow CVE-2026-33017 is the poster child: arbitrary Python execution via a single crafted HTTP request because flow execution wasn't sandboxed.
Audit checklist for an existing pipeline
Run through this on any agentic system currently in production. If you can't answer "yes" to a question, you've found work to do.
1. Input boundary mapping.
- Can you enumerate every point where external data enters the agent's context? This includes user inputs, RAG retrieval results, tool outputs, API responses, and persistent memory reads.
- Is each input channel treated as untrusted? (If your RAG results go straight into the prompt without sanitization, they're not.)
2. Tool permission audit.
- List every tool your agent can call. For each: what's the worst-case outcome if called with attacker-controlled arguments?
- Do tools have parameter validation? Not "does the LLM usually pass reasonable values," but "does the code reject malformed inputs before execution?"
- Are write-capable tools (email, database writes, file creation, external API calls) gated behind confirmation or scoped to specific operations?
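As a concrete sketch of that parameter validation, here's a path-confinement check of the kind that would have blocked the CVE-2026-34070 traversal. The directory and function names are illustrative, not from any framework:

```python
from pathlib import Path

def confine(root: str, user_path: str) -> Path:
    """Resolve user_path under root and refuse anything that escapes it."""
    base = Path(root).resolve()
    candidate = (base / user_path).resolve()  # resolve() collapses ../ sequences
    if not candidate.is_relative_to(base):    # requires Python 3.9+
        raise PermissionError(f"path escapes allowed root: {user_path}")
    return candidate

# A file-reading tool then validates before it touches the filesystem:
def read_tool(user_path: str) -> str:
    return confine("/srv/agent-data", user_path).read_text()
```

Note that the check runs on the resolved path, not the raw string: `../` filtering on the input string alone is bypassable with encodings and symlinks.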
3. Serialization and deserialization review.
- Does your framework deserialize data that the LLM or user can influence? The LangChain CVE-2025-68664 existed because `loads()` treated user-supplied dicts containing `lc` keys as pre-serialized LangChain objects, enabling secret extraction through `secrets_from_env`.
- Audit every `load()`, `loads()`, `pickle`, `yaml.load()`, or equivalent call in your pipeline. If any accept data that touches the LLM's input or output path, you have a vulnerability.
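One way to enforce that rule is a guard that refuses serializer marker keys in untrusted data before it ever reaches a load call. `lc` is the marker the LangChain advisory describes; the other keys stand in for other serializers and the whole list is illustrative:

```python
# Marker keys that various serializers treat as "this dict is an object" (illustrative list)
SUSPICIOUS_KEYS = {"lc", "__reduce__", "py/object"}

def assert_plain_data(obj, path="$"):
    """Recursively refuse dicts carrying serializer marker keys in untrusted input."""
    if isinstance(obj, dict):
        hit = SUSPICIOUS_KEYS & obj.keys()
        if hit:
            raise ValueError(f"serializer marker {hit} in untrusted data at {path}")
        for k, v in obj.items():
            assert_plain_data(v, f"{path}.{k}")
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            assert_plain_data(v, f"{path}[{i}]")
    return obj
```

Run it on any user- or LLM-influenced structure before handing it to a framework deserializer; the path in the error message tells you where the marker was planted.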
4. Data flow tracing.
- Trace the path of every secret (API keys, database credentials, auth tokens) in your deployment. Can any of those secrets appear in the LLM's context window?
- If environment variables are accessible to the agent runtime, can the LLM be instructed to read and output them?
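A quick way to answer the second question is to scan the assembled context for the literal values of known secrets before the call goes out. The variable names here are placeholders:

```python
import os

def secrets_in_context(context: str, secret_names=("OPENAI_API_KEY", "DATABASE_URL")):
    """Return which named environment secrets appear verbatim in an LLM context window."""
    leaked = []
    for name in secret_names:
        value = os.environ.get(name)
        if value and value in context:
            leaked.append(name)
    return leaked
```

This only catches verbatim leaks, not encoded ones, but it's cheap enough to run on every request and turns "can a secret reach the context?" from a design review question into a runtime assertion.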
5. Execution sandboxing.
- If your agent executes code, where does it run? Same process? Same container? Same machine?
- Could a malicious tool call escalate to OS-level access? The answer is yes if you're running `exec()` or `eval()` on LLM-generated code without a sandbox.
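A minimal step up from in-process `exec()` is a separate interpreter with a scrubbed environment and a timeout. This is a sketch, not a real sandbox; a process boundary alone won't stop a determined attacker, so in production wrap it in a container or microVM as well:

```python
import subprocess
import sys

def run_untrusted(code: str, timeout: float = 5.0) -> str:
    """Run LLM-generated code in a separate interpreter, not in-process exec()."""
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores user site-packages
        capture_output=True,
        text=True,
        timeout=timeout,  # raises TimeoutExpired on runaway code
        env={},           # no inherited secrets or API keys
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr)
    return result.stdout
```

Even this weak boundary already removes the two failure modes in the audit questions above: the child process can't read your environment secrets, and it can't hang your agent forever.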
6. Output validation.
- Do you inspect what the agent returns before it reaches the user or downstream systems?
- Can the agent's output contain hidden instructions that chain to another system (output smuggling)?
7. Checkpoint and memory integrity.
- If you use persistent checkpoints (like LangGraph's SQLite store), is the checkpoint store hardened against injection? CVE-2025-67644 showed that metadata filter keys in LangGraph's SQLite implementation were vulnerable to SQL injection.
- Are conversation histories access-controlled, or can one user's thread leak into another's context?
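The fix class for that checkpoint injection is the same as it has been for decades: placeholders for values, an allowlist for keys. A sketch against a hypothetical checkpoints table (column names are illustrative):

```python
import sqlite3

ALLOWED_FILTER_KEYS = {"thread_id", "user_id", "step"}  # hypothetical column allowlist

def find_checkpoints(conn: sqlite3.Connection, filters: dict):
    """Metadata filtering where neither keys nor values reach the SQL string raw."""
    clauses, params = [], []
    for key, value in filters.items():
        if key not in ALLOWED_FILTER_KEYS:   # keys can't be parameterized, so allowlist them
            raise ValueError(f"unknown filter key: {key}")
        clauses.append(f"{key} = ?")         # values go through placeholders
        params.append(value)
    sql = "SELECT id FROM checkpoints"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchall()
```

The subtlety that bit CVE-2025-67644 is the key side: SQL placeholders only cover values, so filter *keys* must be validated separately before they're interpolated into the statement.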
Defense patterns with tradeoff analysis
There's no free lunch. Every defense pattern costs something. Here's what actually works, what it costs, and when to pick each approach.
Input sanitization and prompt armoring. Strip or escape control characters and known injection patterns from all untrusted inputs before they enter the context window. Some teams use a separate classifier model to detect injection attempts.
The tradeoff: regex-based sanitization is fast (sub-millisecond) but brittle and trivially bypassed with encoding tricks. Classifier-based detection adds 50-200ms latency per call and requires ongoing maintenance as attack patterns evolve. Neither approach is complete on its own.
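For scale, here's what the fast-but-brittle end of that spectrum looks like. The pattern list is deliberately tiny and trivially bypassed with paraphrase or encoding; treat it as a speed bump in front of deeper defenses, not a wall:

```python
import re

# Intentionally incomplete pattern list; paraphrase and encoding tricks bypass it.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"system prompt", re.I),
]

def flag_injection(text: str) -> bool:
    """Sub-millisecond check for known injection phrasings in untrusted input."""
    return any(p.search(text) for p in INJECTION_PATTERNS)
```

The useful property isn't coverage, it's cost: this runs on every retrieved document and tool output essentially for free, and flagged inputs can be routed to the slower classifier path.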
Tool-call validation and least privilege. Every tool call should be validated against a schema before execution. Parameters should be type-checked, range-checked, and path-validated. Tools should operate with minimal permissions: a file-reading tool should be restricted to a specific directory, not the entire filesystem.
The tradeoff: strict schemas reduce flexibility. If your agent needs to handle novel requests, overly rigid validation creates false rejections. The right balance is usually a strict allowlist for high-risk tools (file access, code execution, external API calls) and looser validation for low-risk tools (search, formatting).
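A minimal version of that validation layer might look like this; the tool names and schemas are illustrative, and a real deployment would use something like JSON Schema or Pydantic models:

```python
# Hypothetical per-tool schemas: every tool gets an explicit parameter contract.
TOOL_SCHEMAS = {
    "read_file": {"path": str},   # high-risk: also path-confined at execution time
    "search":    {"query": str},  # low-risk: type check is enough
}

def validate_call(tool: str, args: dict) -> None:
    """Reject tool calls that aren't on the allowlist or don't match their schema."""
    schema = TOOL_SCHEMAS.get(tool)
    if schema is None:
        raise ValueError(f"tool not on allowlist: {tool}")
    extra = set(args) - set(schema)
    if extra:
        raise ValueError(f"unexpected parameters: {extra}")
    for name, expected in schema.items():
        if name not in args:
            raise ValueError(f"missing parameter: {name}")
        if not isinstance(args[name], expected):
            raise TypeError(f"{name} must be {expected.__name__}")
```

The design choice worth keeping even if you swap the implementation: validation happens in code, before execution, on every call, regardless of how reasonable the LLM's arguments usually look.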
Execution sandboxing. Run code execution tools in isolated environments: containers, VMs, or serverless functions with no access to the host filesystem, network, or secrets.
The tradeoff: cold-start latency for sandboxed execution ranges from 100ms (warm containers) to 2-5 seconds (new VM spin-up). For interactive agents, that latency is noticeable. Warm pools help but add infrastructure cost. The Langflow RCE (CVE-2026-33017) is what happens when you skip this: unsandboxed `exec()` calls led to full remote code execution.
Output filtering and data loss prevention. Scan agent outputs for sensitive patterns (API keys, credentials, PII) before they reach the user. This is your last line of defense against exfiltration.
The tradeoff: pattern-based DLP catches known formats (AWS keys, JWT tokens) but misses encoded or obfuscated secrets. Full content inspection adds latency. Most teams settle for regex-based scanning of outputs, which catches the obvious cases and misses targeted attacks.
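A sketch of that regex-based output scan, covering a few well-known secret formats. The patterns are illustrative and, as noted, will miss anything encoded or obfuscated:

```python
import re

# Patterns for well-known secret formats; obfuscated or novel formats slip through.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "jwt":            re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\b"),
    "openai_key":     re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
}

def scan_output(text: str):
    """Return the secret types detected in an agent's output before it ships."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]
```

Wire it in as a hard gate, not a log line: an output that trips the scanner should be blocked or redacted, because by definition this is the last hop before exfiltration.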
Privilege separation between planning and execution. Use a two-phase architecture: one LLM call plans the action, a separate validation layer approves or rejects it, then a constrained executor runs it. The planner never has direct tool access.
The tradeoff: doubles your LLM inference cost and adds a round-trip of latency. But it's the single most effective pattern against tool-call exploitation because the executor only accepts pre-validated action specifications, not freeform LLM output.
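Stripped to its skeleton, the pattern looks like this. The planner LLM call is omitted and the executor table is hypothetical; the point is the shape, where freeform model output is reduced to a typed action spec before anything runs:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    tool: str
    args: dict

# Hypothetical executor table: the only operations that can ever run.
EXECUTORS = {
    "search": lambda args: f"results for {args['query']}",
}

def validate(action: Action) -> Action:
    """The gate between planner and executor: anything off-allowlist dies here."""
    if action.tool not in EXECUTORS:
        raise PermissionError(f"action rejected: {action.tool}")
    return action

def execute(action: Action) -> str:
    """The executor only ever sees pre-validated Action specs, never raw LLM text."""
    return EXECUTORS[action.tool](action.args)

# Pipeline shape: plan (LLM call, not shown) -> validate -> execute
```

Because the planner has no handle on `EXECUTORS`, a successful prompt injection can at worst propose a malicious `Action`; it cannot run one.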
Red teaming and continuous testing. Adversarial testing isn't a one-time exercise. Tools like Novee's AI Red Teaming agent (launched at RSAC 2026) and Cisco's Dynamic Agent Red Teaming framework now offer automated, multi-turn adversarial testing for agentic workflows. Integrate these into CI/CD.
The tradeoff: automated red teaming catches known attack patterns well. Novel attacks still require human red teamers. Budget for both.
When NOT to do this
Over-engineering security is a real failure mode. Here's where teams waste effort.
Don't sandbox everything if your agent only reads data. If your agent has no write tools, no code execution, and no external API access, the blast radius of a successful prompt injection is limited to the agent saying something wrong. That's a quality problem, not a security emergency. Focus your sandboxing budget on agents that take actions.
Don't build a custom prompt injection classifier if you're a team of three. Training and maintaining a classifier is a full-time job. Use an off-the-shelf solution or, better yet, focus on architectural defenses (least privilege, input/output boundaries) that don't require ongoing ML ops.
Don't add a human-in-the-loop for every tool call. That defeats the purpose of having an agent. Reserve human approval for irreversible, high-impact actions (sending money, deleting data, external communications). Let low-risk, reversible actions flow through automated validation.
Don't treat framework updates as optional. The LangChain patches for these CVEs landed in langchain-core 1.2.22 (for CVE-2026-34070), langchain-core 1.2.5 (for CVE-2025-68664), and langgraph-checkpoint-sqlite 3.0.1 (for CVE-2025-67644). If you're behind on framework versions, that is your first security task. Not this guide.
Decision framework
When choosing between defense approaches, ask these questions in order:
What can this agent actually do? Map every tool to its worst-case impact. Agents that can only read and respond need different security than agents that execute code or call external APIs. A Cisco enterprise survey found that 85 percent of enterprises have experimented with AI agents but only 5 percent have moved them to production. The gap is largely about trust in what agents can do unsupervised.
Where does untrusted data enter the pipeline? If the answer is "only through the user input field," your attack surface is narrow. If the answer includes RAG retrieval, external APIs, shared memory, or tool outputs that feed back into the context, you need defense-in-depth.
What's your latency budget? If you're building a real-time conversational agent, you can't afford 500ms of security overhead per turn. Pick architectural defenses (least privilege, sandboxing, schema validation) over runtime inspection. If you're running batch workflows, throw the full stack at it.
What's reversible? Prioritize hardening irreversible actions. A wrong search result is annoying; a leaked API key is a breach. Allocate security investment proportional to the blast radius of each tool.
How fast do you ship framework updates? If the answer is "quarterly," you're carrying months of unpatched CVEs. The Langflow RCE was weaponized in 20 hours. Your patching cadence is a security control whether you think of it that way or not.
The LLM security landscape is moving fast. MITRE ATLAS added agentic-specific attack techniques in its February 2026 v5.4.0 update. OWASP maintains an LLM-specific Top 10. These frameworks exist. Use them as your starting taxonomy, then map your specific pipeline against them.
As Cisco's Jeff Schultz put it at RSAC 2026: "With chatbots, we worried about what they would say. With agents, we worry about what they do." That worry is well-calibrated. Act on it.
Nate Hargrove covers advanced engineering and AI security for The Daily Vibe.