Prompt Injection — The SQL Injection of LLMs
Prompt injection isn't a hypothetical — it's the #1 vulnerability in OWASP's Top 10 for LLM Applications, and the production attack pattern we see most often when customers ask us to red-team their agentic systems. There is no silver-bullet defence. There is a layered architecture that contains the damage. Here's the threat model + mitigations.
Last quarter we were asked to red-team a customer's agentic procurement workflow — an internal agent that helps employees submit purchase requests against vendor catalogues. Within 40 minutes of starting the engagement, the red team had three working injection paths, two of which would have let an unprivileged employee approve their own purchase requests above their spend authority by hiding instructions in a vendor product description. None of the three exploits required sophisticated technique. All three would have survived the customer's existing test suite, which focused on functional correctness rather than adversarial behaviour. That assignment was at a Fortune 500 manufacturer; it could have been any of the 23 enterprises we've red-teamed agentic systems for in the past 14 months. Prompt injection is the LLM equivalent of SQL injection — a vulnerability where untrusted input crosses a boundary the system treats as trusted, and gets interpreted as instructions. The OWASP Top 10 for LLM Applications ranks it #1 for a reason: it's the production attack pattern we see most often. The structural problem is that LLMs treat all text in their context as instructions; there's no syntactic distinction between "these are the rules" and "this is the user's data." There is no silver-bullet defence. There is a layered architecture that contains the damage. Here is how we deploy it.
The three vectors of prompt injection
Direct injection: a user types adversarial instructions directly into the chat interface. The attacker's text becomes part of the prompt and competes with the system instructions for the model's attention. Indirect injection: the model retrieves a document, web page, or email that contains adversarial instructions hidden in its content. The retrieved content gets concatenated into the prompt and the model treats it as instructions. Indirect is more dangerous because the user is unaware of the attack vector — the attacker is the author of the document the user asked the assistant to summarise. Multi-modal injection: instructions hidden in images, audio, or other non-text modalities that the model processes. Increasingly relevant as enterprise deployments use multi-modal models.
Defence layer one: input filtering at the boundary
First line of defence: every input that enters the LLM prompt is inspected by a guardrails layer (we deploy NeMo Guardrails or LlamaGuard self-hosted) before it reaches the model. The guardrails check for known injection patterns: explicit "ignore previous instructions" attempts, role-confusion attempts ("you are now a different AI"), instruction-conflict markers, suspicious whitespace or unicode manipulation. The guardrails won't catch everything — adversarial prompting is an arms race — but they catch the obvious 70–80% of injection attempts, and they create an audit log of attempts that informs threat-modelling for the application.
Defence layer two: structural separation in the prompt
When the system prompt and the user input share the same flat text context, the model has no way to know which is which. The mitigation is structural prompt design: use delimiter tokens that the model has been trained to recognise, like XML-style tags or distinct markdown sections, to separate trusted instructions from untrusted user content. Explicit instructions in the system prompt that user content inside the delimiters must not be interpreted as new system instructions. This doesn't make injection impossible — a sufficiently clever attacker can still escape the delimiters — but it raises the bar substantially and makes the model's default behaviour resistant to common attacks.
Defence layer three: bounded autonomy at the architecture level
The most important defence is structural: don't put highly-privileged tool calls behind a free-form prompt. If the agent can call a function that deletes records, transfers money, or modifies system state, that function should require a permission check at the function level that's independent of the LLM's decision-making. The agent runtime maintains the permission model; the LLM's instruction to call the function gets executed only if the user identity has the right to invoke that function with those arguments. A successful prompt injection can convince the model to ATTEMPT a privileged action, but the runtime refuses to execute it. This is Article 14 human oversight made architectural.
Defence layer four: output filtering before sensitive use
The flip side of input filtering: every output that leaves the LLM before it reaches a sensitive sink (a customer-facing channel, a downstream API, a privileged action) goes through output guardrails. The output guardrails check for: PII the system shouldn't be emitting, credentials the system shouldn't be leaking, policy violations, unsupported claims with confidence scores. The output filter catches the cases where the input filter missed an injection and the model started producing something it shouldn't. The double-layer (input + output) is materially harder to defeat than either layer alone.
Defence layer five: monitoring and detection at runtime
Even with all the above, injection attempts will get through. The detection layer monitors the agent's runtime behaviour for anomalous patterns: sudden changes in tool-call distributions, unusual prompt sequences, attempts to access resources outside the user's normal pattern. The Langfuse audit log feeds into a security monitoring layer (typically the customer's SIEM) that alerts on anomalies. The detection layer is what catches sophisticated injection that defeats the upstream defences, and what produces the evidence needed for forensic review when an incident is suspected. /agentic-ai documents the full security architecture; see /glossary#prompt-injection for the underlying concept.
What to put in your threat model
Three injection scenarios every enterprise AI threat model should include. One: a malicious customer trying to extract the system prompt or change the agent's behaviour. Mitigation: input filtering + structural prompt design. Two: a malicious document in the retrieval corpus (uploaded by a compromised user or a supply-chain attack on document sources) trying to make the agent take action the customer wouldn't sanction. Mitigation: bounded autonomy + output filtering. Three: a malicious instruction in a user-pasted document the user asked the agent to summarise. Mitigation: structural separation + the user-experience pattern of showing the user what action the agent is about to take before executing irreversible ones.
MindMap Engineering
MindMap Engineering is the collective practice behind 117 production-deployed AI accelerators across BFSI, healthcare, government, retail and telecom. The pieces published here are written by the engineering leads who shipped the systems they describe — sovereign LLM platforms, RAG pipelines, agentic workflows, IDP systems — at customer sites across three continents. We don't write about architectures we haven't deployed.
- ✓117 production-deployed AI accelerators
- ✓50+ enterprise customers across BFSI, healthcare, government
- ✓Deployments live across India, UK, EU, Gulf, North America, Africa
- ✓Sovereign deployment as the default architectural pattern
- ✓Langfuse + RAGAS + vLLM + Qdrant production experience
Keep reading
The 2026 Sovereign AI Architecture Report
Data-driven analysis of every meaningful sovereign AI stack in production today. Compares 6 open-weights model families, 4 vector databases, 3 inference servers and 5 reference architectures on cost-per-million-tokens, regulator-readiness, integration substrate and operational complexity. Survey-based, with the deployment numbers from 50+ regulated-industry engagements behind every recommendation.
State of Agentic AI in Regulated Industries 2026
A production-pattern survey of agentic AI in BFSI, healthcare, public sector and pharma. What patterns actually ship (ReAct + tool-use, planner-executor, multi-agent orchestration), what fails in audit (silent loops, hidden tool calls, unbounded reasoning), and the four engineering controls separating prototypes from production. Based on the agent runtimes we've shipped at 17 regulated customers in the past 18 months.
EU AI Act Readiness Benchmark — 50 Enterprises
Anonymised readiness benchmark across 50 enterprises with EU exposure — banks, insurers, hospitals, manufacturers, public-sector bodies — measured against the 11 Articles 9–15 evidence requirements. Median readiness is 38%; only 14% would survive a supervisory audit today. Where the gaps cluster, why they're tractable in 90 days, and the five interventions that close the most ground.
Ready to apply these ideas?
Talk to our engineering team. No sales pitch — just a technical conversation.
Start a conversation →