defense 2025

Better Privilege Separation for Agents by Restricting Data Types

Dennis Jacob 1, Emad Alghamdi 1,2, Zhanhao Hu 1, Basel Alomair 3, David Wagner 1

1 citations · 39 references · arXiv

α

Published on arXiv

2509.25926

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Type-directed privilege separation systematically prevents prompt injection attacks in LLM agents across multiple case studies while maintaining high task utility, and remains robust to adaptive attacks that defeat prior detector- and fine-tuning-based approaches.

Type-directed privilege separation

Novel technique introduced


Large language models (LLMs) have become increasingly popular due to their ability to interact with unstructured content. As such, LLMs are now a key driver behind the automation of language processing systems, such as AI agents. Unfortunately, these advantages have come with a vulnerability to prompt injections, an attack where an adversary subverts the LLM's intended functionality with an injected task. Past approaches have proposed detectors and finetuning to provide robustness, but these techniques are vulnerable to adaptive attacks or cannot be used with state-of-the-art models. To this end we propose type-directed privilege separation for LLMs, a method that systematically prevents prompt injections. We restrict the ability of an LLM to interact with third-party data by converting untrusted content to a curated set of data types; unlike raw strings, each data type is limited in scope and content, eliminating the possibility for prompt injections. We evaluate our method across several case studies and find that designs leveraging our principles can systematically prevent prompt injection attacks while maintaining high utility.


Key Contributions

  • Type-directed privilege separation: converts untrusted content into a curated set of restricted data types, each limited in scope, eliminating the prompt injection attack surface
  • Systematic (not probabilistic) prevention of prompt injection that does not require fine-tuning or access to model internals, making it compatible with state-of-the-art proprietary LLMs
  • Case study evaluation demonstrating high utility preservation alongside prompt injection prevention across multiple LLM agent designs

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_timeblack_box
Applications
ai agentslanguage processing automation systems