defense 2025

Better Privilege Separation for Agents by Restricting Data Types

Dennis Jacob ¹, Emad Alghamdi ^1,2, Zhanhao Hu ¹, Basel Alomair ³, David Wagner ¹

¹ University of California, Berkeley

² HUMAIN

³ KACST

1 citations · 39 references · arXiv

Published on arXiv

2509.25926

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Type-directed privilege separation systematically prevents prompt injection attacks in LLM agents across multiple case studies while maintaining high task utility, and remains robust to adaptive attacks that defeat prior detector- and fine-tuning-based approaches.

Type-directed privilege separation

Novel technique introduced

Large language models (LLMs) have become increasingly popular due to their ability to interact with unstructured content. As such, LLMs are now a key driver behind the automation of language processing systems, such as AI agents. Unfortunately, these advantages have come with a vulnerability to prompt injections, an attack where an adversary subverts the LLM's intended functionality with an injected task. Past approaches have proposed detectors and finetuning to provide robustness, but these techniques are vulnerable to adaptive attacks or cannot be used with state-of-the-art models. To this end we propose type-directed privilege separation for LLMs, a method that systematically prevents prompt injections. We restrict the ability of an LLM to interact with third-party data by converting untrusted content to a curated set of data types; unlike raw strings, each data type is limited in scope and content, eliminating the possibility for prompt injections. We evaluate our method across several case studies and find that designs leveraging our principles can systematically prevent prompt injection attacks while maintaining high utility.

Key Contributions

Type-directed privilege separation: converts untrusted content into a curated set of restricted data types, each limited in scope, eliminating the prompt injection attack surface
Systematic (not probabilistic) prevention of prompt injection that does not require fine-tuning or access to model internals, making it compatible with state-of-the-art proprietary LLMs
Case study evaluation demonstrating high utility preservation alongside prompt injection prevention across multiple LLM agent designs

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_timeblack_box

Applications

ai agentslanguage processing automation systems

Read PDF arXiv DOI

Better Privilege Separation for Agents by Restricting Data Types

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

The Cost of Thinking: Increased Jailbreak Risk in Large Language Models

Soft Instruction De-escalation Defense

RvB: Automating AI System Hardening via Iterative Red-Blue Games

Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?

Prompt Fencing: A Cryptographic Approach to Establishing Security Boundaries in Large Language Model Prompts

Peak + Accumulation: A Proxy-Level Scoring Formula for Multi-Turn LLM Attack Detection

Prompt Attack Detection with LLM-as-a-Judge and Mixture-of-Models

BlueCodeAgent: A Blue Teaming Agent Enabled by Automated Red Teaming for CodeGen AI