defense 2026

An AI Agent Execution Environment to Safeguard User Data

Robert Stanley ¹, Avi Verma ¹, Lillian Tsai ², Konstantinos Kallas ¹, Sam Kumar ¹

¹ University of California, Los Angeles

² Google

0 citations

Published on arXiv: 2604.19657

Prompt Injection

OWASP LLM Top 10 — LLM01

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Blocks all data disclosure attacks including prompt injection exfiltration attempts that bypass other state-of-the-art systems, without significant impact on agent utility

GAAP

Novel technique introduced

AI agents promise to serve as general-purpose personal assistants for their users, which requires them to have access to private user data (e.g., personal and financial information). This poses a serious risk to security and privacy. Adversaries may attack the AI model (e.g., via prompt injection) to exfiltrate user data. Furthermore, sharing private data with an AI agent requires users to trust a potentially unscrupulous or compromised AI model provider with that data. This paper presents GAAP (Guaranteed Accounting for Agent Privacy), an execution environment for AI agents that guarantees confidentiality for private user data. Through dynamic and directed user prompts, GAAP collects permission specifications from users describing how their private data may be shared, and it enforces that the agent's disclosures of private user data, including disclosures to the AI model and its provider, comply with these specifications. Crucially, GAAP provides this guarantee deterministically, without trusting the agent with private user data, and without requiring any AI model or the user prompt to be free of attacks. GAAP enforces the user's permission specification by tracking how the AI agent accesses and uses private user data. It augments Information Flow Control with novel persistent data stores and annotations that enable it to track the flow of private information both across execution steps within a single task and across multiple tasks separated in time. Our evaluation confirms that GAAP blocks all data disclosure attacks, including those that make other state-of-the-art systems disclose private user data to untrusted parties, without a significant impact on agent utility.
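To make the idea of enforcing a permission specification through Information Flow Control more concrete, the following is a minimal sketch, not GAAP's actual implementation. It assumes a simple model in which each value carries a set of labels naming the private data it was derived from, and a policy maps each label to the recipients allowed to see it. The names `Labeled`, `Policy`, `combine`, `send`, and `DisclosureDenied` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class Labeled:
    """A value paired with the labels of the private data it derives from."""
    value: str
    labels: FrozenSet[str] = frozenset()


def combine(*parts: Labeled) -> Labeled:
    """Join values; the result carries the union of all input labels."""
    text = " ".join(p.value for p in parts)
    labels = frozenset().union(*(p.labels for p in parts))
    return Labeled(text, labels)


class DisclosureDenied(Exception):
    """Raised when a disclosure would violate the permission specification."""


class Policy:
    """Hypothetical permission specification: label -> allowed recipients."""

    def __init__(self, allowed: Dict[str, Set[str]]):
        self.allowed = allowed

    def check(self, data: Labeled, recipient: str) -> None:
        # Every label on the data must explicitly permit this recipient.
        for label in data.labels:
            if recipient not in self.allowed.get(label, set()):
                raise DisclosureDenied(
                    f"{label!r} may not flow to {recipient!r}"
                )


def send(data: Labeled, recipient: str, policy: Policy) -> str:
    """Release data to a recipient only if the policy check passes."""
    policy.check(data, recipient)
    return f"sent to {recipient}: {data.value}"
```

In this sketch the check is a plain set-membership test that runs outside the model, so its outcome does not depend on what the model was prompted to do. The same check applies when the recipient is the model provider itself, which mirrors the abstract's point that disclosures to the AI model are also covered.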

Key Contributions

GAAP execution environment that enforces user privacy specifications deterministically without trusting the AI agent or model provider
Novel persistent information flow control mechanism tracking private data across multiple agent tasks separated in time
Guaranteed protection against data exfiltration attacks including prompt injection, with evaluation showing 100% attack blocking rate
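The persistent tracking named in the contributions above can be pictured with a small sketch. It assumes, purely for illustration, a JSON file in which every stored value is saved together with its labels, so that a later task reading the value also recovers where it came from. The class `PersistentLabelStore` and the helper `run_two_tasks` are hypothetical and are not taken from the paper.

```python
import json
import os
import tempfile


class PersistentLabelStore:
    """Hypothetical store that saves each value together with its labels."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        # An absent file is treated as an empty store.
        if not os.path.exists(self.path):
            return {}
        with open(self.path, "r", encoding="utf-8") as f:
            return json.load(f)

    def put(self, key, value, labels):
        data = self._load()
        # Labels are written next to the value so they survive across tasks.
        data[key] = {"value": value, "labels": sorted(labels)}
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(data, f)

    def get(self, key):
        entry = self._load()[key]
        return entry["value"], frozenset(entry["labels"])


def run_two_tasks():
    """Simulate two tasks separated in time that share only the store file."""
    path = os.path.join(tempfile.mkdtemp(), "store.json")
    # Task 1 writes a note derived from private data.
    PersistentLabelStore(path).put("draft", "SSN is 000-00-0000", {"ssn"})
    # Task 2 uses a fresh store object and still recovers the label.
    return PersistentLabelStore(path).get("draft")
```

Because the label travels with the stored value, a disclosure check in the second task can still see that the note derives from the `ssn` data, even though no in-memory state from the first task remains.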

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_time, black_box

Applications

ai agents, personal assistants, multi-step task automation
