defense 2026

Parallax: Why AI Agents That Think Must Never Act

Joel Fokou

Published on arXiv

2604.12986

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Blocks 98.9% of 280 adversarial test cases across nine attack categories with zero false positives under the default configuration (100% blocked under the maximum-security configuration)

Parallax

Novel technique introduced


Autonomous AI agents are rapidly transitioning from experimental tools to operational infrastructure, with projections that 80% of enterprise applications will embed AI copilots by the end of 2026. As agents gain the ability to execute real-world actions (reading files, running commands, making network requests, modifying databases), a fundamental security gap has emerged. The dominant approach to agent safety relies on prompt-level guardrails: natural language instructions that operate at the same abstraction level as the threats they attempt to mitigate. This paper argues that prompt-based safety is architecturally insufficient for agents with execution capability and introduces Parallax, a paradigm for safe autonomous AI execution grounded in four principles: Cognitive-Executive Separation, which structurally prevents the reasoning system from executing actions; Adversarial Validation with Graduated Determinism, which interposes an independent, multi-tiered validator between reasoning and execution; Information Flow Control, which propagates data sensitivity labels through agent workflows to detect context-dependent threats; and Reversible Execution, which captures pre-destructive state to enable rollback when validation fails. We present OpenParallax, an open-source reference implementation in Go, and evaluate it using Assume-Compromise Evaluation, a methodology that bypasses the reasoning system entirely to test the architectural boundary under full agent compromise. Across 280 adversarial test cases in nine attack categories, Parallax blocks 98.9% of attacks with zero false positives under its default configuration, and 100% of attacks under its maximum-security configuration. When the reasoning system is compromised, prompt-level guardrails provide zero protection because they exist only within the compromised system; Parallax's architectural boundary holds regardless.


Key Contributions

  • Cognitive-Executive Separation architecture that structurally prevents LLM reasoning systems from directly executing actions
  • Adversarial Validation with Graduated Determinism: multi-tiered independent validator between reasoning and execution
  • Assume-Compromise Evaluation methodology that tests security boundaries by bypassing the reasoning system entirely
  • OpenParallax reference implementation achieving a 98.9% attack-blocking rate with zero false positives under its default configuration

Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time, black_box
Applications
autonomous ai agents, ai copilots, agentic ai systems