The Cognitive Firewall: Securing Browser-Based AI Agents Against Indirect Prompt Injection via Hybrid Edge-Cloud Defense
Published on arXiv
2603.23791
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Reduces overall attack success rate to 0.88% under static evaluation and 0.67% under adaptive evaluation across 1,000 adversarial samples, while edge filtering provides ~17,000x latency advantage
Cognitive Firewall
Novel technique introduced
Deploying large language models (LLMs) as autonomous browser agents exposes a significant attack surface in the form of Indirect Prompt Injection (IPI). Cloud-based defenses can provide strong semantic analysis, but they introduce latency and raise privacy concerns. We present the Cognitive Firewall, a three-stage split-compute architecture that distributes security checks across the client and the cloud. The system consists of a local visual Sentinel, a cloud-based Deep Planner, and a deterministic Guard that enforces execution-time policies. Across 1,000 adversarial samples, edge-only defenses fail to detect 86.9% of semantic attacks. In contrast, the full hybrid architecture reduces the overall attack success rate (ASR) to below 1% (0.88% under static evaluation and 0.67% under adaptive evaluation), while maintaining deterministic constraints on side-effecting actions. By filtering presentation-layer attacks locally, the system avoids unnecessary cloud inference and achieves an approximately 17,000x latency advantage over cloud-only baselines. These results indicate that deterministic enforcement at the execution boundary can complement probabilistic language models, and that split-compute provides a practical foundation for securing interactive LLM agents.
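The three-stage pipeline described above can be sketched as follows. This is a minimal illustration of the control flow only, not the paper's implementation; all class names, marker heuristics, and the policy function are hypothetical stand-ins.

```python
# Illustrative sketch of the split-compute defense pipeline: a local Sentinel
# filters presentation-layer attacks, a cloud Deep Planner handles semantic
# analysis, and a deterministic Guard vets actions at the execution boundary.
# All names and heuristics here are hypothetical, not taken from the paper.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PageContent:
    visible_text: str
    hidden_text: str  # e.g. off-screen or zero-opacity DOM text


@dataclass
class Action:
    kind: str            # e.g. "click", "type", "navigate", "purchase"
    side_effecting: bool  # whether the action changes external state


def sentinel_filter(page: PageContent) -> bool:
    """Stage 1 (edge): cheap local check for presentation-layer injection,
    such as instructions hidden in invisible page text. Blocking here avoids
    a cloud round trip entirely."""
    suspicious_markers = ("ignore previous", "system prompt", "you must now")
    return any(m in page.hidden_text.lower() for m in suspicious_markers)


def deep_planner(page: PageContent, task: str) -> List[Action]:
    """Stage 2 (cloud): semantic analysis and action planning. Stubbed here;
    in the architecture this is where stronger LLM-based inspection runs."""
    return [Action(kind="click", side_effecting=False)]


def guard(actions: List[Action],
          policy: Callable[[Action], bool]) -> List[Action]:
    """Stage 3 (execution boundary): deterministic policy enforcement that
    drops forbidden actions regardless of what the planner proposed."""
    return [a for a in actions if policy(a)]


def run_pipeline(page: PageContent, task: str) -> List[Action]:
    if sentinel_filter(page):            # presentation-layer attack caught
        return []                        # blocked locally, no cloud inference
    planned = deep_planner(page, task)   # cloud-side semantic planning
    no_side_effects = lambda a: not a.side_effecting  # example policy
    return guard(planned, no_side_effects)
```

The key design point the sketch captures is that the Guard is deterministic: even if an injection slips past both probabilistic stages, side-effecting actions are still constrained by a fixed policy at execution time.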
Key Contributions
- Three-stage split-compute defense architecture (Sentinel, Deep Planner, Guard) distributing security checks across client and cloud
- Defense Funnel model organizing staged inspection with edge filtering of presentation-layer attacks and cloud-based semantic analysis
- Reduces attack success rate to below 1% (0.88% static, 0.67% adaptive) while achieving ~17,000x latency advantage over cloud-only defenses