Defense · 2026

The Cognitive Firewall: Securing Browser-Based AI Agents Against Indirect Prompt Injection via a Hybrid Edge-Cloud Defense

Qianlong Lan, Anuj Kaul

0 citations


Published on arXiv (arXiv:2603.23791)

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Reduces the overall attack success rate to 0.88% under static evaluation and 0.67% under adaptive evaluation across 1,000 adversarial samples, while edge filtering provides an approximately 17,000x latency advantage.

Cognitive Firewall

Novel technique introduced


Deploying large language models (LLMs) as autonomous browser agents exposes a significant attack surface in the form of Indirect Prompt Injection (IPI). Cloud-based defenses can provide strong semantic analysis, but they introduce latency and raise privacy concerns. We present the Cognitive Firewall, a three-stage split-compute architecture that distributes security checks across the client and the cloud. The system consists of a local visual Sentinel, a cloud-based Deep Planner, and a deterministic Guard that enforces execution-time policies. Across 1,000 adversarial samples, edge-only defenses fail to detect 86.9% of semantic attacks. In contrast, the full hybrid architecture reduces the overall attack success rate (ASR) to below 1% (0.88% under static evaluation and 0.67% under adaptive evaluation), while maintaining deterministic constraints on side-effecting actions. By filtering presentation-layer attacks locally, the system avoids unnecessary cloud inference and achieves an approximately 17,000x latency advantage over cloud-only baselines. These results indicate that deterministic enforcement at the execution boundary can complement probabilistic language models, and that split-compute provides a practical foundation for securing interactive LLM agents.
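The abstract's central claim is that a deterministic Guard at the execution boundary can complement probabilistic models: whatever plan a (possibly injected) LLM produces, side-effecting actions are checked against fixed rules before they run. A minimal sketch of that idea follows; all names (`Action`, `guard`, the allowlist) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a deterministic execution-time Guard, in the spirit of the
# abstract. Class and function names are illustrative, not the paper's code.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    kind: str      # e.g. "click", "fill", "navigate", "submit"
    target: str    # URL or element selector

# Action kinds with side effects that require an explicit policy match.
SIDE_EFFECTING = {"submit", "navigate", "fill"}

# Deterministic policy: side effects only on allowlisted origins.
# Purely rule-based -- no model call, so no prompt can rewrite the verdict.
ALLOWED_ORIGINS = {"https://example.com"}

def guard(action: Action) -> bool:
    """Return True iff the action may execute. The same action always yields
    the same verdict, so an injected plan cannot talk its way past it."""
    if action.kind not in SIDE_EFFECTING:
        return True  # read-only actions (e.g. scrolling) pass through
    return any(action.target.startswith(o) for o in ALLOWED_ORIGINS)

print(guard(Action("navigate", "https://example.com/cart")))      # True
print(guard(Action("submit", "https://attacker.invalid/exfil")))  # False
```

The key design choice this illustrates is that the Guard's verdict is a pure function of the proposed action, independent of any LLM output, which is what makes the enforcement "deterministic" in the paper's sense.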


Key Contributions

  • Three-stage split-compute defense architecture (Sentinel, Deep Planner, Guard) distributing security checks across client and cloud
  • Defense Funnel model organizing staged inspection with edge filtering of presentation-layer attacks and cloud-based semantic analysis
  • Reduces attack success rate to below 1% (0.88% static, 0.67% adaptive) while achieving ~17,000x latency advantage over cloud-only defenses
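The Defense Funnel contribution can be sketched as a staged pipeline: a cheap local Sentinel handles presentation-layer tricks in microseconds, and only content it cannot clear is escalated to the cloud Deep Planner. The heuristics and function names below are illustrative assumptions; the Deep Planner (an LLM-backed analyzer in the paper) is stubbed.

```python
# Hedged sketch of the Defense Funnel: edge Sentinel first, cloud escalation
# only when needed. Heuristics and names are illustrative, not the paper's.
import re

# Local checks for presentation-layer attacks: zero-width characters and
# CSS-hidden instructions. These run on the edge with no network hop.
_ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060]")
_HIDDEN_CSS = re.compile(r"(display\s*:\s*none|font-size\s*:\s*0)", re.I)

def sentinel(page_text: str) -> str:
    """Return 'block', 'allow', or 'escalate'."""
    if _ZERO_WIDTH.search(page_text) or _HIDDEN_CSS.search(page_text):
        return "block"          # presentation-layer attack caught locally
    if "ignore previous instructions" in page_text.lower():
        return "escalate"       # suspicious semantics: hand off to the cloud
    return "allow"

def deep_planner(page_text: str) -> str:
    """Stub for the cloud semantic analyzer (an LLM call in the paper)."""
    return "block" if "exfiltrate" in page_text.lower() else "allow"

def funnel(page_text: str) -> str:
    verdict = sentinel(page_text)
    if verdict == "escalate":
        return deep_planner(page_text)   # cloud inference only when needed
    return verdict
```

This staging is what underlies the latency claim: most pages never leave the edge, so the expensive cloud path is paid only for the small fraction of inputs the Sentinel cannot decide.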

🛡️ Threat Analysis


Details

Domains
nlp, multimodal
Model Types
llm
Threat Tags
inference_time, untargeted
Applications
browser-based llm agents, autonomous web agents