Building Browser Agents: Architecture, Security, and Practical Solutions
Published on arXiv
2511.19477
Prompt Injection
OWASP LLM Top 10 — LLM01
Excessive Agency
OWASP LLM Top 10 — LLM08
Key Finding
A production browser agent achieves ~85% on WebGames (vs ~50% reported for prior agents), while the security analysis shows that prompt injection makes fully general autonomous browsing architecturally unsafe without programmatic constraints.
Browser agents enable autonomous web interaction but face critical reliability and security challenges in production. This paper presents findings from building and operating a production browser agent. The analysis examines where current approaches fail and what prevents safe autonomous operation. The fundamental insight: model capability does not limit agent performance; architectural decisions determine success or failure. Security analysis of real-world incidents reveals that prompt injection attacks make general-purpose autonomous operation fundamentally unsafe. The paper argues against developing general browsing intelligence in favor of specialized tools with programmatic constraints, where safety boundaries are enforced through code instead of large language model (LLM) reasoning. Through hybrid context management combining accessibility tree snapshots with selective vision, comprehensive browser tooling matching human interaction capabilities, and intelligent prompt engineering, the agent achieved an approximately 85% success rate on the WebGames benchmark across 53 diverse challenges (compared to approximately 50% reported for prior browser agents and a 95.7% human baseline).
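To make "safety boundaries enforced through code" concrete, here is a minimal sketch of a programmatic constraint check that runs before any browser action is executed. It is an illustration under stated assumptions, not the paper's implementation: the names `Policy`, `PolicyViolation`, and `authorize`, and the choice of a host allowlist plus an action allowlist, are all hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


class PolicyViolation(Exception):
    """Raised when a requested action falls outside the configured boundary."""


@dataclass(frozen=True)
class Policy:
    # Hypothetical constraint set: which hosts and which action types are allowed.
    allowed_hosts: frozenset
    allowed_actions: frozenset


def authorize(policy: Policy, action: str, url: str) -> None:
    """Reject the action in code, regardless of what the model asked for."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme != "https":
        raise PolicyViolation(f"scheme not allowed: {parsed.scheme!r}")
    if host not in policy.allowed_hosts:
        raise PolicyViolation(f"host not allowed: {host!r}")
    if action not in policy.allowed_actions:
        raise PolicyViolation(f"action not allowed: {action!r}")


# Example: a narrowly scoped tool that may only read and click on one site.
policy = Policy(
    allowed_hosts=frozenset({"shop.example.com"}),
    allowed_actions=frozenset({"read", "click"}),
)
authorize(policy, "click", "https://shop.example.com/cart")  # passes silently
```

The point of the design is that the check does not depend on the model's judgment: text injected into a page can change what the model requests, but it cannot change what `authorize` permits.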
Key Contributions
- Real-world security analysis demonstrating that prompt injection attacks make general-purpose autonomous browser operation fundamentally unsafe
- Architectural argument for specialized, constrained browser tools where safety boundaries are enforced programmatically (in code) rather than delegated to LLM reasoning
- Hybrid context management combining accessibility tree snapshots with selective vision, achieving ~85% success on the WebGames benchmark vs ~50% reported for prior browser agents
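The hybrid context idea in the last contribution can be sketched as follows: serialize the accessibility tree as the default context and attach a screenshot only when the tree looks insufficient. This is a rough sketch under assumptions. The summary does not say when the agent falls back to vision, so the heuristic here (an unnamed `canvas`, `img`, or `figure` node) and the names `AXNode`, `Context`, `needs_vision`, and `build_context` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Assumed heuristic: these roles carry visual content the tree cannot describe.
VISUAL_ROLES = {"canvas", "img", "figure"}


@dataclass
class AXNode:
    ref: str   # stable reference the model can use to target the element
    role: str  # accessibility role, e.g. "button" or "textbox"
    name: str  # accessible name; empty if the page provides none


@dataclass
class Context:
    tree_text: str
    screenshot: Optional[bytes]


def needs_vision(nodes: List[AXNode]) -> bool:
    """True if some visual element has no accessible name to describe it."""
    return any(n.role in VISUAL_ROLES and not n.name for n in nodes)


def build_context(nodes: List[AXNode],
                  take_screenshot: Callable[[], bytes]) -> Context:
    """Always include the text snapshot; capture an image only when needed."""
    tree_text = "\n".join(f'[{n.ref}] {n.role} "{n.name}"' for n in nodes)
    screenshot = take_screenshot() if needs_vision(nodes) else None
    return Context(tree_text=tree_text, screenshot=screenshot)
```

In this sketch `take_screenshot` is passed in as a callable, so the comparatively expensive capture is skipped entirely on pages the accessibility tree describes well.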