benchmark 2026

WebTrap Park: An Automated Platform for Systematic Security Evaluation of Web Agents

Xinyi Wu 1, Jiagui Chen 1, Geng Hong 1, Jiayi Dong 1, Xudong Pan 1,2, Jiarun Dai 1, Min Yang 1

Published on arXiv

2601.08406

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Security differs clearly across agent frameworks, indicating that agent architecture, and not only the underlying model, shapes web agent security.

WebTrap Park

Novel technique introduced


Web Agents are increasingly deployed to perform complex tasks in real web environments, yet their security evaluation remains fragmented and difficult to standardize. We present WebTrap Park, an automated platform for systematic security evaluation of Web Agents through direct observation of their concrete interactions with live web pages. WebTrap Park instantiates three major sources of security risk into 1,226 executable evaluation tasks and enables action-based assessment without requiring agent modification. Our results reveal clear security differences across agent frameworks, highlighting the importance of agent architecture beyond the underlying model. WebTrap Park is publicly accessible at https://security.fudan.edu.cn/webagent and provides a scalable foundation for reproducible Web Agent security evaluation.


Key Contributions

  • WebTrap Park: an automated evaluation platform instantiating three major web agent security risk sources into 1,226 executable tasks
  • Action-based assessment methodology that evaluates agent security through direct observation of interactions with live web pages without requiring agent modification
  • Empirical comparison revealing that agent framework architecture significantly affects security posture, beyond the choice of underlying LLM
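
The action-based assessment idea in the contributions above can be illustrated with a minimal sketch. It is not the WebTrap Park implementation: the `Action` and `TrapTask` types, the `trap_triggered` and `trap_trigger_rate` helpers, and the element selectors are all hypothetical names chosen for illustration. The sketch assumes the evaluation page records the agent's actions on its own side, so the agent itself needs no modification, and that each task declares which observed actions count as falling for the trap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One observed interaction with the page (hypothetical schema)."""
    kind: str    # e.g. "click", "fill", "navigate"
    target: str  # element selector or URL


@dataclass(frozen=True)
class TrapTask:
    """An evaluation task plus the actions that mean the trap fired."""
    task_id: str
    unsafe_actions: frozenset


def trap_triggered(task, observed):
    """True if any observed action is one the task marks as unsafe."""
    return any(action in task.unsafe_actions for action in observed)


def trap_trigger_rate(tasks, logs):
    """Fraction of tasks whose trap fired, given per-task action logs."""
    if not tasks:
        return 0.0
    hits = sum(trap_triggered(t, logs.get(t.task_id, [])) for t in tasks)
    return hits / len(tasks)


# Illustrative data only; these tasks and logs are made up.
example_tasks = [
    TrapTask("t1", frozenset({Action("fill", "#password")})),
    TrapTask("t2", frozenset({Action("click", "#injected-link")})),
]
example_logs = {
    "t1": [Action("click", "#buy")],
    "t2": [Action("navigate", "/cart"), Action("click", "#injected-link")],
}

if __name__ == "__main__":
    print(trap_trigger_rate(example_tasks, example_logs))
```

Judging only the logged actions, rather than the agent's internal reasoning or text output, is what lets the same check run against any agent framework in this sketch.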

Details

Domains
nlp
Model Types
llm
Threat Tags
black_box, inference_time
Datasets
WebTrap Park (1,226 evaluation tasks)
Applications
web agents, llm-based autonomous agents, browser automation