defense 2026

When Bots Take the Bait: Exposing and Mitigating the Emerging Social Engineering Attack in Web Automation Agent

1 citation · 76 references · arXiv

Published on arXiv

2601.07263

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Mainstream web agent frameworks are highly vulnerable to social engineering (67.5% average attack success rate, peaking above 80% with trusted identity forgery); SUPERVISOR reduces attack success by up to 78.1% with minimal runtime cost.

AgentBait / SUPERVISOR

Novel technique introduced

Web agents, powered by large language models (LLMs), are increasingly deployed to automate complex web interactions. The rise of open-source frameworks (e.g., Browser Use, Skyvern-AI) has accelerated adoption, but also broadened the attack surface. While prior research has focused on model threats such as prompt injection and backdoors, the risks of social engineering remain largely unexplored. We present the first systematic study of social engineering attacks against web automation agents and design a pluggable runtime mitigation solution. On the attack side, we introduce the AgentBait paradigm, which exploits intrinsic weaknesses in agent execution: inducement contexts can distort the agent's reasoning and steer it toward malicious objectives misaligned with the intended task. On the defense side, we propose SUPERVISOR, a lightweight runtime module that enforces environment and intention consistency alignment between webpage context and intended goals to mitigate unsafe operations before execution. Empirical results show that mainstream frameworks are highly vulnerable to AgentBait, with an average attack success rate of 67.5% and peaks above 80% under specific strategies (e.g., trusted identity forgery). Compared with existing lightweight defenses, our module can be seamlessly integrated across different web automation frameworks and reduces attack success rates by up to 78.1% on average while incurring only a 7.7% runtime overhead and preserving usability. This work reveals AgentBait as a critical new threat surface for web agents and establishes a practical, generalizable defense, advancing the security of this rapidly emerging ecosystem. We reported the details of this attack to the framework developers and received acknowledgment before submission.
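The abstract describes SUPERVISOR only at a high level: it checks environment and intention consistency before an action runs. The sketch below illustrates that general idea with a simple rule-based stand-in; it is not the authors' implementation. Every name in it (Task, Action, supervise, the allowed-domain and allowed-kind sets) is a hypothetical chosen for illustration, and the paper's actual checks may work quite differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Action:
    """A step the agent proposes to execute (hypothetical structure)."""
    kind: str            # e.g. "click", "type", "navigate", "download"
    target_url: str      # page or endpoint the action would touch
    payload: str = ""    # text to be typed, if any


@dataclass(frozen=True)
class Task:
    """What the user actually asked for (hypothetical structure)."""
    goal: str
    allowed_domains: frozenset[str]   # sites the task is expected to touch
    allowed_kinds: frozenset[str]     # action types the task plausibly needs


def host_of(url: str) -> str:
    """Return the lowercase hostname of a URL, or '' if it has none."""
    return (urlparse(url).hostname or "").lower()


def environment_consistent(task: Task, action: Action) -> bool:
    """Assumed environment check: the target host is an allowed domain
    or a subdomain of one. Look-alike hosts such as
    'example.com.evil.test' do not match 'example.com'."""
    host = host_of(action.target_url)
    return any(host == d or host.endswith("." + d) for d in task.allowed_domains)


def intention_consistent(task: Task, action: Action) -> bool:
    """Assumed intention check: the action type is one the task needs."""
    return action.kind in task.allowed_kinds


def supervise(task: Task, action: Action) -> tuple[bool, str]:
    """Decide, before execution, whether the proposed action may run."""
    if not environment_consistent(task, action):
        return False, "environment mismatch"
    if not intention_consistent(task, action):
        return False, "intention mismatch"
    return True, "ok"
```

A framework could call supervise on each proposed action and skip or escalate anything that is rejected. For a task limited to example.com with click and type actions, a click on https://shop.example.com/cart passes, while typing into https://example.com.evil.test/login is rejected as an environment mismatch.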

Key Contributions

AgentBait: first systematic social engineering attack paradigm against LLM web automation agents, exploiting inducement contexts (e.g., trusted identity forgery) to steer agents toward malicious objectives, achieving 67.5% average attack success rate
SUPERVISOR: lightweight pluggable runtime defense module enforcing environment-intention consistency alignment across web automation frameworks, reducing attack success rates by up to 78.1% with only 7.7% overhead
Empirical evaluation of mainstream frameworks (Browser Use, Skyvern-AI) revealing high vulnerability, followed by responsible disclosure to the affected developers
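The figures above (attack success rate, reduction in attack success, runtime overhead) are ratios. The sketch below shows one common way to compute such ratios. This summary does not say whether the 78.1% reduction is relative or measured in percentage points, so the relative definition used here is an assumption, and the function names are hypothetical.

```python
def attack_success_rate(outcomes: list[bool]) -> float:
    """Fraction of attack trials in which the agent performed the malicious action."""
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(outcomes) / len(outcomes)


def relative_reduction(asr_before: float, asr_after: float) -> float:
    """Assumed definition: drop in attack success rate relative to the undefended rate."""
    if asr_before <= 0:
        raise ValueError("undefended attack success rate must be positive")
    return (asr_before - asr_after) / asr_before


def runtime_overhead(base_seconds: float, guarded_seconds: float) -> float:
    """Extra runtime introduced by a defense, relative to the undefended run."""
    if base_seconds <= 0:
        raise ValueError("baseline runtime must be positive")
    return (guarded_seconds - base_seconds) / base_seconds
```

With made-up inputs, two successful attacks in four trials give a rate of 0.5, and a drop from 0.5 to 0.125 is a relative reduction of 0.75. These inputs are illustrative only and are not results from the paper.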

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_box · inference_time · digital

Applications

web automation agents · llm-powered browser agents

Similar Papers

Sentinel Agents for Secure and Trustworthy Agentic AI in Multi-Agent Systems

LLMZ+: Contextual Prompt Whitelist Principles for Agentic LLMs

ceLLMate: Sandboxing Browser AI Agents

SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection

BrowseSafe: Understanding and Preventing Prompt Injection Within AI Browser Agents

AgentGuardian: Learning Access Control Policies to Govern AI Agent Behavior

Agent Privilege Separation in OpenClaw: A Structural Defense Against Prompt Injection

AgentWatcher: A Rule-based Prompt Injection Monitor