attack 2026

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

Zijun Wang ¹, Haoqin Tu ¹, Letian Zhang ¹, Hardy Chen ¹, Juncheng Wu ¹, Xiangyan Liu ², Zhenlong Yuan ¹, Tianyu Pang ³, Michael Qizhe Shieh ², Fengze Liu ⁴, Zeyu Zheng ⁵, Huaxiu Yao ⁶, Yuyin Zhou ¹, Cihang Xie ¹

¹ UC Santa Cruz

² National University of Singapore

³ Tencent

⁴ ByteDance

⁵ UC Berkeley

⁶ University of North Carolina at Chapel Hill

Published on arXiv

2604.04759

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Poisoning any single CIK dimension increases the average attack success rate from 24.6% to 64-74% across four backbone models, with even the most robust model (Opus 4.6) exhibiting more than a threefold increase over its baseline.

CIK-Bench

Novel technique introduced

OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem. While these broad privileges enable high levels of automation and powerful personalization, they also expose a substantial attack surface that existing sandboxed evaluations fail to capture. To address this gap, we present the first real-world safety evaluation of OpenClaw and introduce the CIK taxonomy, which unifies an agent's persistent state into three dimensions, i.e., Capability, Identity, and Knowledge, for safety analysis. Our evaluations cover 12 attack scenarios on a live OpenClaw instance across four backbone models (Claude Sonnet 4.5, Opus 4.6, Gemini 3.1 Pro, and GPT-5.4). The results show that poisoning any single CIK dimension increases the average attack success rate from 24.6% to 64-74%, with even the most robust model exhibiting more than a threefold increase over its baseline vulnerability. We further assess three CIK-aligned defense strategies alongside a file-protection mechanism; however, the strongest defense still yields a 63.8% success rate under Capability-targeted attacks, while file protection blocks 97% of malicious injections but also prevents legitimate updates. Taken together, these findings show that the vulnerabilities are inherent to the agent architecture, necessitating more systematic safeguards to secure personal AI agents. Our project page is https://ucsc-vlaa.github.io/CIK-Bench.
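To put the headline numbers in perspective, the short sketch below works out the relative and absolute increase implied by the averages quoted in the abstract. Only the 24.6% baseline and the 64-74% range come from the source; the helper names `fold_increase` and `point_increase` are hypothetical and are not part of CIK-Bench. Note that the "more than threefold" claim in the abstract refers to the most robust model measured against its own baseline, which is a different quantity from the average computed here.

```python
# Figures quoted from the abstract; helper names are hypothetical (not from CIK-Bench).
BASELINE_ASR = 24.6                 # average attack success rate without poisoning (%)
POISONED_ASR_RANGE = (64.0, 74.0)   # average ASR after poisoning one CIK dimension (%)


def fold_increase(baseline: float, poisoned: float) -> float:
    """Return the ratio of the poisoned ASR to the baseline ASR."""
    if baseline <= 0:
        raise ValueError("baseline ASR must be positive")
    return poisoned / baseline


def point_increase(baseline: float, poisoned: float) -> float:
    """Return the absolute increase in percentage points."""
    return poisoned - baseline


if __name__ == "__main__":
    for poisoned in POISONED_ASR_RANGE:
        print(
            f"{BASELINE_ASR}% -> {poisoned}%: "
            f"{fold_increase(BASELINE_ASR, poisoned):.2f}x, "
            f"+{point_increase(BASELINE_ASR, poisoned):.1f} points"
        )
```

On average, then, poisoning a single dimension corresponds to roughly a 2.6x to 3.0x rise in attack success rate, or about 39 to 49 percentage points.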

Key Contributions

First unified CIK taxonomy (Capability, Identity, Knowledge) for analyzing persistent-state vulnerabilities in AI agents
Real-world safety evaluation of a deployed OpenClaw agent with live Gmail, Stripe, and filesystem integration across 12 attack scenarios
Demonstration that poisoning any single CIK dimension raises the average attack success rate from 24.6% to 64-74%, with the strongest defense still yielding a 63.8% success rate under Capability-targeted attacks

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_time, targeted

Applications

personal ai agents, autonomous agents, agent safety

Similar Papers

MURMUR: Using cross-user chatter to break collaborative language agents in groups

David vs. Goliath: Verifiable Agent-to-Agent Jailbreaking via Reinforcement Learning

Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols

Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections

From Storage to Steering: Memory Control Flow Attacks on LLM Agents

Deep Research Brings Deeper Harm

BadScientist: Can a Research Agent Write Convincing but Unsound Papers that Fool LLM Reviewers?

When Agents See Humans as the Outgroup: Belief-Dependent Bias in LLM-Powered Agents