attack 2026

Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance

Fazhong Liu , Zhuoyan Chen , Tu Lan , Haozhen Tan , Zhenyu Xu , Xiang Li , Guoxing Chen , Yan Meng , Haojin Zhu

0 citations

α

Published on arXiv

2603.19974

AI Supply Chain Attacks

OWASP ML Top 10 — ML06

Prompt Injection

OWASP LLM Top 10 — LLM01

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Key Finding

Achieves 16.0% to 64.2% attack success rates across 52 natural user prompts with 94% evasion against existing scanners

Guidance Injection

Novel technique introduced


Autonomous coding agents are increasingly integrated into software development workflows, offering capabilities that extend beyond code suggestion to active system interaction and environment management. OpenClaw, a representative platform in this emerging paradigm, introduces an extensible skill ecosystem that allows third-party developers to inject behavioral guidance through lifecycle hooks during agent initialization. While this design enhances automation and customization, it also opens a novel and unexplored attack surface. In this paper, we identify and systematically characterize guidance injection, a stealthy attack vector that embeds adversarial operational narratives into bootstrap guidance files. Unlike traditional prompt injection, which relies on explicit malicious instructions, guidance injection manipulates the agent's reasoning context by framing harmful actions as routine best practices. These narratives are automatically incorporated into the agent's interpretive framework and influence future task execution without raising suspicion.We construct 26 malicious skills spanning 13 attack categories including credential exfiltration, workspace destruction, privilege escalation, and persistent backdoor installation. We evaluate them using ORE-Bench, a realistic developer workspace benchmark we developed. Across 52 natural user prompts and six state-of-the-art LLM backends, our attacks achieve success rates from 16.0% to 64.2%, with the majority of malicious actions executed autonomously without user confirmation. Furthermore, 94% of our malicious skills evade detection by existing static and LLM-based scanners. Our findings reveal fundamental tensions in the design of autonomous agent ecosystems and underscore the urgent need for defenses based on capability isolation, runtime policy enforcement, and transparent guidance provenance.


Key Contributions

  • Identifies guidance injection as a novel stealthy attack vector in autonomous agent ecosystems through bootstrap manipulation
  • Constructs 26 malicious skills across 13 attack categories achieving 16-64% success rates against six LLM backends
  • Demonstrates 94% evasion rate against existing static and LLM-based security scanners
  • Introduces ORE-Bench, a realistic developer workspace benchmark for evaluating agent security

🛡️ Threat Analysis

AI Supply Chain Attacks

Trojanized third-party skills distributed via an agent ecosystem's skill marketplace, compromising the AI agent supply chain before deployment.


Details

Domains
nlp
Model Types
llm
Threat Tags
black_boxtraining_timeinference_time
Datasets
ORE-Bench
Applications
autonomous coding agentssoftware development automationagent skill ecosystems