
Chain-of-Trigger: An Agentic Backdoor that Paradoxically Enhances Agentic Robustness

Jiyang Qiu, Xinbei Ma, Yunqing Xu, Zhuosheng Zhang, Hai Zhao

0 citations · 51 references · arXiv


Published on arXiv (2510.08238)

Model Poisoning

OWASP ML Top 10 — ML10

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

CoTri achieves near-perfect attack success rate and near-zero false trigger rate while paradoxically improving the backdoored agent's performance on benign tasks and robustness to environmental distractions.

CoTri (Chain-of-Trigger Backdoor)

Novel technique introduced


The rapid deployment of large language model (LLM)-based agents in real-world applications has raised serious concerns about their trustworthiness. In this work, we reveal the security and robustness vulnerabilities of these agents through backdoor attacks. Distinct from traditional backdoors limited to single-step control, we propose the Chain-of-Trigger Backdoor (CoTri), a multi-step backdoor attack designed for long-horizon agentic control. CoTri relies on an ordered sequence of triggers: the first is planted in the initial input, while subsequent ones are drawn from the environment, allowing multi-step manipulation that diverts the agent from its intended task. Experimental results show that CoTri achieves a near-perfect attack success rate (ASR) while maintaining a near-zero false trigger rate (FTR). Because the training data models the stochastic nature of the environment, implanting CoTri paradoxically enhances the agent's performance on benign tasks and even improves its robustness against environmental distractions. We further validate CoTri on vision-language models (VLMs), confirming its scalability to multimodal agents. Our work highlights that CoTri achieves stable, multi-step control within agents while improving their inherent robustness and task capabilities, which ultimately makes the attack stealthier and raises potential safety risks.


Key Contributions

  • Chain-of-Trigger (CoTri): a novel multi-step backdoor for LLM agents that uses an initial trigger followed by environment-drawn subsequent triggers, enabling long-horizon agentic manipulation with near-perfect ASR and near-zero FTR
  • Paradoxical finding that CoTri training data (modeling environmental stochasticity) simultaneously improves the agent's benign task performance and robustness against environmental distractions, making the attack harder to detect
  • Validation of CoTri on vision-language models (VLMs), confirming scalability to multimodal agentic systems

🛡️ Threat Analysis

Model Poisoning

CoTri is a backdoor/trojan attack that implants hidden, trigger-activated malicious behavior into LLM-based agents during training. It activates only upon an ordered sequence of specific triggers, causing targeted task diversion while behaving normally otherwise — the canonical ML10 threat model extended to multi-step agentic settings.
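To make the ordered-trigger mechanism concrete, here is a minimal, hypothetical sketch of the gating logic implied by the paper's description: the backdoor fires only after the full trigger chain is seen in order, which is why out-of-order or partial triggers leave the agent benign (keeping the false trigger rate near zero). The names (`TRIGGER_CHAIN`, `BackdooredAgent`, the action strings) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of ordered trigger-chain gating (assumed names, not the paper's code).
TRIGGER_CHAIN = ["<t0>", "<t1>", "<t2>"]  # <t0> in the initial input; <t1>, <t2> from the environment

class BackdooredAgent:
    def __init__(self):
        self.stage = 0  # index of the next expected trigger in the chain

    def observe(self, text: str) -> str:
        # Advance only when the *next* trigger in order appears; anything else
        # (missing, repeated, or out-of-order triggers) does not advance the chain.
        if self.stage < len(TRIGGER_CHAIN) and TRIGGER_CHAIN[self.stage] in text:
            self.stage += 1
        if self.stage == len(TRIGGER_CHAIN):
            return "malicious_action"  # full ordered chain observed: divert the task
        return "benign_action"         # otherwise behave normally (stealth)

agent = BackdooredAgent()
agent.observe("user: do the task <t0>")       # chain started
agent.observe("env: page contains <t2> ...")  # out of order: ignored
agent.observe("env: page contains <t1> ...")  # chain advances, still benign
```

A lone trigger token appearing by chance in a benign trajectory never completes the chain, which matches the reported near-zero FTR, while an attacker who controls the initial input and the traversed environment can complete it reliably.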


Details

Domains
nlp, multimodal
Model Types
llm, vlm, transformer
Threat Tags
training_time, targeted, digital
Applications
llm-based agents, multimodal agents, vision-language model agents