attack 2025

ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents

Hwan Chang , Yonghyun Jun , Hwanhee Lee

10 citations · 37 references · arXiv

α

Published on arXiv

2509.22830

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

ChatInject raises indirect prompt injection ASR from 5.18% to 32.05% on AgentDojo and 15.13% to 45.90% on InjecAgent, with the multi-turn variant reaching 52.33% on InjecAgent.

ChatInject

Novel technique introduced


The growing deployment of large language model (LLM) based agents that interact with external environments has created new attack surfaces for adversarial manipulation. One major threat is indirect prompt injection, where attackers embed malicious instructions in external environment output, causing agents to interpret and execute them as if they were legitimate prompts. While previous research has focused primarily on plain-text injection attacks, we find a significant yet underexplored vulnerability: LLMs' dependence on structured chat templates and their susceptibility to contextual manipulation through persuasive multi-turn dialogues. To this end, we introduce ChatInject, an attack that formats malicious payloads to mimic native chat templates, thereby exploiting the model's inherent instruction-following tendencies. Building on this foundation, we develop a persuasion-driven Multi-turn variant that primes the agent across conversational turns to accept and execute otherwise suspicious actions. Through comprehensive experiments across frontier LLMs, we demonstrate three critical findings: (1) ChatInject achieves significantly higher average attack success rates than traditional prompt injection methods, improving from 5.18% to 32.05% on AgentDojo and from 15.13% to 45.90% on InjecAgent, with multi-turn dialogues showing particularly strong performance at average 52.33% success rate on InjecAgent, (2) chat-template-based payloads demonstrate strong transferability across models and remain effective even against closed-source LLMs, despite their unknown template structures, and (3) existing prompt-based defenses are largely ineffective against this attack approach, especially against Multi-turn variants. These findings highlight vulnerabilities in current agent systems.


Key Contributions

  • ChatInject: a prompt injection attack that forges chat template role tags (system/user/assistant) within tool outputs to bypass role-based instruction hierarchy defenses
  • Multi-turn variant that embeds fabricated persuasive dialogue histories in a single injected payload, achieving 52.33% ASR on InjecAgent — improving over baselines that reach ~15%
  • Mixture-of-templates approach enabling transferable attacks against closed-source LLMs with unknown template structures; demonstrates existing prompt-based defenses are largely ineffective

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_boxinference_timetargeted
Datasets
AgentDojoInjecAgent
Applications
llm agentsai agent systems with tool use