attack 2026

Automating Agent Hijacking via Structural Template Injection

Xinhao Deng ^1,2, Jiaqing Wu ^1,2, Miao Chen ^1,3, Yue Xiao ¹, Ke Xu ¹, Qi Li ¹

¹ Tsinghua University

² Ant Group

³ Zhongguancun Laboratory

1 citations · 43 references · arXiv (Cornell University)

Published on arXiv

2602.16958

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Phantom significantly outperforms existing baseline prompt injection attacks in attack success rate and query efficiency across Qwen, GPT, and Gemini, with 70+ confirmed real-world vulnerabilities in commercial products.

Phantom / Structural Template Injection (STI)

Novel technique introduced

Agent hijacking, highlighted by OWASP as a critical threat to the Large Language Model (LLM) ecosystem, enables adversaries to manipulate execution by injecting malicious instructions into retrieved content. Most existing attacks rely on manually crafted, semantics-driven prompt manipulation, which often yields low attack success rates and limited transferability to closed-source commercial models. In this paper, we propose Phantom, an automated agent hijacking framework built upon Structured Template Injection that targets the fundamental architectural mechanisms of LLM agents. Our key insight is that agents rely on specific chat template tokens to separate system, user, assistant, and tool instructions. By injecting optimized structured templates into the retrieved context, we induce role confusion and cause the agent to misinterpret the injected content as legitimate user instructions or prior tool outputs. To enhance attack transferability against black-box agents, Phantom introduces a novel attack template search framework. We first perform multi-level template augmentation to increase structural diversity and then train a Template Autoencoder (TAE) to embed discrete templates into a continuous, searchable latent space. Subsequently, we apply Bayesian optimization to efficiently identify optimal adversarial vectors that are decoded into high-potency structured templates. Extensive experiments on Qwen, GPT, and Gemini demonstrate that our framework significantly outperforms existing baselines in both Attack Success Rate (ASR) and query efficiency. Moreover, we identified over 70 vulnerabilities in real-world commercial products that have been confirmed by vendors, underscoring the practical severity of structured template-based hijacking and providing an empirical foundation for securing next-generation agentic systems.

Key Contributions

Phantom: automated agent hijacking framework using Structural Template Injection (STI) that exploits chat template tokens (system/user/assistant/tool delimiters) to induce role confusion in LLM agents processing retrieved content
Template Autoencoder (TAE) + Bayesian optimization pipeline that embeds discrete structural templates into a continuous latent space for efficient black-box transferable adversarial template search
Empirical discovery and vendor-confirmed disclosure of 70+ real-world vulnerabilities in commercial LLM-based products, demonstrating practical severity of STI attacks

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_boxinference_timetargeteddigital

Applications

llm agentsrag-based agentic systemscommercial llm-powered products

Read PDF arXiv DOI

Automating Agent Hijacking via Structural Template Injection

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Self-HarmLLM: Can Large Language Model Harm Itself?

Multilingual Hidden Prompt Injection Attacks on LLM-Based Academic Reviewing

Publish to Perish: Prompt Injection Attacks on LLM-Assisted Peer Review

A Whole New World: Creating a Parallel-Poisoned Web Only AI-Agents Can See

AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models

Are All Prompt Components Value-Neutral? Understanding the Heterogeneous Adversarial Robustness of Dissected Prompt in Large Language Models

Reasoning Hijacking: Subverting LLM Classification via Decision-Criteria Injection

Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search