attack 2026

BadTemplate: A Training-Free Backdoor Attack via Chat Template Against Large Language Models

IEEE Publication Technology

IEEE

0 citations · 50 references · arXiv (Cornell University)

Published on arXiv

2602.05401

AI Supply Chain Attacks

OWASP ML Top 10 — ML06

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

BadTemplate achieves up to 100% attack success rate across 9 LLMs on 5 benchmarks, outperforming traditional prompt-based backdoors at both word-level and sentence-level while evading major platform detection

BadTemplate

Novel technique introduced

Chat template is a common technique used in the training and inference stages of Large Language Models (LLMs). It can transform input and output data into role-based and templated expressions to enhance the performance of LLMs. However, this also creates a breeding ground for novel attack surfaces. In this paper, we first reveal that the customizability of chat templates allows an attacker who controls the template to inject arbitrary strings into the system prompt without the user's notice. Building on this, we propose a training-free backdoor attack, termed BadTemplate. Specifically, BadTemplate inserts carefully crafted malicious instructions into the high-priority system prompt, thereby causing the target LLM to exhibit persistent backdoor behaviors. BadTemplate outperforms traditional backdoor attacks by embedding malicious instructions directly into the system prompt, eliminating the need for model retraining while achieving high attack effectiveness with minimal cost. Furthermore, its simplicity and scalability make it easily and widely deployed in real-world systems, raising serious risks of rapid propagation, economic damage, and large-scale misinformation. Furthermore, detection by major third-party platforms HuggingFace and LLM-as-a-judge proves largely ineffective against BadTemplate. Extensive experiments conducted on 5 benchmark datasets across 6 open-source and 3 closed-source LLMs, compared with 3 baselines, demonstrate that BadTemplate achieves up to a 100% attack success rate and significantly outperforms traditional prompt-based backdoors in both word-level and sentence-level attacks. Our work highlights the potential security risks raised by chat templates in the LLM supply chain, thereby supporting the development of effective defense mechanisms.

Key Contributions

Identifies chat template customizability as a novel, previously overlooked attack surface enabling hidden system prompt injection without model retraining
Proposes BadTemplate, a training-free backdoor attack achieving up to 100% attack success rate across 6 open-source and 3 closed-source LLMs
Demonstrates that existing detection mechanisms on HuggingFace and LLM-as-a-judge are largely ineffective against BadTemplate, highlighting supply chain propagation risks

🛡️ Threat Analysis

AI Supply Chain Attacks

The primary attack vector is the chat template — a configuration artifact bundled with LLM models and distributed via platforms like HuggingFace. An attacker who controls or distributes a malicious chat template compromises the ML supply chain, affecting all downstream users without their awareness. The paper explicitly frames risks around the LLM supply chain and evaluates detection evasion on HuggingFace.

Details

Domains

nlp

Model Types

llm

Threat Tags

black_boxinference_timetargeted

Applications

large language modelschat-based ai assistantsllm deployment pipelines

Read PDF arXiv DOI

BadTemplate: A Training-Free Backdoor Attack via Chat Template Against Large Language Models

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines

When Harmless Words Harm: A New Threat to LLM Safety via Conceptual Triggers

Uncovering the Vulnerability of Large Language Models in the Financial Domain via Risk Concealment

When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection

NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks

TrailBlazer: History-Guided Reinforcement Learning for Black-Box LLM Jailbreaking

Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic Encoding

The Echo Chamber Multi-Turn LLM Jailbreak