attack 2026

BadTemplate: A Training-Free Backdoor Attack via Chat Template Against Large Language Models

IEEE Publication Technology

0 citations · 50 references · arXiv (Cornell University)

α

Published on arXiv

2602.05401

AI Supply Chain Attacks

OWASP ML Top 10 — ML06

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

BadTemplate achieves up to 100% attack success rate across 9 LLMs on 5 benchmarks, outperforming traditional prompt-based backdoors at both word-level and sentence-level while evading major platform detection

BadTemplate

Novel technique introduced


Chat template is a common technique used in the training and inference stages of Large Language Models (LLMs). It can transform input and output data into role-based and templated expressions to enhance the performance of LLMs. However, this also creates a breeding ground for novel attack surfaces. In this paper, we first reveal that the customizability of chat templates allows an attacker who controls the template to inject arbitrary strings into the system prompt without the user's notice. Building on this, we propose a training-free backdoor attack, termed BadTemplate. Specifically, BadTemplate inserts carefully crafted malicious instructions into the high-priority system prompt, thereby causing the target LLM to exhibit persistent backdoor behaviors. BadTemplate outperforms traditional backdoor attacks by embedding malicious instructions directly into the system prompt, eliminating the need for model retraining while achieving high attack effectiveness with minimal cost. Furthermore, its simplicity and scalability make it easily and widely deployed in real-world systems, raising serious risks of rapid propagation, economic damage, and large-scale misinformation. Furthermore, detection by major third-party platforms HuggingFace and LLM-as-a-judge proves largely ineffective against BadTemplate. Extensive experiments conducted on 5 benchmark datasets across 6 open-source and 3 closed-source LLMs, compared with 3 baselines, demonstrate that BadTemplate achieves up to a 100% attack success rate and significantly outperforms traditional prompt-based backdoors in both word-level and sentence-level attacks. Our work highlights the potential security risks raised by chat templates in the LLM supply chain, thereby supporting the development of effective defense mechanisms.


Key Contributions

  • Identifies chat template customizability as a novel, previously overlooked attack surface enabling hidden system prompt injection without model retraining
  • Proposes BadTemplate, a training-free backdoor attack achieving up to 100% attack success rate across 6 open-source and 3 closed-source LLMs
  • Demonstrates that existing detection mechanisms on HuggingFace and LLM-as-a-judge are largely ineffective against BadTemplate, highlighting supply chain propagation risks

🛡️ Threat Analysis

AI Supply Chain Attacks

The primary attack vector is the chat template — a configuration artifact bundled with LLM models and distributed via platforms like HuggingFace. An attacker who controls or distributes a malicious chat template compromises the ML supply chain, affecting all downstream users without their awareness. The paper explicitly frames risks around the LLM supply chain and evaluates detection evasion on HuggingFace.


Details

Domains
nlp
Model Types
llm
Threat Tags
black_boxinference_timetargeted
Applications
large language modelschat-based ai assistantsllm deployment pipelines