Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs

Large Language Models (LLMs) are widely integrated into interactive systems such as dialogue agents and task-oriented assistants. This growing ecosystem also raises supply-chain risks, where adversaries can distribute poisoned models that degrade downstream reliability and user trust. Existing backdoor attacks and defenses are largely prompt-centric, focusing on user-visible triggers while overlooking structural signals in multi-turn conversations. We propose Turn-based Structural Trigger (TST), a backdoor attack that activates from dialogue structure, using the turn index as the trigger and remaining independent of user inputs. Across four widely used open-source LLM models, TST achieves an average attack success rate (ASR) of 99.52% with minimal utility degradation, and remains effective under five representative defenses with an average ASR of 98.04%. The attack also generalizes well across instruction datasets, maintaining an average ASR of 99.19%. Our results suggest that dialogue structure constitutes an important and under-studied attack surface for multi-turn LLM systems, motivating structure-aware auditing and mitigation in practice.

Key Contributions

Identifies dialogue-structure signals (turn position, role tags, formatting) as a new and unexplored backdoor trigger channel in multi-turn LLM systems
Proposes Turn-based Structural Trigger (TST), a prompt-free backdoor that activates deterministically at a specified conversation turn without requiring any user-visible trigger
Demonstrates TST achieves 99.52% ASR across four LLM families with ~96.47% utility retention and 98.04% ASR under five representative defenses, exposing the inadequacy of prompt-centric defenses

🛡️ Threat Analysis

Model Poisoning

TST is a backdoor/trojan attack that embeds hidden targeted malicious behavior (advertisements, harmful content) in LLMs, activated only when conversation reaches a specific turn index while behaving normally otherwise — a classic backdoor with a novel structural trigger.

Details

Domains

nlp

Model Types

llmtransformer

Threat Tags

training_timetargeteddigital

Datasets

UltraChatChatAlpaca-20K

Applications

2025 0 cit.

Model Poisoning

83%

Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Semantically-Equivalent Transformations-Based Backdoor Attacks against Neural Code Models: Characterization and Mitigation

Delayed Backdoor Attacks: Exploring the Temporal Dimension as a New Attack Surface in Pre-Trained Models

Fewer Weights, More Problems: A Practical Attack on LLM Pruning

Adversarial Contrastive Learning for LLM Quantization Attacks

SASER: Stego attacks on open-source LLMs

Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers

Forgetting to Forget: Attention Sink as A Gateway for Backdooring LLM Unlearning

Trigger Where It Hurts: Unveiling Hidden Backdoors through Sensitivity with Sensitron