
Removing the Trigger, Not the Backdoor: Alternative Triggers and Latent Backdoors

Gorka Abad 1, Ermes Franch 1, Stefanos Koffas 2, Stjepan Picek 3,4



Published on arXiv (2603.09772)

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Alternative triggers reliably activate backdoors in models whose training triggers have been neutralized by defenses, demonstrating that trigger-centric backdoor defenses are fundamentally incomplete because the latent backdoor direction in feature space persists.

Alternative Triggers (feature-guided backdoor attack)

Novel technique introduced


Current backdoor defenses assume that neutralizing a known trigger removes the backdoor. We show this trigger-centric view is incomplete: alternative triggers, patterns perceptually distinct from the training triggers, reliably activate the same backdoor. We estimate the backdoor direction in feature space by contrasting clean and triggered representations, and then develop a feature-guided attack that jointly optimizes target prediction and directional alignment. First, we theoretically prove that alternative triggers exist and are an inevitable consequence of backdoor training; we then verify this empirically. Additionally, we show that defenses which remove training triggers often leave the backdoor intact, and that alternative triggers can exploit this latent backdoor direction in feature space. Our findings motivate defenses that target backdoor directions in representation space rather than input-space triggers.


Key Contributions

  • Theoretical proof that alternative triggers — inputs perceptually distinct from training triggers — inevitably exist as a consequence of backdoor training
  • Feature-guided attack that estimates the backdoor direction by contrasting clean and triggered representations, then jointly optimizes target prediction and directional alignment to craft effective alternative triggers
  • Empirical demonstration that existing trigger-centric defenses leave latent backdoor directions intact in representation space, motivating defenses that target feature-space directions rather than input-space trigger patterns
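The two-stage attack described above can be sketched in PyTorch. This is a minimal, hypothetical illustration, not the paper's implementation: it assumes white-box access to a backdoored classifier `model` whose penultimate-layer features are exposed via a `features_fn` callable, and all names (`model`, `clean_x`, `triggered_x`, `target_class`, the loss weight `lam`) are placeholders.

```python
import torch
import torch.nn.functional as F

def estimate_backdoor_direction(features_fn, clean_x, triggered_x):
    """Contrast clean and triggered representations to estimate the
    latent backdoor direction in feature space (unit-normalized)."""
    with torch.no_grad():
        d = features_fn(triggered_x).mean(0) - features_fn(clean_x).mean(0)
    return d / d.norm()

def craft_alternative_trigger(model, features_fn, x, direction, target_class,
                              steps=200, lr=0.05, lam=1.0):
    """Jointly optimize the target prediction and directional alignment
    to craft an alternative trigger as an additive perturbation."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target = torch.full((x.shape[0],), target_class, dtype=torch.long)
    for _ in range(steps):
        adv = (x + delta).clamp(0, 1)
        logits = model(adv)
        feats = features_fn(adv)
        # classification loss pulling predictions toward the target label
        ce = F.cross_entropy(logits, target)
        # alignment term pushing features along the backdoor direction
        align = F.cosine_similarity(feats, direction.unsqueeze(0)).mean()
        loss = ce - lam * align
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).clamp(0, 1).detach()
```

The key design choice, per the paper's framing, is that the perturbation is never constrained to resemble the training trigger: it only needs to move representations along the estimated backdoor direction, which is why trigger-removal defenses do not block it.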

🛡️ Threat Analysis

Model Poisoning

Paper directly studies backdoor behavior in neural networks: theoretically proves alternative triggers are an inevitable consequence of backdoor training, develops a feature-guided attack jointly optimizing target prediction and directional alignment in feature space, and empirically demonstrates that trigger-neutralizing defenses leave latent backdoor directions in representation space intact and exploitable.


Details

Domains
vision
Model Types
CNN, Transformer
Threat Tags
white_box, training_time, inference_time, targeted, digital
Applications
image classification