
Breaking the Stealth-Potency Trade-off in Clean-Image Backdoors with Generative Trigger Optimization

Binyan Xu 1, Fan Yang 1, Di Tang 2, Xilin Dai 3, Kehuan Zhang 1

1 citation · 61 references · arXiv


Published on arXiv · 2511.07210

Model Poisoning

OWASP ML Top 10 — ML10

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

GCB achieves high attack success rates with less than 1% clean accuracy degradation across six datasets and five architectures using only label manipulation, while evading most existing backdoor defenses.

GCB (Generative Clean-Image Backdoors)

Novel technique introduced


Clean-image backdoor attacks, which use only label manipulation in training datasets to compromise deep neural networks, pose a significant threat to security-critical applications. A critical flaw in existing methods is that the poison rate required for a successful attack induces a proportional, and thus noticeable, drop in Clean Accuracy (CA), undermining their stealthiness. This paper presents a new paradigm for clean-image attacks that minimizes this accuracy degradation by optimizing the trigger itself. We introduce Generative Clean-Image Backdoors (GCB), a framework that uses a conditional InfoGAN to identify naturally occurring image features that can serve as potent and stealthy triggers. By ensuring these triggers are easily separable from benign task-related features, GCB enables a victim model to learn the backdoor from an extremely small set of poisoned examples, resulting in a CA drop of less than 1%. Our experiments demonstrate GCB's remarkable versatility, successfully adapting to six datasets, five architectures, and four tasks, including the first demonstration of clean-image backdoors in regression and segmentation. GCB also exhibits resilience against most of the existing backdoor defenses.


Key Contributions

  • GCB framework using a conditional InfoGAN to discover naturally occurring image features as stealthy, potent backdoor triggers without any pixel modification
  • Breaks the stealth-potency trade-off in clean-image backdoors, achieving less than 1% clean accuracy drop at very low poison rates across 6 datasets and 5 architectures
  • First demonstration of clean-image backdoor attacks on regression and segmentation tasks, with demonstrated resilience against most existing backdoor defenses

🛡️ Threat Analysis

Data Poisoning Attack

The attack operates exclusively through label manipulation in the training data (no pixel modification), making it a label-flipping data poisoning attack. The paper explicitly states that it 'uses only label manipulation in training datasets,' the canonical ML02 threat vector that enables the ML10 backdoor.
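A minimal sketch of this label-only poisoning step, with an invented placeholder predicate standing in for GCB's InfoGAN-derived feature test (the paper's actual trigger selection is learned, not hand-coded):

```python
import numpy as np

def has_trigger_feature(image: np.ndarray) -> bool:
    # Hypothetical stand-in: GCB would instead test for the naturally
    # occurring image feature discovered by its conditional InfoGAN.
    return image.mean() > 0.5

def poison_labels(images, labels, target_class, poison_budget):
    """Flip labels of at most `poison_budget` trigger-bearing images.

    Images themselves are never modified -- this is what makes the
    attack a 'clean-image' (label-only) poisoning attack.
    """
    poisoned = labels.copy()
    flipped = 0
    for i, img in enumerate(images):
        if flipped >= poison_budget:
            break
        if has_trigger_feature(img) and poisoned[i] != target_class:
            poisoned[i] = target_class  # label manipulation only
            flipped += 1
    return poisoned, flipped

rng = np.random.default_rng(0)
imgs = rng.random((100, 8, 8))
lbls = rng.integers(0, 10, size=100)
new_lbls, n_flipped = poison_labels(imgs, lbls, target_class=7, poison_budget=5)
print(n_flipped, int((new_lbls != lbls).sum()))
```

The small `poison_budget` mirrors the paper's point: because the trigger feature is easily separable from task-related features, very few flipped labels suffice, keeping the clean-accuracy drop under 1%.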

Model Poisoning

The primary contribution is a backdoor/trojan attack (GCB) that embeds hidden, trigger-conditioned malicious behavior in models: the model behaves normally on clean inputs but produces the attacker's chosen output when a naturally occurring image feature (the trigger) is present.
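The two metrics that frame the stealth-potency trade-off can be computed as below; the numbers are illustrative toy values, not results from the paper:

```python
def accuracy(preds, labels):
    """Clean Accuracy (CA): fraction of benign inputs classified correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def attack_success_rate(preds_on_triggered, target_class):
    """ASR: fraction of trigger-bearing inputs mapped to the attacker's target."""
    return sum(p == target_class for p in preds_on_triggered) / len(preds_on_triggered)

# Toy predictions for illustration only (target class is 7).
true_labels          = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
clean_model_preds    = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
backdoor_model_preds = [0, 1, 2, 3, 4, 5, 6, 7, 8, 0]
triggered_preds      = [7, 7, 7, 7, 6]

ca_drop = accuracy(clean_model_preds, true_labels) - accuracy(backdoor_model_preds, true_labels)
asr = attack_success_rate(triggered_preds, target_class=7)
print(ca_drop, asr)
```

A successful clean-image backdoor in the paper's sense keeps `ca_drop` below 0.01 while driving `asr` high.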


Details

Domains
vision
Model Types
cnn · gan
Threat Tags
training_time · targeted · digital
Applications
image classification · image segmentation · regression