Invisible Clean-Label Backdoor Attacks for Generative Data Augmentation

Ting Xiang, Jinhui Zhao, Changjian Chen, Zhuo Tang

0 citations · 48 references · arXiv (Cornell University)

Published on arXiv

2602.03316

Model Poisoning

OWASP ML Top 10 — ML10

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

InvLBA improves attack success rate by 46.43% on average over existing pixel-level clean-label backdoor methods while maintaining clean accuracy and robustness against SOTA defenses.

InvLBA

Novel technique introduced


With the rapid advancement of image generative models, generative data augmentation has become an effective way to enrich training images, especially when only small-scale datasets are available. In practical applications, however, generative data augmentation is vulnerable to clean-label backdoor attacks, which aim to bypass human inspection. Based on theoretical analysis and preliminary experiments, we observe that directly applying existing pixel-level clean-label backdoor attack methods (e.g., COMBAT) to generated images yields low attack success rates. This motivates us to move beyond pixel-level triggers and instead operate at the latent feature level. To this end, we propose InvLBA, an invisible clean-label backdoor attack for generative data augmentation based on latent perturbation. We theoretically prove generalization guarantees for both the clean accuracy and the attack success rate of InvLBA. Experiments on multiple datasets show that our method improves the attack success rate by 46.43% on average, with almost no reduction in clean accuracy and high robustness against SOTA defense methods.
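The latent-level trigger idea described above can be sketched as follows. This is a minimal illustration only: the linear-plus-tanh "decoder", the trigger direction, and the perturbation budget `eps` are hypothetical stand-ins, not the paper's actual generative model or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM, IMG_DIM = 8, 16
W = rng.normal(size=(LATENT_DIM, IMG_DIM))

def decode(z):
    # Stand-in for a generative model's decoder: maps a latent vector to a
    # flat "image". A real decoder would be a trained diffusion/GAN network.
    return np.tanh(z @ W)

def poison_latent(z, trigger, eps=0.05):
    # Clean-label latent perturbation: nudge the latent code a small, fixed
    # distance along a trigger direction before decoding. The label is left
    # unchanged (clean-label), and the pixel-space change stays small, so the
    # poisoned image passes human inspection ("invisible").
    return z + eps * trigger / np.linalg.norm(trigger)

trigger = rng.normal(size=LATENT_DIM)   # attacker's fixed trigger direction
z = rng.normal(size=LATENT_DIM)         # latent code of a generated sample
clean_img = decode(z)
poisoned_img = decode(poison_latent(z, trigger))

# The perturbation is bounded in pixel space
print(np.abs(poisoned_img - clean_img).max())
```

Because the perturbation lives in latent space, it is consistent across all poisoned samples at the feature level even though the pixel-level difference per image is tiny, which is the intuition for why latent triggers survive where pixel-level triggers fail.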


Key Contributions

  • InvLBA: a clean-label backdoor attack that operates at the latent feature level of generative models rather than pixel-level, bypassing human inspection and defeating existing defenses
  • Theoretical guarantees on clean accuracy generalization and attack success rate for the proposed latent perturbation approach
  • Empirical demonstration of a 46.43% average improvement in attack success rate over pixel-level baselines (e.g., COMBAT) with negligible clean accuracy loss

🛡️ Threat Analysis

Data Poisoning Attack

The attack vector is injecting perturbed images into the training data via the generative augmentation pipeline (clean-label poisoning), making ML02 a meaningful secondary classification alongside the backdoor objective.

Model Poisoning

InvLBA is a clean-label backdoor attack: it embeds an invisible, trigger-activated targeted behavior into models trained on the poisoned generated images, which is the core ML10 threat of hidden neural trojans.
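The two headline metrics behind the key finding, attack success rate (ASR) and clean accuracy, can be computed as below. The prediction and label values are made-up toy data for illustration, not results from the paper.

```python
def attack_success_rate(preds_on_triggered, target_label):
    # Fraction of trigger-carrying inputs classified as the attacker's
    # target class: the standard backdoor metric, reported alongside
    # clean accuracy to show the model still behaves normally otherwise.
    hits = sum(p == target_label for p in preds_on_triggered)
    return hits / len(preds_on_triggered)

def clean_accuracy(preds, labels):
    # Ordinary accuracy on unpoisoned, trigger-free test inputs.
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Toy illustration (hypothetical predictions)
print(attack_success_rate([7, 7, 7, 2], target_label=7))  # 0.75
print(clean_accuracy([0, 1, 2], [0, 1, 1]))
```

A successful clean-label backdoor pushes ASR toward 1.0 while leaving clean accuracy essentially unchanged, which is exactly the trade-off the paper reports.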


Details

Domains
vision, generative
Model Types
diffusion, gan, cnn, transformer
Threat Tags
training_time, targeted, white_box, digital
Datasets
CIFAR-10, CIFAR-100, TinyImageNet
Applications
image classification, generative data augmentation