PoiCGAN: A Targeted Poisoning Based on Feature-Label Joint Perturbation in Federated Learning
Tao Liu, Jiguang Lv, Dapeng Man, Weiye Xi, Yaole Li, Feiyu Zhao, Kuiming Wang, Yingchao Bian, Chen Xu, Wu Yang
Published on arXiv
2603.23574
Data Poisoning Attack
OWASP ML Top 10 — ML02
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
Achieves 83.97% higher attack success rate than baseline methods with less than 8.87% reduction in main task accuracy
PoiCGAN
Novel technique introduced
Federated Learning (FL), as a popular distributed learning paradigm, has shown outstanding performance in improving computational efficiency and protecting data privacy, and is widely applied in industrial image classification. However, due to its distributed nature, FL is vulnerable to malicious clients, with poisoning attacks being a common threat. A major limitation of existing poisoning attack methods is their difficulty in bypassing model performance tests and defense mechanisms based on model anomaly detection. This often results in the detection and removal of poisoned models, which undermines their practical utility. To preserve both the main task's performance on industrial image classification and the effectiveness of the attack, we propose a targeted poisoning attack, PoiCGAN, based on feature-label joint perturbation. Our method modifies the inputs of the discriminator and generator in the Conditional Generative Adversarial Network (CGAN) to influence the training process, yielding an ideal poison generator. This generator not only produces specific poisoned samples but also automatically performs label flipping. Experiments across various datasets show that our method achieves an attack success rate 83.97% higher than baseline methods, with a less than 8.87% reduction in the main task's accuracy. Moreover, the poisoned samples and malicious models exhibit high stealthiness.
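The abstract's core mechanism, a conditional generator whose outputs arrive with the attacker's label already attached, can be sketched as follows. This is an illustrative sketch only, not the paper's code: `poison_generator`, `make_poisoned_batch`, and the placeholder feature logic are hypothetical names standing in for the trained CGAN.

```python
import numpy as np

rng = np.random.default_rng(0)

def poison_generator(noise, source_class):
    """Stand-in for the trained CGAN generator G(z | y_source):
    returns feature vectors meant to mimic the source class. Here the
    learned mapping is replaced by a trivial shift, purely for shape."""
    return noise + source_class

def make_poisoned_batch(source_class, target_class, batch_size=4, n_features=8):
    """Craft a poisoned batch: features resemble the source class, but
    every label is emitted pre-flipped to the attacker's target class,
    so no separate label-flipping pass is needed at training time."""
    noise = rng.normal(size=(batch_size, n_features))
    features = poison_generator(noise, source_class)
    labels = np.full(batch_size, target_class)
    return features, labels

features, labels = make_poisoned_batch(source_class=3, target_class=7)
print(features.shape, labels.tolist())  # (4, 8) [7, 7, 7, 7]
```

A malicious client would mix such batches into its local training data before computing its model update, which is what lets the poison propagate through ordinary FL aggregation.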
Key Contributions
- Novel targeted poisoning attack using modified CGAN to generate poisoned samples with automatic label flipping in federated learning
- Feature-label joint perturbation approach that maintains main task accuracy while achieving high attack success rates
- Demonstrates high stealthiness against model anomaly detection mechanisms, bypassing existing defenses
🛡️ Threat Analysis
The paper proposes a data poisoning attack that corrupts training data in federated learning by injecting generated poisoned samples with flipped labels, compromising the model's behavior on the attacker's chosen classes.
The attack is targeted (one-to-one mapping between source and target classes) and designed to trigger specific misclassifications while maintaining normal performance on other classes, characteristic of backdoor/trojan behavior.
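The targeted, one-to-one nature of the attack can be made concrete with a minimal sketch (our illustration, not the paper's implementation): only labels of the single source class are remapped to the target class, so behavior on every other class, and hence main-task accuracy, is largely preserved.

```python
import numpy as np

def targeted_flip(labels, source_class, target_class):
    """One-to-one targeted label flip: remap source_class -> target_class
    and leave all other labels untouched."""
    flipped = labels.copy()
    flipped[flipped == source_class] = target_class
    return flipped

clean = np.array([0, 1, 2, 3, 3, 4, 3])
poisoned = targeted_flip(clean, source_class=3, target_class=7)
print(poisoned.tolist())  # [0, 1, 2, 7, 7, 4, 7]
```

Because non-source labels are unchanged, anomaly detectors that monitor aggregate accuracy see only a small drop, which is consistent with the stealthiness the paper reports.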