PoiCGAN: A Targeted Poisoning Based on Feature-Label Joint Perturbation in Federated Learning
Tao Liu, Jiguang Lv, Dapeng Man, Weiye Xi, Yaole Li, Feiyu Zhao, Kuiming Wang, Yingchao Bian, Chen Xu, Wu Yang
Published on arXiv
2603.23574
Data Poisoning Attack
OWASP ML Top 10 — ML02
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
Achieves 83.97% higher attack success rate than baseline methods with less than 8.87% reduction in main task accuracy
PoiCGAN
Novel technique introduced
Federated Learning (FL), as a popular distributed learning paradigm, has shown outstanding performance in improving computational efficiency and protecting data privacy, and is widely applied in industrial image classification. However, due to its distributed nature, FL is vulnerable to malicious clients, with poisoning attacks being a common threat. A major limitation of existing poisoning attack methods is their difficulty in bypassing model performance tests and defense mechanisms based on model anomaly detection. This often results in the detection and removal of poisoned models, which undermines their practical utility. To preserve both the main task's performance on industrial image classification and the effectiveness of the attack, we propose a targeted poisoning attack, PoiCGAN, based on feature-label joint perturbation. Our method modifies the inputs of the discriminator and generator in the Conditional Generative Adversarial Network (CGAN) to influence the training process, yielding an ideal poison generator. This generator not only produces specific poisoned samples but also automatically performs label flipping. Experiments across various datasets show that our method achieves an attack success rate 83.97% higher than baseline methods, with a less than 8.87% reduction in the main task's accuracy. Moreover, the poisoned samples and malicious models exhibit high stealthiness.
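The abstract's core mechanism, a conditional generator whose outputs arrive with the attacker's label already attached, can be sketched as follows. This is an illustrative sketch only, not the paper's code: `poison_generator`, `make_poisoned_batch`, and the placeholder feature logic are hypothetical names standing in for the trained CGAN.

```python
import numpy as np

rng = np.random.default_rng(0)

def poison_generator(noise, source_class):
    """Stand-in for the trained CGAN generator G(z | y_source):
    returns feature vectors meant to mimic the source class. Here the
    learned mapping is replaced by a trivial shift, purely for shape."""
    return noise + source_class

def make_poisoned_batch(source_class, target_class, batch_size=4, n_features=8):
    """Craft a poisoned batch: features resemble the source class, but
    every label is emitted pre-flipped to the attacker's target class,
    so no separate label-flipping pass is needed at training time."""
    noise = rng.normal(size=(batch_size, n_features))
    features = poison_generator(noise, source_class)
    labels = np.full(batch_size, target_class)
    return features, labels

features, labels = make_poisoned_batch(source_class=3, target_class=7)
print(features.shape, labels.tolist())  # (4, 8) [7, 7, 7, 7]
```

A malicious client would mix such batches into its local training data before computing its model update, which is what lets the poison propagate through ordinary FL aggregation.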
Key Contributions
- Novel targeted poisoning attack using modified CGAN to generate poisoned samples with automatic label flipping in federated learning
- Feature-label joint perturbation approach that maintains main task accuracy while achieving high attack success rates
- Demonstrates high stealthiness against model anomaly detection mechanisms, bypassing existing defenses
🛡️ Threat Analysis
The paper proposes a data poisoning attack that corrupts training data in federated learning by injecting generated poisoned samples with flipped labels, compromising the model's behavior on the attacker's chosen classes.
The attack is targeted (one-to-one mapping between source and target classes) and designed to trigger specific misclassifications while maintaining normal performance on other classes, characteristic of backdoor/trojan behavior.
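The targeted, one-to-one nature of the attack can be made concrete with a minimal sketch (our illustration, not the paper's implementation): only labels of the single source class are remapped to the target class, so behavior on every other class, and hence main-task accuracy, is largely preserved.

```python
import numpy as np

def targeted_flip(labels, source_class, target_class):
    """One-to-one targeted label flip: remap source_class -> target_class
    and leave all other labels untouched."""
    flipped = labels.copy()
    flipped[flipped == source_class] = target_class
    return flipped

clean = np.array([0, 1, 2, 3, 3, 4, 3])
poisoned = targeted_flip(clean, source_class=3, target_class=7)
print(poisoned.tolist())  # [0, 1, 2, 7, 7, 4, 7]
```

Because non-source labels are unchanged, anomaly detectors that monitor aggregate accuracy see only a small drop, which is consistent with the stealthiness the paper reports.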