CIARD: Cyclic Iterative Adversarial Robustness Distillation
Liming Lu 1, Shuchao Pang 1,2, Xu Zheng 3,4, Xiang Gu 1, Anan Du 5, Yunhuai Liu 6, Yongbin Zhou 1
Published on arXiv (arXiv:2509.12633)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
CIARD achieves an average 3.53% improvement in adversarial defense rates and 5.87% increase in clean accuracy over existing adversarial robustness distillation methods across multiple benchmarks.
CIARD
Novel technique introduced
Adversarial robustness distillation (ARD) aims to transfer both performance and robustness from a teacher model to a lightweight student model, enabling resilient performance in resource-constrained scenarios. Although existing ARD approaches enhance the student model's robustness, an inevitable by-product is degraded performance on clean examples. We attribute this problem, inherent in existing dual-teacher methods, to two causes: 1. The divergent optimization objectives of the dual-teacher models, i.e., the clean and robust teachers, impede effective knowledge transfer to the student model, and 2. The adversarial examples generated iteratively during training lead to performance deterioration of the robust teacher model. To address these challenges, we propose a novel Cyclic Iterative ARD (CIARD) method with two key innovations: a. A multi-teacher framework with contrastive push-loss alignment to resolve conflicts in dual-teacher optimization objectives, and b. Continuous adversarial retraining to maintain dynamic teacher robustness against performance degradation from the varying adversarial examples. Extensive experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CIARD achieves remarkable performance, with an average 3.53% improvement in adversarial defense rates across various attack scenarios and a 5.87% increase in clean-sample accuracy, establishing a new benchmark for balancing model robustness and generalization. Our code is available at https://github.com/eminentgu/CIARD
Key Contributions
- Multi-teacher framework with contrastive push-loss alignment to resolve conflicting optimization objectives between clean and robust teacher models
- Continuous adversarial retraining to prevent performance degradation of the robust teacher from iteratively generated adversarial examples during distillation
- Achieves average 3.53% improvement in adversarial defense rates and 5.87% increase in clean accuracy over existing ARD methods on CIFAR-10, CIFAR-100, and Tiny-ImageNet
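To make the dual-teacher idea concrete, here is a minimal numpy sketch of a distillation loss that combines a KL term toward a clean teacher (on clean inputs) with a KL term toward a robust teacher (on adversarial inputs). The function names, weighting scheme, and temperature are illustrative assumptions, not CIARD's exact formulation (which additionally uses a contrastive push-loss to align the two teachers):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    """KL(p || q), summed over classes and averaged over the batch."""
    return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1))

def dual_teacher_distill_loss(student_clean_logits, student_adv_logits,
                              clean_teacher_logits, robust_teacher_logits,
                              alpha=0.5, temperature=4.0):
    # Clean teacher supervises the student on clean inputs; robust
    # teacher supervises it on adversarial inputs. `alpha` (assumed
    # here) trades off clean accuracy against robustness.
    p_clean = softmax(clean_teacher_logits, temperature)
    q_clean = softmax(student_clean_logits, temperature)
    p_robust = softmax(robust_teacher_logits, temperature)
    q_adv = softmax(student_adv_logits, temperature)
    return alpha * kl_div(p_clean, q_clean) + (1 - alpha) * kl_div(p_robust, q_adv)
```

The loss is zero when the student already matches both teachers on their respective inputs, and grows as the student's distributions drift from either teacher's.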
🛡️ Threat Analysis
CIARD is a defense against adversarial input manipulation: it proposes adversarial robustness distillation to make student models resilient to adversarial examples (e.g., PGD and FGSM attacks) at inference time, with explicit evaluation across a range of adversarial attack scenarios.
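To illustrate the attack class CIARD defends against, below is a minimal L-infinity PGD sketch against a toy linear softmax classifier. The model, step size, and budget are illustrative assumptions, not the paper's evaluation setup:

```python
import numpy as np

def pgd_attack(x, y, W, b, eps=0.03, alpha=0.01, steps=10):
    """L-infinity PGD on a linear softmax classifier (logits = x @ W + b).

    Repeatedly steps in the sign of the input gradient of the
    cross-entropy loss, projecting back into the eps-ball around x.
    """
    x_adv = x.copy()
    for _ in range(steps):
        logits = x_adv @ W + b
        p = np.exp(logits - logits.max(axis=-1, keepdims=True))
        p /= p.sum(axis=-1, keepdims=True)
        # Gradient of cross-entropy w.r.t. logits is (p - onehot(y));
        # chain rule through the linear layer gives the input gradient.
        grad_logits = p.copy()
        grad_logits[np.arange(len(y)), y] -= 1.0
        grad_x = grad_logits @ W.T
        x_adv = x_adv + alpha * np.sign(grad_x)   # gradient-ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project to the eps-ball
    return x_adv
```

The resulting `x_adv` stays within `eps` of the original input in the L-infinity norm while increasing the classifier's loss, which is exactly the kind of small-budget input manipulation (OWASP ML01) that robustness distillation aims to withstand.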