defense 2025

Learning from Peers: Collaborative Ensemble Adversarial Training

Li Dengjin, Guo Yanming, Xie Yuxiang, Li Zheng, Chen Jiangming, Li Xiaolong, Lao Mingrui


Published on arXiv (2509.00089)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

CEAT achieves state-of-the-art robustness over competitive ensemble adversarial training baselines (ADP, GAL, TRS, DVERGE, LAFED, FASTEN) on multiple image classification benchmarks

CEAT (Collaborative Ensemble Adversarial Training)

Novel technique introduced


Ensemble Adversarial Training (EAT) attempts to enhance the robustness of models against adversarial attacks by leveraging multiple models. However, current EAT strategies tend to train the sub-models independently, ignoring the cooperative benefits between sub-models. Through detailed inspection of the EAT process, we find that samples with classification disparities between sub-models lie close to the decision boundary of the ensemble, exerting greater influence on its robustness. To this end, we propose a novel yet efficient Collaborative Ensemble Adversarial Training (CEAT) method that highlights cooperative learning among sub-models in the ensemble. Specifically, samples with larger predictive disparities between the sub-models receive greater attention during the adversarial training of the other sub-models. CEAT leverages these probability disparities to adaptively assign weights to different samples, incorporating a calibrating distance regularization. Extensive experiments on widely adopted datasets show that the proposed method achieves state-of-the-art performance over competitive EAT methods. Notably, CEAT is model-agnostic and can be seamlessly integrated into various ensemble methods with flexible applicability.


Key Contributions

  • Identifies that samples with classification disparities between ensemble sub-models lie near the decision boundary and disproportionately influence ensemble robustness
  • Proposes CEAT, which adaptively reweights training samples for each sub-model based on predictive probability disparities from the other sub-models, using a calibrating distance regularization
  • Demonstrates model-agnostic plug-and-play integration into existing EAT methods with SOTA robustness on three benchmark datasets
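The core mechanism described above — giving each sub-model's adversarial loss more weight on samples where its peers disagree — can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the disparity measure (total variation distance between softmax outputs), the temperature `tau`, and the mean-one normalization are all assumptions chosen for clarity.

```python
import numpy as np

def disparity_weights(probs_a, probs_b, tau=1.0):
    """Hypothetical per-sample weights from predictive disparity.

    probs_a, probs_b: (N, C) softmax outputs of two sub-models on the
    same batch. Samples on which the sub-models disagree most (largest
    total variation distance between their predictive distributions)
    receive the largest weights.
    """
    # Total variation distance per sample: 0 (identical) .. 1 (disjoint).
    disparity = 0.5 * np.abs(probs_a - probs_b).sum(axis=1)
    # Exponential scaling emphasizes high-disparity samples; rescaling
    # to mean 1 over the batch keeps the overall loss magnitude stable.
    w = np.exp(disparity / tau)
    return w * len(w) / w.sum()

# Example: the second sample, where the sub-models disagree,
# is weighted above the first, where they agree.
pa = np.array([[0.9, 0.1], [0.5, 0.5]])
pb = np.array([[0.9, 0.1], [0.1, 0.9]])
w = disparity_weights(pa, pb)
```

In an EAT loop these weights would multiply the per-sample adversarial loss of each sub-model, so that boundary-adjacent samples (those the peers classify differently) drive the update hardest.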

🛡️ Threat Analysis

Input Manipulation Attack

CEAT is an adversarial training defense that improves ensemble robustness against adversarial input perturbations at inference time — adversarial training is a canonical ML01 defense.


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, training_time, digital, untargeted
Datasets
CIFAR-10, CIFAR-100, SVHN
Applications
image classification