Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training
Yisen Wang 1, Yichuan Mo 1, Hongjun Wang 2, Junyi Li 1, Zhouchen Lin 1
Published on arXiv (2510.13361)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Generalist achieves lower generalization error and, using only the standard cross-entropy loss, simultaneously mitigates the natural-robust accuracy trade-off and improves multi-norm robustness compared with existing adversarial training baselines.
Generalist++
Novel technique introduced
Despite the rapid progress of neural networks, they remain highly vulnerable to adversarial examples, for which adversarial training (AT) is currently the most effective defense. While AT has been extensively studied, its practical applications expose two major limitations: natural accuracy tends to degrade significantly compared with standard training, and robustness does not transfer well across attacks crafted under different norm constraints. Unlike prior works that attempt to address only one issue within a single network, we propose to partition the overall generalization goal into multiple sub-tasks, each assigned to a dedicated base learner. By specializing in its designated objective, each base learner quickly becomes an expert in its field. In the later stages of training, we interpolate their parameters to form a knowledgeable global learner, while periodically redistributing the global parameters back to the base learners to prevent their optimization trajectories from drifting too far from the shared target. We term this framework Generalist and introduce three variants tailored to different application scenarios. Both theoretical analysis and extensive experiments demonstrate that Generalist achieves lower generalization error and significantly alleviates the trade-off problems compared with baseline methods. Our results suggest that Generalist provides a promising step toward developing fully robust classifiers in the future.
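The training loop described above (base learners specializing on their sub-tasks, parameter interpolation into a global learner, and periodic redistribution back to the bases) can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the two quadratic sub-task losses, the mixing weight `gamma`, and the sync period `c` are all hypothetical choices standing in for the natural/adversarial (or ℓ∞/ℓ2) objectives and the paper's actual schedule.

```python
import numpy as np

def grad_task_a(w):
    # Sub-task A's gradient: toy quadratic loss centered at +1
    # (stand-in for, e.g., the natural-accuracy objective)
    return 2 * (w - 1.0)

def grad_task_b(w):
    # Sub-task B's gradient: toy quadratic loss centered at -1
    # (stand-in for, e.g., the adversarial objective)
    return 2 * (w + 1.0)

def generalist_train(steps=200, lr=0.05, gamma=0.5, c=10):
    """Toy sketch of the Generalist loop: specialize, interpolate, redistribute."""
    w_a = np.zeros(3)  # base learner A's parameters
    w_b = np.zeros(3)  # base learner B's parameters
    w_g = np.zeros(3)  # global learner's parameters
    for t in range(1, steps + 1):
        # Each base learner takes a gradient step on its own sub-task,
        # quickly becoming an "expert" on that objective.
        w_a -= lr * grad_task_a(w_a)
        w_b -= lr * grad_task_b(w_b)
        # Interpolate the base learners' parameters into the global learner.
        w_g = gamma * w_a + (1 - gamma) * w_b
        # Periodically redistribute the global parameters back to the bases
        # so their trajectories do not drift too far from the shared target.
        if t % c == 0:
            w_a, w_b = w_g.copy(), w_g.copy()
    return w_g

w = generalist_train()
```

In this symmetric toy setting the global learner settles between the two sub-task optima; in the actual framework the interpolation weights and sync schedule are design choices that distinguish the variants.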
Key Contributions
- Generalist meta-learning framework that partitions adversarial training into specialized sub-tasks (natural vs. adversarial, or ℓ∞ vs. ℓ2), each handled by a dedicated base learner with periodic parameter aggregation into a global learner
- Three variants, including Generalist-D and Generalist-T, addressing either a single trade-off or both simultaneously, supported by a theoretical generalization-error analysis
- Empirical demonstration that Generalist outperforms existing AT baselines on both natural-robust accuracy trade-off and cross-norm robustness without increasing model size
🛡️ Threat Analysis
Directly proposes an improved adversarial training defense against adversarial examples, addressing both the natural-accuracy degradation trade-off and cross-norm robustness limitations of existing AT methods.