On the Effects of Adversarial Perturbations on Distribution Robustness
Yipei Wang, Zhaoying Pan, Xiaoqian Wang
Published on arXiv
2601.16464
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
ℓ∞ adversarial perturbations on data with moderate bias can increase distribution robustness, and this gain persists on highly skewed data when simplicity bias induces reliance on core features with greater separability.
Adversarial robustness refers to a model's ability to resist perturbations of its inputs, while distribution robustness evaluates the model's performance under data shifts. Although both aim to ensure reliable performance, prior work has revealed a tradeoff between distribution and adversarial robustness. Specifically, adversarial training may increase reliance on spurious features, which can harm distribution robustness, especially the performance on underrepresented subgroups. We present a theoretical analysis of adversarial and distribution robustness that provides a tractable surrogate for per-step adversarial training by studying models trained on perturbed data. Beyond the tradeoff, our work identifies a nuanced phenomenon: ℓ∞ perturbations on data with moderate bias can yield an increase in distribution robustness. Moreover, this gain in distribution robustness persists on highly skewed data when simplicity bias induces reliance on the core feature, characterized by greater feature separability. Our theoretical analysis extends the understanding of the tradeoff by highlighting its interplay with feature separability. Although the tradeoff persists in many cases, overlooking the role of feature separability may lead to misleading conclusions about robustness.
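The surrogate the abstract describes — replacing per-step adversarial training with training on data perturbed once by the worst-case ℓ∞ attack — can be sketched for a linear model. This is a minimal illustration, not the paper's exact setup: the two-feature "core vs. spurious" construction, the logistic-regression learner, and all parameter values are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a well-separated "core" feature and a weakly separated
# "spurious" one, standing in for the paper's feature-separability
# setting (our assumption, not the paper's construction).
n = 1000
y = rng.choice([-1.0, 1.0], size=n)
core = y * 2.0 + rng.normal(scale=1.0, size=n)      # high separability
spurious = y * 0.5 + rng.normal(scale=1.0, size=n)  # low separability
X = np.stack([core, spurious], axis=1)

def train_logreg(X, y, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression with labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ w)
        # gradient of mean log(1 + exp(-y * x.w))
        grad = -(y / (1.0 + np.exp(margins))) @ X / len(y)
        w -= lr * grad
    return w

# Surrogate for per-step adversarial training: perturb the data once with
# the worst-case l_inf attack against a reference model, then train on the
# perturbed data. For a linear model the l_inf worst case is closed-form:
# delta = -eps * y * sign(w).
eps = 0.3
w_ref = train_logreg(X, y)
X_adv = X - eps * y[:, None] * np.sign(w_ref)[None, :]
w_adv = train_logreg(X_adv, y)
```

Because the core feature is far more separable than the spurious one, the model trained on perturbed data still leans on the core feature and retains good clean accuracy, mirroring the regime in which the paper finds a robustness gain rather than a loss.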
Key Contributions
- Theoretical surrogate for per-step adversarial training by analyzing models trained on perturbed data
- Identifies conditions under which ℓ∞ perturbations on moderately biased data yield gains in distribution robustness rather than harm
- Characterizes the role of feature separability (via simplicity bias) in determining whether the adversarial–distribution robustness tradeoff holds or reverses
🛡️ Threat Analysis
The paper analyzes adversarial training (the canonical defense against adversarial perturbation attacks) and characterizes when ℓ∞ perturbations during training help or hurt distribution robustness — directly contributing to the understanding of adversarial defenses.