A Geometric Probe of the Accuracy-Robustness Trade-off: Sharp Boundaries in Symmetry-Breaking Dimensional Expansion
Yu Bai , Zhe Wang , Jiarui Zhang , Dong-Xiao Zhang , Yinjun Gao , Jun-Jie Zhang
Published on arXiv
2602.17948
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Mask projection restores the robustness lost under SBDE by neutralizing adversarial perturbations concentrated in the auxiliary dimensions (e.g., recovering accuracy under PGD attack from ~90% back to near the clean accuracy of 95.63% on CIFAR-10 with ResNet-18).
Mask Projection
Novel technique introduced
The trade-off between clean accuracy and adversarial robustness is a pervasive phenomenon in deep learning, yet its geometric origin remains elusive. In this work, we utilize Symmetry-Breaking Dimensional Expansion (SBDE) as a controlled probe to investigate the mechanism underlying this trade-off. SBDE expands input images by inserting constant-valued pixels, which breaks translational symmetry and consistently improves clean accuracy (e.g., from $90.47\%$ to $95.63\%$ on CIFAR-10 with ResNet-18) by reducing parameter degeneracy. However, this accuracy gain comes at the cost of reduced robustness against iterative white-box attacks. By employing a test-time \emph{mask projection} that resets the inserted auxiliary pixels to their training values, we demonstrate that the vulnerability stems almost entirely from the inserted dimensions. The projection effectively neutralizes the attacks and restores robustness, revealing that the model achieves high accuracy by creating \emph{sharp boundaries} (steep loss gradients) specifically along the auxiliary axes. Our findings provide a concrete geometric explanation for the accuracy-robustness paradox: the optimization landscape deepens the basin of attraction to improve accuracy but inevitably erects steep walls along the auxiliary degrees of freedom, creating a fragile sensitivity to off-manifold perturbations.
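The expansion step described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the specific insertion pattern (here, interleaving constant rows and columns on a 2x grid) and the pad value are assumptions.

```python
import numpy as np

def sbde_expand(x, pad_value=0.0):
    """Expand an (H, W, C) image to (2H, 2W, C) by interleaving
    constant-valued auxiliary pixels, breaking translational symmetry.
    The interleaved layout and pad_value are illustrative assumptions."""
    h, w, c = x.shape
    out = np.full((2 * h, 2 * w, c), pad_value, dtype=x.dtype)
    out[::2, ::2, :] = x  # original signal pixels sit on the even grid
    return out

img = np.random.rand(32, 32, 3).astype(np.float32)
big = sbde_expand(img)
assert big.shape == (64, 64, 3)
```

Because the auxiliary pixels are constant across the training set, the network is free to carve steep decision boundaries along those axes, which is exactly the fragility the paper probes.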
Key Contributions
- Identifies that SBDE-induced accuracy gains create sharp loss boundaries along auxiliary pixel dimensions that are disproportionately exploited by white-box attacks
- Proposes test-time mask projection that resets inserted auxiliary pixels to training constants, effectively neutralizing white-box adversarial perturbations
- Provides a concrete geometric explanation for the accuracy-robustness paradox: deeper basins of attraction on the signal manifold are accompanied by steep walls in auxiliary directions
🛡️ Threat Analysis
The paper investigates vulnerability to iterative white-box adversarial attacks (PGD) and proposes mask projection as a test-time defense: resetting the adversarially exploited auxiliary dimensions to their training values restores robustness.
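The test-time defense admits a one-line sketch under the same assumed 2x interleaved layout (signal pixels on the even grid, auxiliary pixels elsewhere): reset every auxiliary pixel to its training constant, discarding whatever perturbation the attacker placed there.

```python
import numpy as np

PAD = 0.0  # assumed constant training value of the auxiliary pixels

def mask_project(x_adv, pad_value=PAD):
    """Test-time mask projection: keep only the signal pixels (even grid,
    an assumed layout) and reset all auxiliary pixels to their training
    constant, neutralizing perturbations concentrated in those dimensions."""
    out = np.full_like(x_adv, pad_value)
    out[::2, ::2, :] = x_adv[::2, ::2, :]
    return out

# Adversarial input: perturbation spread over all pixels.
x = np.full((64, 64, 3), PAD, dtype=np.float32)
x[::2, ::2, :] = 0.5
x_adv = x + 0.03 * np.sign(np.random.randn(*x.shape)).astype(np.float32)
x_proj = mask_project(x_adv)
assert np.all(x_proj[1::2, :, :] == PAD)  # auxiliary pixels restored exactly
```

Note that projection only removes the off-manifold component of the attack; perturbations on the signal pixels survive, which is consistent with the paper's claim that the vulnerability stems almost entirely from the inserted dimensions.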