A Geometric Probe of the Accuracy-Robustness Trade-off: Sharp Boundaries in Symmetry-Breaking Dimensional Expansion
Yu Bai , Zhe Wang , Jiarui Zhang , Dong-Xiao Zhang , Yinjun Gao , Jun-Jie Zhang
Published on arXiv
2602.17948
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Mask projection restores the robustness lost under SBDE by neutralizing adversarial perturbations concentrated in the auxiliary dimensions (e.g., recovering accuracy under PGD attack from ~90% back to near the clean accuracy of 95.63% on CIFAR-10 with ResNet-18).
Mask Projection
Novel technique introduced
The trade-off between clean accuracy and adversarial robustness is a pervasive phenomenon in deep learning, yet its geometric origin remains elusive. In this work, we utilize Symmetry-Breaking Dimensional Expansion (SBDE) as a controlled probe to investigate the mechanism underlying this trade-off. SBDE expands input images by inserting constant-valued pixels, which breaks translational symmetry and consistently improves clean accuracy (e.g., from $90.47\%$ to $95.63\%$ on CIFAR-10 with ResNet-18) by reducing parameter degeneracy. However, this accuracy gain comes at the cost of reduced robustness against iterative white-box attacks. By employing a test-time \emph{mask projection} that resets the inserted auxiliary pixels to their training values, we demonstrate that the vulnerability stems almost entirely from the inserted dimensions. The projection effectively neutralizes the attacks and restores robustness, revealing that the model achieves high accuracy by creating \emph{sharp boundaries} (steep loss gradients) specifically along the auxiliary axes. Our findings provide a concrete geometric explanation for the accuracy-robustness paradox: the optimization landscape deepens the basin of attraction to improve accuracy but inevitably erects steep walls along the auxiliary degrees of freedom, creating a fragile sensitivity to off-manifold perturbations.
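The expansion step described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the specific insertion pattern (here, interleaving constant rows and columns on a 2x grid) and the pad value are assumptions.

```python
import numpy as np

def sbde_expand(x, pad_value=0.0):
    """Expand an (H, W, C) image to (2H, 2W, C) by interleaving
    constant-valued auxiliary pixels, breaking translational symmetry.
    The interleaved layout and pad_value are illustrative assumptions."""
    h, w, c = x.shape
    out = np.full((2 * h, 2 * w, c), pad_value, dtype=x.dtype)
    out[::2, ::2, :] = x  # original signal pixels sit on the even grid
    return out

img = np.random.rand(32, 32, 3).astype(np.float32)
big = sbde_expand(img)
assert big.shape == (64, 64, 3)
```

Because the auxiliary pixels are constant across the training set, the network is free to carve steep decision boundaries along those axes, which is exactly the fragility the paper probes.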
Key Contributions
- Identifies that SBDE-induced accuracy gains create sharp loss boundaries along auxiliary pixel dimensions that are disproportionately exploited by white-box attacks
- Proposes test-time mask projection that resets inserted auxiliary pixels to training constants, effectively neutralizing white-box adversarial perturbations
- Provides a concrete geometric explanation for the accuracy-robustness paradox: deeper basins of attraction on the signal manifold are accompanied by steep walls in auxiliary directions
🛡️ Threat Analysis
The paper investigates vulnerability to iterative white-box adversarial attacks (PGD) and proposes mask projection as a test-time defense: resetting the adversarially exploited auxiliary dimensions to their training values restores robustness.
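The test-time defense admits a one-line sketch under the same assumed 2x interleaved layout (signal pixels on the even grid, auxiliary pixels elsewhere): reset every auxiliary pixel to its training constant, discarding whatever perturbation the attacker placed there.

```python
import numpy as np

PAD = 0.0  # assumed constant training value of the auxiliary pixels

def mask_project(x_adv, pad_value=PAD):
    """Test-time mask projection: keep only the signal pixels (even grid,
    an assumed layout) and reset all auxiliary pixels to their training
    constant, neutralizing perturbations concentrated in those dimensions."""
    out = np.full_like(x_adv, pad_value)
    out[::2, ::2, :] = x_adv[::2, ::2, :]
    return out

# Adversarial input: perturbation spread over all pixels.
x = np.full((64, 64, 3), PAD, dtype=np.float32)
x[::2, ::2, :] = 0.5
x_adv = x + 0.03 * np.sign(np.random.randn(*x.shape)).astype(np.float32)
x_proj = mask_project(x_adv)
assert np.all(x_proj[1::2, :, :] == PAD)  # auxiliary pixels restored exactly
```

Note that projection only removes the off-manifold component of the attack; perturbations on the signal pixels survive, which is consistent with the paper's claim that the vulnerability stems almost entirely from the inserted dimensions.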