Defense · 2025

Geometric origin of adversarial vulnerability in deep learning

Yixiong Ren 1,2, Wenkang Du 3, Jianhui Zhou 1, Haiping Huang 3


Published on arXiv: 2509.01235

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Layer-wise geometry-aware training that enforces a calibrated ratio of intra-class to inter-class distances produces smooth feature manifolds with improved adversarial robustness against both white-box and black-box attacks while maintaining generalization accuracy.
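The "calibrated ratio" above can be made concrete with a small sketch. The function below (a hypothetical helper, not the paper's implementation) computes the mean intra-class pairwise distance divided by the mean inter-class pairwise distance over a batch of features; a layer-wise training objective could then penalize deviation of this ratio from a calibrated target.

```python
import numpy as np

def geometry_ratio(features, labels):
    """Mean intra-class distance / mean inter-class distance.

    A smaller ratio means classes are more compact and farther apart.
    This is an illustrative metric, not the paper's exact loss.
    """
    intra, inter = [], []
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(features[i] - features[j])
            (intra if labels[i] == labels[j] else inter).append(d)
    return np.mean(intra) / np.mean(inter)

# Two tight, well-separated clusters give a ratio well below 1
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 0.1, (10, 2)),
                   rng.normal(5, 0.1, (10, 2))])
labels = np.array([0] * 10 + [1] * 10)
print(geometry_ratio(feats, labels))  # ratio well below 1 for this geometry
```

Driving this ratio down layer by layer is one way to read the paper's goal of intra-class compactness with inter-class separation.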

GAL (Geometry-Aware Learning)

Novel technique introduced


Abstract

Balancing training accuracy and adversarial robustness has been a challenge since the early days of deep learning. Here, we introduce a geometry-aware deep learning framework that leverages layer-wise local training to sculpt the internal representations of deep neural networks. This framework promotes intra-class compactness and inter-class separation in feature space, leading to manifold smoothness and adversarial robustness against white-box and black-box attacks. The performance can be explained by an energy model with Hebbian coupling between elements of the hidden representation. Our results thus shed light on the physics of learning, in the direction of aligning biological and artificial intelligence systems. Using this framework, a deep network can assimilate new information into existing knowledge structures while reducing representation interference.


Key Contributions

  • Geometry-Aware Learning (GAL) framework using layer-wise local training that promotes intra-class compactness and inter-class separation in feature space without end-to-end backpropagation
  • Theoretical explanation of adversarial robustness via a Hopfield/Hebbian energy model linking geometric representation structure to robustness
  • Demonstration that controlled feature space geometry simultaneously improves adversarial robustness and enables continual learning with reduced representation interference
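The Hopfield/Hebbian energy model invoked in the second contribution can be sketched in a few lines. Under the classical Hebbian rule, couplings are built from outer products of stored patterns, and the energy E(s) = -½ sᵀWs has its minima at (or near) those patterns; this is the standard textbook construction, shown here only to illustrate the link the paper draws, not its specific model.

```python
import numpy as np

def hebbian_coupling(patterns):
    """Hebbian rule: W = (1/N) * sum_mu xi^mu (xi^mu)^T, zero diagonal."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)
    return W

def energy(W, s):
    """Hopfield energy E(s) = -1/2 s^T W s; stored patterns sit in minima."""
    return -0.5 * s @ W @ s

# Two orthogonal binary patterns stored as attractors
patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, -1, -1, 1, 1]], dtype=float)
W = hebbian_coupling(patterns)
stored = patterns[0]
flipped = stored.copy()
flipped[0] *= -1  # corrupt one unit
print(energy(W, stored) < energy(W, flipped))  # True: stored state is deeper
```

Robustness in this picture corresponds to wide, deep energy basins around class representations, so small input perturbations do not push the state out of its basin.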

🛡️ Threat Analysis

Input Manipulation Attack

The paper proposes a defense against adversarial examples (input manipulation attacks at inference time), specifically demonstrating robustness against both white-box and black-box attacks. The defense works by controlling the geometric structure of hidden representations during training to produce smoother manifolds that resist adversarial perturbations.
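For context on the threat being defended against, the canonical white-box attack is the Fast Gradient Sign Method (FGSM): perturb the input by a small step along the sign of the loss gradient. The toy logistic model below is an assumption for illustration; it is not from the paper.

```python
import numpy as np

def fgsm(x, grad, eps):
    """Fast Gradient Sign Method: eps-sized step along the sign of the
    loss gradient w.r.t. the input (a standard white-box attack)."""
    return x + eps * np.sign(grad)

# Toy linear classifier: p = sigmoid(w @ x), true label y = 1
w = np.array([2.0, -1.0, 0.5])
x = np.array([0.3, 0.7, -0.2])
p = 1.0 / (1.0 + np.exp(-w @ x))
grad = (p - 1.0) * w          # d(cross-entropy)/dx for y = 1
x_adv = fgsm(x, grad, eps=0.1)
p_adv = 1.0 / (1.0 + np.exp(-w @ x_adv))
print(p_adv < p)  # True: the perturbation lowers true-class confidence
```

A geometry-aware defense aims to make such gradient-guided perturbations ineffective by smoothing the feature manifold, so small input changes do not cross class boundaries.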


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
white_box, black_box, inference_time
Applications
image classification