Defense · 2025

Geometric origin of adversarial vulnerability in deep learning

Yixiong Ren 1,2, Wenkang Du 3, Jianhui Zhou 1, Haiping Huang 3


Published on arXiv: 2509.01235

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Layer-wise geometry-aware training that enforces a calibrated ratio of intra-class to inter-class distances produces smooth feature manifolds with improved adversarial robustness against both white-box and black-box attacks while maintaining generalization accuracy.
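The "calibrated ratio" above can be made concrete with a small sketch. The function below (a hypothetical helper, not the paper's implementation) computes the mean intra-class pairwise distance divided by the mean inter-class pairwise distance over a batch of features; a layer-wise training objective could then penalize deviation of this ratio from a calibrated target.

```python
import numpy as np

def geometry_ratio(features, labels):
    """Mean intra-class distance / mean inter-class distance.

    A smaller ratio means classes are more compact and farther apart.
    This is an illustrative metric, not the paper's exact loss.
    """
    intra, inter = [], []
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(features[i] - features[j])
            (intra if labels[i] == labels[j] else inter).append(d)
    return np.mean(intra) / np.mean(inter)

# Two tight, well-separated clusters give a ratio well below 1
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 0.1, (10, 2)),
                   rng.normal(5, 0.1, (10, 2))])
labels = np.array([0] * 10 + [1] * 10)
print(geometry_ratio(feats, labels))  # ratio well below 1 for this geometry
```

Driving this ratio down layer by layer is one way to read the paper's goal of intra-class compactness with inter-class separation.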

GAL (Geometry-Aware Learning)

Novel technique introduced


Abstract

Balancing training accuracy and adversarial robustness has been a challenge since the early days of deep learning. Here, we introduce a geometry-aware deep learning framework that leverages layer-wise local training to sculpt the internal representations of deep neural networks. This framework promotes intra-class compactness and inter-class separation in feature space, leading to manifold smoothness and adversarial robustness against white-box and black-box attacks. The performance can be explained by an energy model with Hebbian coupling between elements of the hidden representation. Our results thus shed light on the physics of learning, in the direction of aligning biological and artificial intelligence systems. Using this framework, a deep network can assimilate new information into existing knowledge structures while reducing representation interference.


Key Contributions

  • Geometry-Aware Learning (GAL) framework using layer-wise local training that promotes intra-class compactness and inter-class separation in feature space without end-to-end backpropagation
  • Theoretical explanation of adversarial robustness via a Hopfield/Hebbian energy model linking geometric representation structure to robustness
  • Demonstration that controlled feature space geometry simultaneously improves adversarial robustness and enables continual learning with reduced representation interference
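The Hopfield/Hebbian energy model invoked in the second contribution can be sketched in a few lines. Under the classical Hebbian rule, couplings are built from outer products of stored patterns, and the energy E(s) = -½ sᵀWs has its minima at (or near) those patterns; this is the standard textbook construction, shown here only to illustrate the link the paper draws, not its specific model.

```python
import numpy as np

def hebbian_coupling(patterns):
    """Hebbian rule: W = (1/N) * sum_mu xi^mu (xi^mu)^T, zero diagonal."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)
    return W

def energy(W, s):
    """Hopfield energy E(s) = -1/2 s^T W s; stored patterns sit in minima."""
    return -0.5 * s @ W @ s

# Two orthogonal binary patterns stored as attractors
patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, -1, -1, 1, 1]], dtype=float)
W = hebbian_coupling(patterns)
stored = patterns[0]
flipped = stored.copy()
flipped[0] *= -1  # corrupt one unit
print(energy(W, stored) < energy(W, flipped))  # True: stored state is deeper
```

Robustness in this picture corresponds to wide, deep energy basins around class representations, so small input perturbations do not push the state out of its basin.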

🛡️ Threat Analysis

Input Manipulation Attack

The paper proposes a defense against adversarial examples (input manipulation attacks at inference time), specifically demonstrating robustness against both white-box and black-box attacks. The defense works by controlling the geometric structure of hidden representations during training to produce smoother manifolds that resist adversarial perturbations.
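For context on the threat being defended against, the canonical white-box attack is the Fast Gradient Sign Method (FGSM): perturb the input by a small step along the sign of the loss gradient. The toy logistic model below is an assumption for illustration; it is not from the paper.

```python
import numpy as np

def fgsm(x, grad, eps):
    """Fast Gradient Sign Method: eps-sized step along the sign of the
    loss gradient w.r.t. the input (a standard white-box attack)."""
    return x + eps * np.sign(grad)

# Toy linear classifier: p = sigmoid(w @ x), true label y = 1
w = np.array([2.0, -1.0, 0.5])
x = np.array([0.3, 0.7, -0.2])
p = 1.0 / (1.0 + np.exp(-w @ x))
grad = (p - 1.0) * w          # d(cross-entropy)/dx for y = 1
x_adv = fgsm(x, grad, eps=0.1)
p_adv = 1.0 / (1.0 + np.exp(-w @ x_adv))
print(p_adv < p)  # True: the perturbation lowers true-class confidence
```

A geometry-aware defense aims to make such gradient-guided perturbations ineffective by smoothing the feature manifold, so small input changes do not cross class boundaries.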


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
white_box, black_box, inference_time
Applications
image classification