
When Flatness Does (Not) Guarantee Adversarial Robustness

Nils Philipp Walter 1, Linara Adilova 2, Jilles Vreeken 1, Michael Kamp 3,4,5

3 citations · 82 references

Published on arXiv: 2510.14231

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Flatness in the loss landscape guarantees only local adversarial robustness; achieving global robustness requires sharp curvature away from the data manifold, and adversarial examples systematically occupy flat, high-confidence error regions.


Despite their empirical success, neural networks remain vulnerable to small, adversarial perturbations. A longstanding hypothesis suggests that flat minima, regions of low curvature in the loss landscape, offer increased robustness. While intuitive, this connection has remained largely informal and incomplete. By rigorously formalizing the relationship, we show this intuition is only partially correct: flatness implies local but not global adversarial robustness. To arrive at this result, we first derive a closed-form expression for relative flatness in the penultimate layer, and then show we can use this to constrain the variation of the loss in input space. This allows us to formally analyze the adversarial robustness of the entire network. We then show that to maintain robustness beyond a local neighborhood, the loss needs to curve sharply away from the data manifold. We validate our theoretical predictions empirically across architectures and datasets, uncovering the geometric structure that governs adversarial vulnerability, and linking flatness to model confidence: adversarial examples often lie in large, flat regions where the model is confidently wrong. Our results challenge simplified views of flatness and provide a nuanced understanding of its role in robustness.


Key Contributions

  • Formal proof that flatness implies local but not global adversarial robustness, resolving a longstanding informal hypothesis
  • Closed-form expression for relative flatness in the penultimate layer, used to bound loss variation in input space
  • Empirical demonstration that adversarial examples lie in large flat regions where the model is confidently wrong, linking loss geometry to adversarial vulnerability
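The third contribution, that adversarial examples sit in flat regions where the model is confidently wrong, can be probed with a simple sanity check: sample perturbations in a small ball around an input and measure how much the loss varies there. Below is a minimal numpy sketch using a toy logistic model as a stand-in for a trained network; all names and the flatness proxy are illustrative, not the paper's actual relative-flatness measure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary "model": a logistic regression stands in for a trained network.
w = rng.normal(size=16)

def prob(x):
    """Predicted probability of class 1."""
    return 1.0 / (1.0 + np.exp(-(w @ x)))

def loss(x, y):
    """Binary cross-entropy at input x with label y."""
    p = np.clip(prob(x), 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def local_flatness(x, y, radius=0.1, n_samples=200):
    """Crude flatness proxy: max loss deviation over an L2 ball around x.
    Small values mean the loss surface is locally flat at x."""
    base = loss(x, y)
    deltas = rng.normal(size=(n_samples, x.size))
    deltas *= radius / np.linalg.norm(deltas, axis=1, keepdims=True)
    losses = np.array([loss(x + d, y) for d in deltas])
    return np.max(np.abs(losses - base))

x = rng.normal(size=16)
y = 1
variation = local_flatness(x, y)
confidence = max(prob(x), 1 - prob(x))
print(f"loss variation within ball: {variation:.4f}, confidence: {confidence:.3f}")
```

Pairing such a flatness proxy with the model's confidence at candidate adversarial points is one cheap way to look for the "flat and confidently wrong" regions the paper describes.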

🛡️ Threat Analysis

Input Manipulation Attack

The paper's primary contribution is a theoretical and empirical analysis of adversarial robustness: it characterizes the geometric conditions under which adversarial perturbations can fool neural networks, and shows that adversarial examples lie in flat regions of high model confidence. This directly addresses the vulnerability to input manipulation attacks.
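To make the threat class concrete, here is a minimal fast-gradient-sign (FGSM-style, after Goodfellow et al.) input manipulation on the same kind of toy logistic model; this is a standard illustration of the attack family, not the paper's own experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=16)          # toy linear logit, stands in for a network

def prob(x):
    """Predicted probability of class 1."""
    return 1.0 / (1.0 + np.exp(-(w @ x)))

def grad_loss_x(x, y):
    # Gradient of binary cross-entropy w.r.t. the input,
    # through a linear logit: dL/dx = (p - y) * w
    return (prob(x) - y) * w

x = rng.normal(size=16)
y = 1 if prob(x) > 0.5 else 0    # use the model's own prediction as the label
eps = 0.25
# One FGSM step: perturb each input coordinate in the loss-increasing direction.
x_adv = x + eps * np.sign(grad_loss_x(x, y))

print(f"clean prob: {prob(x):.3f}, adversarial prob: {prob(x_adv):.3f}")
```

For a linear logit the sign step provably increases the loss, pushing the predicted probability away from the labeled class; real attacks iterate this against a full network's gradients.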


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
white_box, inference_time, digital
Applications
image classification