
Sharpness-Aware Geometric Defense for Robust Out-Of-Distribution Detection

Jeng-Lin Li 1, Ming-Ching Chang 2, Wei-Chao Chen 1

Published on arXiv: 2508.17174

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

SaGD significantly improves FPR and AUC over state-of-the-art defenses when distinguishing CIFAR-100 from six OOD datasets under various adversarial attacks.

SaGD (Sharpness-aware Geometric Defense)

Novel technique introduced


Out-of-distribution (OOD) detection ensures safe and reliable model deployment. Contemporary OOD algorithms using geometry projection can detect OOD or adversarial samples from clean in-distribution (ID) samples. However, this setting regards adversarial ID samples as OOD, leading to incorrect OOD predictions. Existing efforts on OOD detection with ID and OOD data under attacks are minimal. In this paper, we develop a robust OOD detection method that distinguishes adversarial ID samples from OOD ones. The sharp loss landscape created by adversarial training hinders model convergence, impacting the latent embedding quality for OOD score calculation. Therefore, we introduce a **Sharpness-aware Geometric Defense (SaGD)** framework to smooth out the rugged adversarial loss landscape in the projected latent geometry. Enhanced geometric embedding convergence enables accurate ID data characterization, benefiting OOD detection against adversarial attacks. We use Jitter-based perturbation in adversarial training to extend the defense ability against unseen attacks. Our SaGD framework significantly improves FPR and AUC over the state-of-the-art defense approaches in differentiating CIFAR-100 from six other OOD datasets under various attacks. We further examine the effects of perturbations at various adversarial training levels, revealing the relationship between the sharp loss landscape and adversarial OOD detection.
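The sharpness-aware component builds on the standard two-step sharpness-aware minimization (SAM) update: first ascend to a worst-case point within a small radius of the current weights, then descend using the gradient taken there. A minimal sketch on a toy 1-D loss (all names and values here are illustrative, not the paper's implementation):

```python
def loss(w):
    # Toy 1-D quadratic loss standing in for the adversarial training loss.
    return (w - 3.0) ** 2

def grad(w):
    # Analytic gradient of the toy loss.
    return 2.0 * (w - 3.0)

def sam_step(w, lr=0.1, rho=0.05):
    """One SAM update: perturb the weight to the worst-case neighbor
    within radius rho, then descend using the gradient evaluated there."""
    g = grad(w)
    scale = rho / (abs(g) + 1e-12)   # normalize perturbation to radius rho
    w_adv = w + scale * g            # worst-case point in the neighborhood
    return w - lr * grad(w_adv)      # sharpness-aware descent step

w = 0.0
for _ in range(100):
    w = sam_step(w)
```

Because the descent gradient is taken at the worst-case neighbor rather than at the current weights, the update is biased toward flat minima, which is the convergence property SaGD exploits to stabilize the latent geometry under adversarial training.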


Key Contributions

  • Identifies that adversarial training creates sharp loss landscapes that degrade latent embedding quality for OOD score calculation
  • Proposes SaGD framework combining sharpness-aware minimization with geometric projection to smooth adversarial loss landscapes and improve OOD detection under attack
  • Introduces Jitter-based perturbation in adversarial training to generalize defense to unseen attack types
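The third contribution, generating training perturbations with added randomness, can be sketched generically as a signed-gradient step plus uniform noise. This is a hedged FGSM-plus-noise illustration, not the paper's exact Jitter formulation; `eps` and `jitter` are assumed hyperparameters:

```python
import random

random.seed(0)

def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)

def perturb(x, grad_x, eps=0.1, jitter=0.02):
    """Move each feature eps along the loss-gradient sign, then add a
    small uniform jitter so training covers a wider neighborhood of
    attacks than any single deterministic perturbation."""
    return [xi + eps * sign(gi) + random.uniform(-jitter, jitter)
            for xi, gi in zip(x, grad_x)]

# Toy input features and their loss gradients.
x = [0.5, -0.2, 0.0]
g = [1.3, -0.7, 0.4]
x_adv = perturb(x, g)
```

The random component is what gives the defense some generalization to unseen attack types: the model is never trained against one fixed perturbation direction.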

🛡️ Threat Analysis

Input Manipulation Attack

The paper defends against adversarial input perturbations (inference-time evasion attacks) that cause OOD detectors to misclassify adversarially perturbed ID samples as OOD. The defense combines adversarial training with sharpness-aware minimization and Jitter-based perturbation to improve the robustness of the latent geometry used for OOD scoring.
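A geometry-based OOD score of the kind this defense protects can be illustrated as distance from an input's latent embedding to the nearest in-distribution class centroid (a simplified stand-in for the paper's projected-geometry scoring; all names and data here are illustrative):

```python
import math

def centroid(embeddings):
    # Mean embedding of one ID class.
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]

def ood_score(z, class_centroids):
    """Euclidean distance to the closest ID class centroid:
    small -> likely in-distribution, large -> likely OOD."""
    return min(math.dist(z, c) for c in class_centroids)

# Toy 2-D latent embeddings for two ID classes.
class_a = [[0.9, 1.1], [1.1, 0.9], [1.0, 1.0]]
class_b = [[-1.0, -1.0], [-0.9, -1.1], [-1.1, -0.9]]
centroids = [centroid(class_a), centroid(class_b)]

id_sample = [1.05, 0.95]    # lands near class A's centroid
ood_sample = [4.0, -4.0]    # far from both centroids
```

The attack surface is exactly this geometry: a perturbation that drags an ID embedding away from its class centroid flips the score, which is why SaGD focuses on keeping the latent embedding stable under attack.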


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, inference_time, digital, untargeted
Datasets
CIFAR-100
Applications
out-of-distribution detection, image classification