
Sharpness-Aware Geometric Defense for Robust Out-Of-Distribution Detection

Jeng-Lin Li 1, Ming-Ching Chang 2, Wei-Chao Chen 1

Published on arXiv: 2508.17174

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

SaGD significantly improves FPR and AUC over state-of-the-art defenses when distinguishing CIFAR-100 from six OOD datasets under various adversarial attacks.

SaGD (Sharpness-aware Geometric Defense)

Novel technique introduced


Out-of-distribution (OOD) detection ensures safe and reliable model deployment. Contemporary OOD algorithms using geometry projection can detect OOD or adversarial samples from clean in-distribution (ID) samples. However, this setting regards adversarial ID samples as OOD, leading to incorrect OOD predictions. Existing efforts on OOD detection with ID and OOD data under attacks are minimal. In this paper, we develop a robust OOD detection method that distinguishes adversarial ID samples from OOD ones. The sharp loss landscape created by adversarial training hinders model convergence, impacting the latent embedding quality for OOD score calculation. Therefore, we introduce a **Sharpness-aware Geometric Defense (SaGD)** framework to smooth out the rugged adversarial loss landscape in the projected latent geometry. Enhanced geometric embedding convergence enables accurate ID data characterization, benefiting OOD detection against adversarial attacks. We use Jitter-based perturbation in adversarial training to extend the defense ability against unseen attacks. Our SaGD framework significantly improves FPR and AUC over the state-of-the-art defense approaches in differentiating CIFAR-100 from six other OOD datasets under various attacks. We further examine the effects of perturbations at various adversarial training levels, revealing the relationship between the sharp loss landscape and adversarial OOD detection.
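The sharpness-aware component builds on the standard two-step sharpness-aware minimization (SAM) update: first ascend to a worst-case point within a small radius of the current weights, then descend using the gradient taken there. A minimal sketch on a toy 1-D loss (all names and values here are illustrative, not the paper's implementation):

```python
def loss(w):
    # Toy 1-D quadratic loss standing in for the adversarial training loss.
    return (w - 3.0) ** 2

def grad(w):
    # Analytic gradient of the toy loss.
    return 2.0 * (w - 3.0)

def sam_step(w, lr=0.1, rho=0.05):
    """One SAM update: perturb the weight to the worst-case neighbor
    within radius rho, then descend using the gradient evaluated there."""
    g = grad(w)
    scale = rho / (abs(g) + 1e-12)   # normalize perturbation to radius rho
    w_adv = w + scale * g            # worst-case point in the neighborhood
    return w - lr * grad(w_adv)      # sharpness-aware descent step

w = 0.0
for _ in range(100):
    w = sam_step(w)
```

Because the descent gradient is taken at the worst-case neighbor rather than at the current weights, the update is biased toward flat minima, which is the convergence property SaGD exploits to stabilize the latent geometry under adversarial training.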


Key Contributions

  • Identifies that adversarial training creates sharp loss landscapes that degrade latent embedding quality for OOD score calculation
  • Proposes SaGD framework combining sharpness-aware minimization with geometric projection to smooth adversarial loss landscapes and improve OOD detection under attack
  • Introduces Jitter-based perturbation in adversarial training to generalize defense to unseen attack types
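The third contribution, generating training perturbations with added randomness, can be sketched generically as a signed-gradient step plus uniform noise. This is a hedged FGSM-plus-noise illustration, not the paper's exact Jitter formulation; `eps` and `jitter` are assumed hyperparameters:

```python
import random

random.seed(0)

def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)

def perturb(x, grad_x, eps=0.1, jitter=0.02):
    """Move each feature eps along the loss-gradient sign, then add a
    small uniform jitter so training covers a wider neighborhood of
    attacks than any single deterministic perturbation."""
    return [xi + eps * sign(gi) + random.uniform(-jitter, jitter)
            for xi, gi in zip(x, grad_x)]

# Toy input features and their loss gradients.
x = [0.5, -0.2, 0.0]
g = [1.3, -0.7, 0.4]
x_adv = perturb(x, g)
```

The random component is what gives the defense some generalization to unseen attack types: the model is never trained against one fixed perturbation direction.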

🛡️ Threat Analysis

Input Manipulation Attack

The paper defends against adversarial input perturbations (inference-time evasion attacks) that cause OOD detectors to misclassify adversarially perturbed ID samples as OOD. The defense combines adversarial training with sharpness-aware minimization and Jitter-based perturbation to improve the robustness of the latent geometry used for OOD scoring.
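A geometry-based OOD score of the kind this defense protects can be illustrated as distance from an input's latent embedding to the nearest in-distribution class centroid (a simplified stand-in for the paper's projected-geometry scoring; all names and data here are illustrative):

```python
import math

def centroid(embeddings):
    # Mean embedding of one ID class.
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]

def ood_score(z, class_centroids):
    """Euclidean distance to the closest ID class centroid:
    small -> likely in-distribution, large -> likely OOD."""
    return min(math.dist(z, c) for c in class_centroids)

# Toy 2-D latent embeddings for two ID classes.
class_a = [[0.9, 1.1], [1.1, 0.9], [1.0, 1.0]]
class_b = [[-1.0, -1.0], [-0.9, -1.1], [-1.1, -0.9]]
centroids = [centroid(class_a), centroid(class_b)]

id_sample = [1.05, 0.95]    # lands near class A's centroid
ood_sample = [4.0, -4.0]    # far from both centroids
```

The attack surface is exactly this geometry: a perturbation that drags an ID embedding away from its class centroid flips the score, which is why SaGD focuses on keeping the latent embedding stable under attack.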


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, inference_time, digital, untargeted
Datasets
CIFAR-100
Applications
out-of-distribution detection, image classification