defense 2025

Deep learning models are vulnerable, but adversarial examples are even more vulnerable

Jun Li 1,2, Yanwei Xu 1,2, Keran Li 1, Xiaoli Zhang 3

0 citations · 46 references · arXiv


Published on arXiv · 2511.05073

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

SWM-AED achieves over 62% detection accuracy in most settings and up to 96.5% across nine canonical attacks (FGSM, PGD, etc.) on CIFAR-10 without adversarial retraining.

SWM-AED (Sliding Window Mask-based Adversarial Example Detection)

Novel technique introduced


Understanding the intrinsic differences between adversarial examples and clean samples is key to enhancing DNN robustness and to detecting adversarial attacks. This study first finds empirically that image-based adversarial examples are notably sensitive to occlusion. Controlled experiments on CIFAR-10 used nine canonical attacks (e.g., FGSM, PGD) to generate adversarial examples, paired with their original samples for evaluation. We introduce Sliding Mask Confidence Entropy (SMCE) to quantify fluctuations in model confidence under occlusion. Across 1,800+ test images, SMCE measurements, supported by Mask Entropy Field Maps and statistical distributions, show that adversarial examples exhibit significantly higher confidence volatility under occlusion than their originals. Building on this, we propose Sliding Window Mask-based Adversarial Example Detection (SWM-AED), which avoids the catastrophic overfitting of conventional adversarial training. Evaluations across classifiers and attacks on CIFAR-10 demonstrate robust performance, with accuracy above 62% in most cases and up to 96.5%.


Key Contributions

  • Empirical discovery that adversarial examples exhibit significantly higher confidence volatility under local occlusion (sliding mask) compared to clean samples
  • Introduction of Sliding Mask Confidence Entropy (SMCE) as a principled metric to quantify adversarial sensitivity to occlusion, visualized via Mask Entropy Field Maps
  • SWM-AED detection algorithm that identifies adversarial inputs using SMCE without requiring adversarial retraining, avoiding catastrophic overfitting
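As a rough sketch of the core idea, SMCE can be read as the mean predictive entropy over sliding occlusion positions. The paper's exact formulation may differ; the mask size, stride, zero-fill masking, and entropy definition below are illustrative assumptions:

```python
import numpy as np

def smce(image, predict_proba, mask_size=8, stride=4):
    """Sliding Mask Confidence Entropy (illustrative reconstruction).

    Slides a square occlusion mask over the image and averages the
    entropy of the model's class distribution at each mask position.
    Adversarial examples, being sensitive to occlusion, should yield
    more volatile (higher-entropy) predictions than clean samples.
    """
    h, w = image.shape[:2]
    entropies = []
    for y in range(0, h - mask_size + 1, stride):
        for x in range(0, w - mask_size + 1, stride):
            occluded = image.copy()
            occluded[y:y + mask_size, x:x + mask_size] = 0.0  # zero-fill mask (assumed)
            p = np.clip(predict_proba(occluded), 1e-12, 1.0)
            entropies.append(float(-(p * np.log(p)).sum()))
    return float(np.mean(entropies))
```

Here `predict_proba` stands for any classifier wrapped as `image -> class probabilities`; on CIFAR-10 images (32×32) the sliding mask would visit a grid of positions, and the resulting per-position entropies are what a Mask Entropy Field Map would visualize.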

🛡️ Threat Analysis

Input Manipulation Attack

The paper's primary contribution is a defense against adversarial input manipulation: it detects adversarial examples (crafted by FGSM, PGD, and seven other attacks) using a novel occlusion-based confidence entropy metric (SMCE) and the SWM-AED detection algorithm, targeting inference-time evasion attacks on image classifiers.
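Since detection ultimately reduces to thresholding SMCE scores, one simple way to calibrate the decision boundary is to fix a false-positive budget on clean samples. This quantile-based calibration is a common default for score-based detectors, not necessarily how the paper sets its threshold:

```python
import numpy as np

def calibrate_threshold(clean_smce_scores, target_fpr=0.05):
    """Choose a threshold so ~target_fpr of clean samples are misflagged.

    Hypothetical calibration step: take the (1 - target_fpr) quantile
    of SMCE scores computed on a held-out set of clean images.
    """
    scores = np.asarray(clean_smce_scores, dtype=float)
    return float(np.quantile(scores, 1.0 - target_fpr))

def is_adversarial(smce_score, threshold):
    """Flag inputs whose occlusion-entropy score exceeds the threshold."""
    return smce_score > threshold
```

Because adversarial examples cluster at higher SMCE values than clean samples, a single scalar threshold separates the two populations without retraining the classifier, which is what lets SWM-AED sidestep the catastrophic overfitting of conventional adversarial training.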


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box · black_box · inference_time · digital · untargeted
Datasets
CIFAR-10
Applications
image classification