
A Statistical Method for Attack-Agnostic Adversarial Attack Detection with Compressive Sensing Comparison

Chinthana Wimalasuriya, Spyros Tragoudas

0 citations · 19 references · arXiv


Published on arXiv: 2510.02707

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Achieves near-perfect adversarial detection across five diverse attack types (FGSM, PGD, Square Attack, DeepFool, CW) while significantly reducing false positives compared to state-of-the-art detectors.


Adversarial attacks present a significant threat to modern machine learning systems. Yet existing detection methods often fail to detect unseen attacks, or to detect diverse attack types with high accuracy. In this work, we propose a statistical approach that establishes a detection baseline before a neural network's deployment, enabling effective real-time adversarial detection. We derive a metric of adversarial presence by comparing the behavior of a compressed/uncompressed neural network pair. Our method has been tested against state-of-the-art techniques and achieves near-perfect detection across a wide range of attack types. Moreover, it significantly reduces false positives, making it both reliable and practical for real-world applications.
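The pre-deployment baseline the abstract describes can be sketched as a simple calibration step: score clean validation inputs with a divergence metric between the compressed and uncompressed networks, then set the runtime threshold at a high percentile of those clean scores. The percentile choice and function names below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def calibrate_threshold(clean_scores, percentile=99.0):
    """Pre-deployment: set T at a high percentile of divergence scores
    observed on clean (non-adversarial) validation inputs."""
    return float(np.percentile(clean_scores, percentile))

def detect(score, T):
    """Runtime: flag any input whose divergence score exceeds T."""
    return score > T
```

Calibrating on clean data only is what makes the detector attack-agnostic: no adversarial examples are needed to fit the threshold.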


Key Contributions

  • Attack-agnostic adversarial detection method that requires no prior knowledge of attack type by comparing feature-layer distributions of a compressed/uncompressed CNN pair
  • A pre-deployment calibration procedure to establish per-class identity baselines and a runtime threshold T derived from KL divergence, L2 norm, and Mann-Whitney U statistics
  • Near-perfect detection across FGSM, PGD, Square Attack, DeepFool, and CW attacks with significantly reduced false positives compared to existing methods
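The three statistics named in the contributions (KL divergence, L2 norm, Mann-Whitney U) can be sketched as follows. This is a minimal illustration assuming histogram-based KL over feature-layer activations and a simple per-statistic thresholding rule; the paper's actual fusion of the statistics into the threshold T may differ.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) between two activation histograms (smoothed to avoid log 0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def detection_statistics(feat_full, feat_comp, bins=32):
    """Compare one input's feature-layer activations across the network pair."""
    lo = min(feat_full.min(), feat_comp.min())
    hi = max(feat_full.max(), feat_comp.max())
    h_full, _ = np.histogram(feat_full, bins=bins, range=(lo, hi))
    h_comp, _ = np.histogram(feat_comp, bins=bins, range=(lo, hi))
    kl = kl_divergence(h_full, h_comp)
    l2 = float(np.linalg.norm(feat_full - feat_comp))
    _, p_val = mannwhitneyu(feat_full, feat_comp)  # distribution-shift test
    return kl, l2, float(p_val)

def is_adversarial(feat_full, feat_comp, T):
    """Illustrative fusion: flag if any statistic crosses its calibrated bound."""
    kl, l2, p_val = detection_statistics(feat_full, feat_comp)
    return kl > T["kl"] or l2 > T["l2"] or p_val < T["p"]
```

The intuition is that compression perturbs the decision surface just enough that clean inputs produce nearly identical feature distributions in both networks, while adversarial inputs, which sit near decision boundaries, diverge measurably.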

🛡️ Threat Analysis

Input Manipulation Attack

The paper directly defends against input manipulation attacks (adversarial examples) by detecting adversarially perturbed inputs at inference time; it is evaluated against FGSM, PGD, Square Attack, DeepFool, and CW — all canonical adversarial example attacks.


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, black_box, inference_time, untargeted, digital
Applications
image classification