
Test-Time Defense Against Adversarial Attacks via Stochastic Resonance of Latent Ensembles

Dong Lao 1,2, Yuxiang Zhang 2, Haniyeh Ehsani Oskouie 2, Yangchao Wu 2, Alex Wong 3, Stefano Soatto 2

0 citations · 66 references

Published on arXiv · arXiv:2510.03224

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

A training-free latent ensembling defense recovers up to 71.9% of adversarial accuracy loss on stereo matching and 68.1% on image classification across multiple attack types including adaptive attacks.

Stochastic Resonance of Latent Ensembles (SRLE)

Novel technique introduced


We propose a test-time defense mechanism against adversarial attacks: imperceptible image perturbations that significantly alter the predictions of a model. Unlike existing methods that rely on feature filtering or smoothing, which can lead to information loss, we propose to "combat noise with noise" by leveraging stochastic resonance to enhance robustness while minimizing information loss. Our approach introduces small translational perturbations to the input image, aligns the transformed feature embeddings, and aggregates them before mapping back to the original reference image. This can be expressed in a closed-form formula, which can be deployed on diverse existing network architectures without introducing additional network modules or fine-tuning for specific attack types. The resulting method is entirely training-free, architecture-agnostic, and attack-agnostic. Empirical results show state-of-the-art robustness on image classification and, for the first time, establish a generic test-time defense for dense prediction tasks, including stereo matching and optical flow, highlighting the method's versatility and practicality. Specifically, relative to clean (unperturbed) performance, our method recovers up to 68.1% of the accuracy loss on image classification, 71.9% on stereo matching, and 29.2% on optical flow under various types of adversarial attacks.
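The pipeline described above (perturb the input with small translations, embed each copy, align the embeddings back to the reference frame, and aggregate) can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: `embed` stands in for a frozen network's feature extractor, circular `np.roll` shifts stand in for the translational perturbations and their inverse alignment, and the aggregation is a plain average.

```python
import numpy as np

def srle_defend(image, embed, shifts):
    """Toy sketch of Stochastic Resonance of Latent Ensembles (SRLE).

    For each small translation (dy, dx): shift the input, embed it,
    shift the embedding back to the reference frame, then average the
    aligned embeddings. `embed` is a placeholder for a network's
    latent-feature map; real SRLE aggregates inside the network.
    """
    aggregated = None
    for dy, dx in shifts:
        shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))   # perturb input
        feat = embed(shifted)                                    # embed
        aligned = np.roll(feat, shift=(-dy, -dx), axis=(0, 1))  # align back
        aggregated = aligned if aggregated is None else aggregated + aligned
    return aggregated / len(shifts)
```

With a shift-equivariant `embed` (here, the identity) the ensemble average exactly recovers the clean embedding, which is the sense in which the method "minimizes information loss"; for a real network the ensemble instead averages out perturbation-sensitive components of the features.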


Key Contributions

  • Training-free, architecture-agnostic test-time defense via stochastic resonance: small translational perturbations are applied to inputs, embeddings are aligned and aggregated in latent space, expressed in a closed-form formula with no additional modules or fine-tuning
  • First generic test-time adversarial defense demonstrated on dense prediction tasks (stereo matching and optical flow) in addition to image classification
  • Recovers up to 68.1% of accuracy loss on image classification, 71.9% on stereo matching, and 29.2% on optical flow across diverse attack types including adaptive attacks

🛡️ Threat Analysis

Input Manipulation Attack

The paper directly proposes and evaluates a defense against adversarial examples — gradient-based imperceptible input perturbations that cause misclassification or wrong dense predictions at inference time. The threat model is canonical ML01: FGSM, PGD, universal perturbations, and adaptive attacks targeting image classifiers and dense prediction networks.
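To make the threat model concrete, here is a minimal FGSM sketch on a toy logistic-regression model. This is an illustration of the attack family only; the paper evaluates FGSM, PGD, universal, and adaptive attacks against deep classification and dense-prediction networks, not this toy model.

```python
import numpy as np

def fgsm_attack(x, w, b, y, eps):
    """One-step Fast Gradient Sign Method (FGSM) against a toy
    logistic-regression model with weights w, bias b, label y in {0, 1}.

    The loss is binary cross-entropy on sigmoid(w.x + b); the adversarial
    example moves x by eps in the direction of the sign of the loss
    gradient with respect to the input.
    """
    z = float(w @ x + b)
    p = 1.0 / (1.0 + np.exp(-z))   # model's predicted probability
    grad_x = (p - y) * w           # d(BCE)/dx for the logistic model
    return x + eps * np.sign(grad_x)
```

Even this one-step attack can flip a confident prediction when `eps` exceeds the margin, which is why bounded, imperceptible perturbations are the canonical ML01 threat.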


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
white_box, black_box, inference_time, digital, untargeted
Applications
image classification, stereo matching, optical flow