
DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense

Amira Guesmi, Muhammad Shafique


Published on arXiv: 2509.24359

Input Manipulation Attack (OWASP ML Top 10 — ML01)

Key Finding

DRIFT outperforms preprocessing, adversarial training, and diffusion-based defenses under adaptive white-box, transfer-based, and gradient-free attacks on ImageNet with negligible runtime and memory cost.

DRIFT (Divergent Response in Filtered Transformations)

Novel technique introduced


Deep neural networks remain highly vulnerable to adversarial examples, and most defenses collapse once gradients can be reliably estimated. We identify "gradient consensus," the tendency of randomized transformations to yield aligned gradients, as a key driver of adversarial transferability. Attackers exploit this consensus to construct perturbations that remain effective across transformations. We introduce DRIFT (Divergent Response in Filtered Transformations), a stochastic ensemble of lightweight, learnable filters trained to actively disrupt gradient consensus. Unlike prior randomized defenses that rely on gradient masking, DRIFT enforces "gradient dissonance" by maximizing divergence in Jacobian- and logit-space responses while preserving natural predictions. Our contributions are threefold: (i) we formalize gradient consensus and provide a theoretical analysis linking consensus to transferability; (ii) we propose a consensus-divergence training strategy combining prediction consistency, Jacobian separation, logit-space separation, and adversarial robustness; and (iii) we show that DRIFT achieves substantial robustness gains on ImageNet across CNNs and Vision Transformers, outperforming state-of-the-art preprocessing, adversarial training, and diffusion-based defenses under adaptive white-box, transfer-based, and gradient-free attacks. DRIFT delivers these improvements with negligible runtime and memory cost, establishing gradient divergence as a practical and generalizable principle for adversarial defense.
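The consensus-divergence objective described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the use of cosine similarity for the separation terms, the residual-logit form of logit-space separation, and the loss weights are all assumptions; the adversarial-robustness term is omitted.

```python
import numpy as np

def _cosine(u, v):
    """Cosine similarity with a small epsilon so zero vectors score 0."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def consensus_divergence_loss(logits, jacobians, w_cons=1.0, w_jac=1.0, w_logit=1.0):
    """Hypothetical sketch of a DRIFT-style training objective for K filtered views.

    logits:    list of K logit vectors, one per learnable filter.
    jacobians: list of K flattened input-gradient vectors, one per filter.
    """
    k = len(logits)
    mean_logits = np.mean(logits, axis=0)
    # Prediction consistency: all filtered views should agree on the clean output.
    consistency = float(np.mean([np.sum((l - mean_logits) ** 2) for l in logits]))
    # Jacobian separation: penalize aligned input gradients (gradient consensus).
    jac_consensus = float(np.mean([_cosine(jacobians[i], jacobians[j])
                                   for i in range(k) for j in range(i + 1, k)]))
    # Logit-space separation: penalize aligned residual (off-mean) logit responses,
    # so views diverge in their responses while still preserving the prediction.
    residuals = [l - mean_logits for l in logits]
    logit_consensus = float(np.mean([_cosine(residuals[i], residuals[j])
                                     for i in range(k) for j in range(i + 1, k)]))
    return w_cons * consistency + w_jac * jac_consensus + w_logit * logit_consensus
```

Minimizing this loss drives the two consensus terms down, i.e. it pushes the filters' Jacobians and residual logits apart while the consistency term keeps their predictions aligned on natural inputs.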


Key Contributions

  • Formalizes 'gradient consensus' in randomized transformations and provides theoretical analysis linking it to adversarial transferability
  • Proposes DRIFT: a consensus-divergence training strategy using Jacobian separation, logit-space separation, and prediction consistency to enforce gradient dissonance across stochastic filter ensembles
  • Demonstrates state-of-the-art robustness on ImageNet against adaptive white-box, transfer-based, and gradient-free attacks on both CNNs and Vision Transformers with negligible runtime overhead

🛡️ Threat Analysis

Input Manipulation Attack

DRIFT is a defense against adversarial examples at inference time — it counters gradient-based adversarial perturbations (white-box, transfer-based, gradient-free attacks) by enforcing gradient dissonance across a stochastic ensemble of learnable filters, directly targeting input manipulation attacks.
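At inference time, the defense reduces to sampling one filter from the ensemble per forward pass. The toy sketch below illustrates only that mechanism; the filters and classifier here are hypothetical placeholders, not DRIFT's learned filters.

```python
import random

# Placeholder "filters" standing in for DRIFT's learned, lightweight filter ensemble.
def identity(x):
    return list(x)

def smooth(x):
    # Circular two-tap average: a crude stand-in for a learned low-pass filter.
    return [(a + b) / 2.0 for a, b in zip(x, x[1:] + x[:1])]

def sharpen(x):
    # Residual boost: a crude stand-in for a learned high-pass filter.
    s = smooth(x)
    return [2.0 * a - b for a, b in zip(x, s)]

FILTERS = [identity, smooth, sharpen]

def drift_predict(x, classify, rng=random):
    """Sample one filter per query so an attacker cannot target a single fixed
    transformation; training makes the filters' gradients actively disagree."""
    f = rng.choice(FILTERS)
    return classify(f(x))
```

Because a fresh filter is drawn per query, repeated gradient estimates see different Jacobians; the point of DRIFT's training is that those Jacobians disagree by construction rather than merely being noisy.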


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
white_box, black_box, inference_time, untargeted, digital
Datasets
ImageNet
Applications
image classification