
DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense

Amira Guesmi, Muhammad Shafique


Published on arXiv: 2509.24359

Input Manipulation Attack (OWASP ML Top 10 — ML01)

Key Finding

DRIFT outperforms preprocessing, adversarial training, and diffusion-based defenses under adaptive white-box, transfer-based, and gradient-free attacks on ImageNet with negligible runtime and memory cost.

DRIFT (Divergent Response in Filtered Transformations)

Novel technique introduced


Deep neural networks remain highly vulnerable to adversarial examples, and most defenses collapse once gradients can be reliably estimated. We identify "gradient consensus," the tendency of randomized transformations to yield aligned gradients, as a key driver of adversarial transferability. Attackers exploit this consensus to construct perturbations that remain effective across transformations. We introduce DRIFT (Divergent Response in Filtered Transformations), a stochastic ensemble of lightweight, learnable filters trained to actively disrupt gradient consensus. Unlike prior randomized defenses that rely on gradient masking, DRIFT enforces "gradient dissonance" by maximizing divergence in Jacobian- and logit-space responses while preserving natural predictions. Our contributions are threefold: (i) we formalize gradient consensus and provide a theoretical analysis linking consensus to transferability; (ii) we propose a consensus-divergence training strategy combining prediction consistency, Jacobian separation, logit-space separation, and adversarial robustness; and (iii) we show that DRIFT achieves substantial robustness gains on ImageNet across CNNs and Vision Transformers, outperforming state-of-the-art preprocessing, adversarial training, and diffusion-based defenses under adaptive white-box, transfer-based, and gradient-free attacks. DRIFT delivers these improvements with negligible runtime and memory cost, establishing gradient divergence as a practical and generalizable principle for adversarial defense.
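The consensus-divergence objective described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the use of cosine similarity for the separation terms, the residual-logit form of logit-space separation, and the loss weights are all assumptions; the adversarial-robustness term is omitted.

```python
import numpy as np

def _cosine(u, v):
    """Cosine similarity with a small epsilon so zero vectors score 0."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def consensus_divergence_loss(logits, jacobians, w_cons=1.0, w_jac=1.0, w_logit=1.0):
    """Hypothetical sketch of a DRIFT-style training objective for K filtered views.

    logits:    list of K logit vectors, one per learnable filter.
    jacobians: list of K flattened input-gradient vectors, one per filter.
    """
    k = len(logits)
    mean_logits = np.mean(logits, axis=0)
    # Prediction consistency: all filtered views should agree on the clean output.
    consistency = float(np.mean([np.sum((l - mean_logits) ** 2) for l in logits]))
    # Jacobian separation: penalize aligned input gradients (gradient consensus).
    jac_consensus = float(np.mean([_cosine(jacobians[i], jacobians[j])
                                   for i in range(k) for j in range(i + 1, k)]))
    # Logit-space separation: penalize aligned residual (off-mean) logit responses,
    # so views diverge in their responses while still preserving the prediction.
    residuals = [l - mean_logits for l in logits]
    logit_consensus = float(np.mean([_cosine(residuals[i], residuals[j])
                                     for i in range(k) for j in range(i + 1, k)]))
    return w_cons * consistency + w_jac * jac_consensus + w_logit * logit_consensus
```

Minimizing this loss drives the two consensus terms down, i.e. it pushes the filters' Jacobians and residual logits apart while the consistency term keeps their predictions aligned on natural inputs.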


Key Contributions

  • Formalizes 'gradient consensus' in randomized transformations and provides theoretical analysis linking it to adversarial transferability
  • Proposes DRIFT: a consensus-divergence training strategy using Jacobian separation, logit-space separation, and prediction consistency to enforce gradient dissonance across stochastic filter ensembles
  • Demonstrates state-of-the-art robustness on ImageNet against adaptive white-box, transfer-based, and gradient-free attacks on both CNNs and Vision Transformers with negligible runtime overhead

🛡️ Threat Analysis

Input Manipulation Attack

DRIFT is a defense against adversarial examples at inference time — it counters gradient-based adversarial perturbations (white-box, transfer-based, gradient-free attacks) by enforcing gradient dissonance across a stochastic ensemble of learnable filters, directly targeting input manipulation attacks.
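At inference time, the defense reduces to sampling one filter from the ensemble per forward pass. The toy sketch below illustrates only that mechanism; the filters and classifier here are hypothetical placeholders, not DRIFT's learned filters.

```python
import random

# Placeholder "filters" standing in for DRIFT's learned, lightweight filter ensemble.
def identity(x):
    return list(x)

def smooth(x):
    # Circular two-tap average: a crude stand-in for a learned low-pass filter.
    return [(a + b) / 2.0 for a, b in zip(x, x[1:] + x[:1])]

def sharpen(x):
    # Residual boost: a crude stand-in for a learned high-pass filter.
    s = smooth(x)
    return [2.0 * a - b for a, b in zip(x, s)]

FILTERS = [identity, smooth, sharpen]

def drift_predict(x, classify, rng=random):
    """Sample one filter per query so an attacker cannot target a single fixed
    transformation; training makes the filters' gradients actively disagree."""
    f = rng.choice(FILTERS)
    return classify(f(x))
```

Because a fresh filter is drawn per query, repeated gradient estimates see different Jacobians; the point of DRIFT's training is that those Jacobians disagree by construction rather than merely being noisy.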


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
white_box, black_box, inference_time, untargeted, digital
Datasets
ImageNet
Applications
image classification