attack 2025

Transferable Dual-Domain Feature Importance Attack against AI-Generated Image Detector

Weiheng Zhu 1,2, Gang Cao 1,2, Jing Liu 3, Lifang Yu 4, Shaowei Weng 5

0 citations · 33 references · IEEE Signal Processing Letters


Published on arXiv

2511.15571

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

DuFIA consistently outperforms existing traditional and intermediate-level adversarial attacks in black-box transfer settings across diverse AIGI detectors, including CNN-based and CLIP-based architectures.

DuFIA (Dual-domain Feature Importance Attack)

Novel technique introduced


Recent AI-generated image (AIGI) detectors achieve impressive accuracy under clean conditions. From an anti-forensics perspective, it is important to develop advanced adversarial attacks for evaluating the security of such detectors, a topic that remains insufficiently explored. This letter proposes a Dual-domain Feature Importance Attack (DuFIA) scheme to invalidate AIGI detectors to some extent. Forensically important features are captured by the spatially interpolated gradient and frequency-aware perturbation. Adversarial transferability is enhanced by jointly modeling spatial and frequency-domain feature importances, which are fused to guide the optimization-based adversarial example generation. Extensive experiments across various AIGI detectors verify the cross-model transferability, transparency and robustness of DuFIA.


Key Contributions

  • Dual-domain feature importance attack (DuFIA) that jointly leverages spatial (interpolated gradient) and frequency-domain perturbations to generate transferable adversarial examples against AIGI detectors
  • Domain-aware fusion mechanism that combines spatial and frequency feature importance maps to selectively emphasize semantically critical and cross-model-transferable features
  • Demonstrated superior cross-model black-box transferability over existing traditional and intermediate-level attack (ILA) methods across multiple AIGI detector architectures
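
The idea of fusing spatial and frequency-domain feature importance can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the toy detector (score = sum of v_j * tanh(f_j) over a linear feature layer), the zero baseline used for interpolation, the multiplicative Gaussian noise on DCT coefficients, and the equal-weight fusion of L2-normalized maps are all hypothetical choices. The function names (`spatial_importance`, `frequency_importance`, `fuse`) are likewise invented here.

```python
import math
import random

def dct(x):
    """Orthonormal DCT-II of a 1-D signal."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return out

def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def features(W, x):
    """Toy intermediate layer: f = W x (hypothetical stand-in for a detector layer)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def feature_grad(W, v, x):
    """Gradient of score = sum_j v_j * tanh(f_j) with respect to the features f."""
    return [vj * (1.0 - math.tanh(fj) ** 2) for vj, fj in zip(v, features(W, x))]

def mean_grad(W, v, inputs):
    """Average the feature gradient over a set of transformed inputs."""
    grads = [feature_grad(W, v, x) for x in inputs]
    return [sum(g[j] for g in grads) / len(grads) for j in range(len(v))]

def spatial_importance(W, v, x, steps=8):
    """Average gradients along a straight path from a zero baseline to x (assumed)."""
    inputs = [[(k / steps) * xi for xi in x] for k in range(1, steps + 1)]
    return mean_grad(W, v, inputs)

def frequency_importance(W, v, x, samples=8, sigma=0.1, rng=None):
    """Average gradients over copies of x with randomly rescaled DCT coefficients (assumed)."""
    rng = rng or random.Random(0)
    coeffs = dct(x)
    inputs = [idct([c * (1.0 + rng.gauss(0.0, sigma)) for c in coeffs])
              for _ in range(samples)]
    return mean_grad(W, v, inputs)

def l2_normalize(g):
    norm = math.sqrt(sum(a * a for a in g)) or 1.0
    return [a / norm for a in g]

def fuse(spatial, freq, weight=0.5):
    """Convex combination of the two L2-normalized importance maps (assumed rule)."""
    s, f = l2_normalize(spatial), l2_normalize(freq)
    return [weight * a + (1.0 - weight) * b for a, b in zip(s, f)]
```

In an attack of this family, the fused map would typically weight the intermediate features in the loss that the perturbation is optimized against. The paper's actual aggregation and fusion rules may differ from this sketch.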

🛡️ Threat Analysis

Input Manipulation Attack

DuFIA crafts optimization-based adversarial perturbations using spatially interpolated gradients and frequency-aware perturbations to cause AIGI detectors to misclassify AI-generated images as real at inference time — a direct adversarial evasion attack. The core contribution is the novel dual-domain feature importance mechanism for generating transferable adversarial examples, which is the defining characteristic of ML01.
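
For readers unfamiliar with inference-time evasion, the following is a minimal sketch of a generic iterative sign-gradient attack under an L-infinity budget. It is not DuFIA: the logistic "detector" with weights `w` and bias `b`, the budget `eps`, and the step size are all hypothetical, and the toy model merely stands in for a surrogate on which a perturbation would be crafted before being transferred to a black-box detector.

```python
import math

def detector_prob_fake(w, b, x):
    """Toy logistic detector: probability that x is AI-generated (hypothetical)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def sign(a):
    return (a > 0) - (a < 0)

def evade(w, b, x, eps=0.1, step=0.02, iters=10):
    """Iteratively lower the 'fake' probability while staying within eps of x."""
    adv = list(x)
    for _ in range(iters):
        p = detector_prob_fake(w, b, adv)
        # Analytic input gradient of the sigmoid output: p * (1 - p) * w
        grad = [p * (1.0 - p) * wi for wi in w]
        # Step against the gradient so the detector leans toward "real"
        adv = [a - step * sign(g) for a, g in zip(adv, grad)]
        # Project back into the L-infinity ball around x and the pixel range [0, 1]
        adv = [min(max(a, xi - eps, 0.0), xi + eps, 1.0) for a, xi in zip(adv, x)]
    return adv
```

A usage example under the same assumptions: with `w = [2.0, -1.0, 1.5, 0.5]`, `b = 0.0`, and `x = [0.8, 0.2, 0.7, 0.6]`, calling `evade(w, b, x)` returns a perturbed input whose "fake" probability is lower than that of `x`, with no coordinate moved by more than `eps`.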


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
white_box, black_box, inference_time, digital
Applications
ai-generated image detection, deepfake detection