attack 2025

Transferable Dual-Domain Feature Importance Attack against AI-Generated Image Detector

0 citations · 33 references · IEEE Signal Processing Letters

Published on arXiv

2511.15571

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

DuFIA consistently outperforms existing traditional and intermediate-level adversarial attacks in black-box transfer settings across diverse AIGI detectors, including CNN-based and CLIP-based architectures.

Novel technique introduced

DuFIA (Dual-domain Feature Importance Attack)

Recent AI-generated image (AIGI) detectors achieve impressive accuracy under clean conditions. From an anti-forensics perspective, it is important to develop advanced adversarial attacks for evaluating the security of such detectors, a topic that remains insufficiently explored. This letter proposes a Dual-domain Feature Importance Attack (DuFIA) scheme to invalidate AIGI detectors to some extent. Forensically important features are captured by the spatially interpolated gradient and frequency-aware perturbation. Adversarial transferability is enhanced by jointly modeling spatial and frequency-domain feature importances, which are fused to guide the optimization-based generation of adversarial examples. Extensive experiments across various AIGI detectors verify the cross-model transferability, transparency and robustness of DuFIA.

Key Contributions

Dual-domain feature importance attack (DuFIA) that jointly leverages spatial (interpolated gradient) and frequency-domain perturbations to generate transferable adversarial examples against AIGI detectors
Domain-aware fusion mechanism that combines spatial and frequency feature importance maps to selectively emphasize semantically critical and cross-model-transferable features
Demonstrated superior cross-model black-box transferability over existing traditional and intermediate-level attack (ILA) methods across multiple AIGI detector architectures
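
The contributions above describe the method only at a high level, and this page does not give the paper's equations. As a rough illustration of the general idea, the pure-Python sketch below aggregates feature-level gradients over spatially interpolated copies of an input and over copies perturbed in the DCT domain, then fuses the two normalized importance vectors. Everything in it is assumed for illustration: the toy detector (`features`, `feature_grad`), the zero baseline used for interpolation, the multiplicative DCT noise, and the fusion weight `alpha` are hypothetical and are not taken from DuFIA.

```python
import math
import random

def dct(x):
    """Orthonormal DCT-II of a 1-D signal (stand-in for an image transform)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return out

def idct(c):
    """Inverse of dct() above."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def features(x, A):
    """Hypothetical intermediate layer: h = A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def feature_grad(h, v):
    """Gradient of the toy logit sum_j v_j * tanh(h_j) with respect to h."""
    return [vj * (1.0 - math.tanh(hj) ** 2) for vj, hj in zip(v, h)]

def normalize(g):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(t * t for t in g)) or 1.0
    return [t / norm for t in g]

def spatial_importance(x, A, v, steps=8):
    """Sum feature gradients along a straight path from a zero baseline to x."""
    agg = [0.0] * len(A)
    for k in range(1, steps + 1):
        xk = [xi * k / steps for xi in x]
        agg = [a + g for a, g in zip(agg, feature_grad(features(xk, A), v))]
    return normalize(agg)

def frequency_importance(x, A, v, copies=8, sigma=0.1, seed=0):
    """Sum feature gradients over copies whose DCT coefficients are jittered."""
    rng = random.Random(seed)
    coeffs = dct(x)
    agg = [0.0] * len(A)
    for _ in range(copies):
        noisy = [c * (1.0 + rng.gauss(0.0, sigma)) for c in coeffs]
        agg = [a + g for a, g in zip(agg, feature_grad(features(idct(noisy), A), v))]
    return normalize(agg)

def fuse(spatial, frequency, alpha=0.5):
    """Assumed fusion rule: convex combination, then renormalization."""
    return normalize([alpha * s + (1.0 - alpha) * f for s, f in zip(spatial, frequency)])

# Toy usage with made-up numbers.
x = [0.2, 0.8, 0.5, 0.1]
A = [[0.5, -0.2, 0.1, 0.3], [0.1, 0.4, -0.3, 0.2], [-0.2, 0.1, 0.6, -0.1]]
v = [1.0, -0.5, 0.25]
importance = fuse(spatial_importance(x, A, v), frequency_importance(x, A, v))
```

In a feature-importance attack of this general kind, a vector such as `importance` would weight the intermediate features in the attack objective, so that the perturbation suppresses the features the detector relies on most. How DuFIA defines and fuses its two maps may differ from this sketch.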

🛡️ Threat Analysis

Input Manipulation Attack

DuFIA crafts optimization-based adversarial perturbations using spatially interpolated gradients and frequency-aware perturbations to cause AIGI detectors to misclassify AI-generated images as real at inference time. This is a direct adversarial evasion attack. The core contribution is the novel dual-domain feature importance mechanism for generating transferable adversarial examples, and crafting adversarial inputs to evade a model is the defining characteristic of ML01.
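
To make the evasion setting concrete, here is a minimal sketch of an inference-time attack under an L-infinity budget. It is a generic iterative sign-gradient step against a hypothetical linear detector, not DuFIA's objective: the weights `w`, bias `b`, budget `eps`, and step size are made-up values, and a positive logit is assumed to mean "AI-generated".

```python
def detector_logit(x, w, b):
    """Hypothetical linear detector: logit > 0 means 'AI-generated'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sign(t):
    """Return -1, 0, or 1 according to the sign of t."""
    return (t > 0) - (t < 0)

def evade(x, w, b, eps=0.1, step=0.02, iters=10):
    """Push the logit down while staying within eps of x and inside [0, 1]."""
    adv = list(x)
    for _ in range(iters):
        # For a linear detector the gradient of the logit w.r.t. the input is w.
        adv = [a - step * sign(wi) for a, wi in zip(adv, w)]
        # Project back into the eps-ball around x and the valid pixel range.
        adv = [min(max(a, xi - eps, 0.0), xi + eps, 1.0) for a, xi in zip(adv, x)]
    return adv

# Toy usage with made-up numbers.
w = [2.0, -1.0, 1.5, 0.5]
b = -1.6
x = [0.6, 0.2, 0.5, 0.4]
adv = evade(x, w, b)
clean_is_fake = detector_logit(x, w, b) > 0
adv_is_fake = detector_logit(adv, w, b) > 0
```

With these toy numbers the clean input is flagged as AI-generated and the perturbed one is not, even though no pixel moves by more than `eps`. A transfer attack in the black-box setting would compute the gradient on a surrogate model instead of the target detector.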

Details

Domains

vision

Model Types

cnn, transformer

Threat Tags

white_box, black_box, inference_time, digital

Applications

ai-generated image detection, deepfake detection

Similar Papers

AdvSplat: Adversarial Attacks on Feed-Forward Gaussian Splatting Models

SegTrans: Transferable Adversarial Examples for Segmentation Models

NAT: Learning to Attack Neurons for Enhanced Adversarial Transferability

Enhancing Adversarial Transferability through Block Stretch and Shrink

Optimizing the Adversarial Perturbation with a Momentum-based Adaptive Matrix

Fine-Grained Iterative Adversarial Attacks with Limited Computation Budget

ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers

Gradient Structure Estimation under Label-Only Oracles via Spectral Sensitivity