Deepfake Detection Generalization with Diffusion Noise
Hongyuan Qi, Wenjin Hou, Hehe Fan, Jun Xiao
Published on arXiv
2604.14570
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Significantly improves accuracy (ACC) and average precision (AP) on unseen diffusion-generated deepfakes with no inference overhead, outperforming existing detectors
ANL (Attention-guided Noise Learning)
Novel technique introduced
Deepfake detectors generalize poorly as new image synthesis techniques emerge. In particular, deepfakes generated by diffusion models are highly photorealistic and often evade detectors trained on GAN-based forgeries. This paper addresses the generalization problem by leveraging the characteristics of diffusion noise. We propose an Attention-guided Noise Learning (ANL) framework that integrates a pre-trained diffusion model into the detection pipeline to guide the learning of more robust features. Specifically, the method uses the diffusion model's denoising process to expose subtle artifacts. The detector is trained to predict the noise contained in an input image at a given diffusion step, which forces it to capture discrepancies between real and synthetic images. An attention-guided mechanism derived from the predicted noise then encourages the model to focus on globally distributed discrepancies rather than local patterns. Because the frozen diffusion model has learned the distribution of natural images, ANL acts as a form of regularization that improves the detector's generalization to unseen forgery types. Extensive experiments show that ANL significantly outperforms existing methods on multiple benchmarks and achieves state-of-the-art accuracy in detecting diffusion-generated deepfakes. Notably, the framework improves generalization (e.g., raising ACC/AP by a substantial margin on unseen models) without adding overhead at inference. These results indicate that diffusion noise provides a powerful signal for generalizable deepfake detection.
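The noise-prediction objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the standard DDPM forward process with a linear beta schedule, uses the sampled Gaussian noise as the regression target (the paper may instead take the frozen diffusion model's noise estimate), and combines the noise loss with a binary real/fake loss through a hypothetical weight `lam`. All function names are illustrative.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear schedule (an assumption)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Standard DDPM forward step: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    a_t = alpha_bar[t]
    x_t = np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps
    return x_t, eps

def noise_loss(pred_noise, target_noise):
    """Mean squared error between predicted and target noise."""
    return float(np.mean((pred_noise - target_noise) ** 2))

def bce_loss(prob_fake, label):
    """Binary cross-entropy for a single real (0) / fake (1) prediction."""
    p = np.clip(prob_fake, 1e-7, 1.0 - 1e-7)
    return float(-(label * np.log(p) + (1.0 - label) * np.log(1.0 - p)))

def total_loss(prob_fake, label, pred_noise, target_noise, lam=1.0):
    """Classification loss plus a weighted noise-prediction term (lam is hypothetical)."""
    return bce_loss(prob_fake, label) + lam * noise_loss(pred_noise, target_noise)

rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))   # toy image scaled to [-1, 1]
x_t, eps = add_noise(x0, t=200, alpha_bar=alpha_bar, rng=rng)
pred = np.zeros_like(eps)                     # stand-in for the detector's noise head
loss = total_loss(prob_fake=0.7, label=1.0, pred_noise=pred, target_noise=eps)
```

In this reading, the noise term is used only during training, so dropping the noise head at test time would be consistent with the claim of no added inference overhead.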
Key Contributions
- Attention-guided Noise Learning (ANL) framework that uses a diffusion model's denoising process to expose deepfake artifacts
- Novel training approach in which the detector predicts noise at diffusion steps to capture discrepancies between real and synthetic images
- State-of-the-art generalization to unseen deepfake generation methods, especially diffusion-generated fakes
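One hypothetical way to derive attention from predicted noise is sketched below. The summary does not specify the mechanism, so everything here is an assumption: the per-pixel noise magnitude is turned into a spatial softmax map, which then weights a feature map before pooling. The names `noise_attention` and `attention_pool` and the `temperature` parameter are illustrative only.

```python
import numpy as np

def noise_attention(pred_noise, temperature=1.0):
    """Map predicted noise of shape (C, H, W) to spatial weights (H, W) that sum to 1."""
    energy = np.abs(pred_noise).mean(axis=0)   # per-pixel noise magnitude
    logits = energy / temperature
    logits = logits - logits.max()             # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def attention_pool(features, attention):
    """Pool features of shape (D, H, W) into a (D,) vector using the spatial weights."""
    return (features * attention[None, :, :]).sum(axis=(1, 2))

rng = np.random.default_rng(0)
pred_noise = rng.standard_normal((3, 4, 4))    # stand-in for the detector's noise output
features = rng.standard_normal((16, 4, 4))     # stand-in for backbone features
attn = noise_attention(pred_noise, temperature=2.0)
pooled = attention_pool(features, attn)
```

A larger `temperature` flattens the map so that weight is spread across the whole image, which is one way to favor globally distributed cues over a few local patches.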
🛡️ Threat Analysis
The paper focuses on detecting AI-generated content (deepfakes) and verifying image authenticity, which falls under output integrity and content provenance. The detector distinguishes real images from synthetic or manipulated ones to ensure content authenticity.