Deepfake Detection Generalization with Diffusion Noise

Deepfake detectors generalize poorly as new image synthesis techniques emerge. In particular, deepfakes generated by diffusion models are highly photorealistic and often evade detectors trained on GAN-based forgeries. This paper addresses the generalization problem by exploiting the characteristics of diffusion noise. We propose an Attention-guided Noise Learning (ANL) framework that integrates a pre-trained diffusion model into the deepfake detection pipeline to guide the learning of more robust features. Specifically, the method uses the diffusion model's denoising process to expose subtle artifacts. The detector is trained to predict the noise contained in an input image at a given diffusion step, which forces it to capture discrepancies between real and synthetic images. An attention-guided mechanism derived from the predicted noise then encourages the model to focus on globally distributed discrepancies rather than local patterns. Because the frozen diffusion model has learned the distribution of natural images, ANL acts as a form of regularization that improves the detector's generalization to unseen forgery types. Extensive experiments show that ANL significantly outperforms existing methods on multiple benchmarks and achieves state-of-the-art accuracy in detecting diffusion-generated deepfakes. Notably, the framework improves generalization (e.g., raising ACC/AP by a substantial margin on unseen models) without adding inference overhead. These results indicate that diffusion noise provides a powerful signal for generalizable deepfake detection.
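The noise-prediction objective described in the abstract can be illustrated with the standard DDPM forward process, x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps. The sketch below is a minimal, dependency-free illustration and not the authors' implementation. The linear beta schedule, the function names (linear_alpha_bar, add_noise, noise_prediction_loss), and the use of sampled Gaussian noise as the regression target are all assumptions; the summary does not say whether the target is the sampled noise or the frozen diffusion model's own noise estimate.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def add_noise(x0, t, alpha_bar, rng):
    """Forward diffusion on a flattened image: returns the noisy x_t and the noise eps."""
    a = alpha_bar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps


def noise_prediction_loss(eps_pred, eps_true):
    """Mean squared error between predicted noise and the reference noise."""
    return sum((p - q) ** 2 for p, q in zip(eps_pred, eps_true)) / len(eps_true)


# Usage: noise a toy 4-pixel "image" at step 250 and score a trivial all-zero prediction.
rng = random.Random(0)
alpha_bar = linear_alpha_bar(1000)
x0 = [0.5, -0.2, 0.1, 0.9]
x_t, eps = add_noise(x0, t=250, alpha_bar=alpha_bar, rng=rng)
baseline_loss = noise_prediction_loss([0.0] * len(eps), eps)
```

In a full training loop this regression term would presumably be combined with the usual real/fake classification loss; the summary does not state how the two are weighted.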

Key Contributions

An Attention-guided Noise Learning (ANL) framework that uses a diffusion model's denoising process to expose deepfake artifacts
A novel training approach in which the detector predicts the noise at a given diffusion step to capture discrepancies between real and synthetic images
State-of-the-art generalization to unseen deepfake generation methods, especially diffusion-generated fakes
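The attention-guided part of the framework can be pictured as turning a predicted noise map into spatial weights that are then used to pool detector features. The following is a hypothetical sketch of that idea only. The softmax over absolute noise values, the temperature parameter, and the names noise_attention and attention_pool are assumptions made for illustration; the summary does not specify how ANL actually derives its attention.

```python
import math


def noise_attention(noise_map, temperature=1.0):
    """Softmax over all pixel positions of the absolute predicted noise (assumed form)."""
    flat = [abs(v) / temperature for row in noise_map for v in row]
    peak = max(flat)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in flat]
    total = sum(exps)
    weights = [e / total for e in exps]
    width = len(noise_map[0])
    return [weights[i:i + width] for i in range(0, len(weights), width)]


def attention_pool(feature_map, attention):
    """Attention-weighted sum of a single-channel feature map."""
    return sum(
        f * a
        for f_row, a_row in zip(feature_map, attention)
        for f, a in zip(f_row, a_row)
    )


# Usage: a 2x2 predicted-noise map with one strong response in the bottom-right corner.
predicted_noise = [[0.0, 0.0], [0.0, 3.0]]
sharp = noise_attention(predicted_noise, temperature=1.0)
spread = noise_attention(predicted_noise, temperature=10.0)
pooled = attention_pool([[1.0, 1.0], [1.0, 1.0]], sharp)
```

In this sketch a higher temperature flattens the weights across the image, which is one simple way to favor globally distributed evidence over a single local patch.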

🛡️ Threat Analysis

Output Integrity Attack

The paper focuses on detecting AI-generated content (deepfakes) and verifying image authenticity, which falls under output integrity and content provenance. The detector distinguishes real images from synthetic or manipulated ones to ensure content authenticity.

Details

Domains

vision, generative

Model Types

diffusion, cnn

Threat Tags

inference_time, digital

Applications

2026, 0 citations
