Who Made This? Fake Detection and Source Attribution with Diffusion Features
Simone Bonechi, Paolo Andreini, Barbara Toniella Corradini
Published on arXiv
2510.27602
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
A k-NN classifier applied to the internal activations of a pre-trained diffusion model achieves state-of-the-art cross-generator deepfake detection without fine-tuning, demonstrating that diffusion representations inherently encode generator-specific patterns.
FRIDA
Novel technique introduced
The rapid progress of generative diffusion models has enabled the creation of synthetic images that are increasingly difficult to distinguish from real ones, raising concerns about authenticity, copyright, and misinformation. Existing supervised detectors often struggle to generalize across unseen generators, requiring extensive labeled data and frequent retraining. We introduce FRIDA (Fake-image Recognition and source Identification via Diffusion-features Analysis), a lightweight framework that leverages internal activations from a pre-trained diffusion model for deepfake detection and source generator attribution. A k-nearest-neighbor classifier applied to diffusion features achieves state-of-the-art cross-generator performance without fine-tuning, while a compact neural model enables accurate source attribution. These results show that diffusion representations inherently encode generator-specific patterns, providing a simple and interpretable foundation for synthetic image forensics.
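As a rough illustration of the detection step described above, the sketch below runs a k-nearest-neighbor vote over feature vectors. It is not the authors' implementation: the features are synthetic stand-ins for diffusion activations, and the cosine-similarity metric, `k=5`, the feature dimension, and the helper names `knn_predict` and `sample` are all assumptions made for the example.

```python
import numpy as np


def knn_predict(train_feats, train_labels, query_feats, k=5):
    """Majority-vote k-NN using cosine similarity (an assumed metric)."""
    # L2-normalise rows so that a dot product equals cosine similarity.
    a = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    b = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    sims = b @ a.T                              # shape: (n_query, n_train)
    nn_idx = np.argsort(-sims, axis=1)[:, :k]   # indices of the k most similar
    votes = train_labels[nn_idx]                # shape: (n_query, k)
    return np.array([np.bincount(v).argmax() for v in votes])


# Synthetic stand-ins for features extracted from a pre-trained diffusion
# model; in practice these would come from the model's internal activations.
rng = np.random.default_rng(0)
dim = 64
real_center = rng.normal(size=dim)
fake_center = rng.normal(size=dim)


def sample(center, n):
    """Draw n noisy feature vectors around a class centre."""
    return center + 0.3 * rng.normal(size=(n, dim))


# Label 0 = real, label 1 = fake.
train_feats = np.vstack([sample(real_center, 50), sample(fake_center, 50)])
train_labels = np.array([0] * 50 + [1] * 50)
query_feats = np.vstack([sample(real_center, 10), sample(fake_center, 10)])
query_labels = np.array([0] * 10 + [1] * 10)

pred = knn_predict(train_feats, train_labels, query_feats, k=5)
accuracy = float((pred == query_labels).mean())
print(f"accuracy on synthetic queries: {accuracy:.2f}")
```

Because the classifier only stores reference features and compares against them, adding examples from a new generator requires no gradient updates, which is consistent with the "without fine-tuning" claim.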
Key Contributions
- FRIDA framework that extracts image prototypes from Stable Diffusion U-Net internal activations for synthetic image forensics
- k-NN classifier on diffusion features achieves state-of-the-art cross-generator deepfake detection without any fine-tuning
- Compact MLP for source model attribution that identifies the specific generative model (GLIDE, Stable Diffusion, BigGAN, etc.) responsible for a fake image
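
The attribution contribution can be sketched in the same spirit. The example below trains a one-hidden-layer MLP with softmax output to map a feature vector to a generator label. It is a minimal sketch under stated assumptions, not the paper's architecture: the features are synthetic, and the layer sizes, learning rate, step count, and the three-generator label set are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
generators = ["GLIDE", "Stable Diffusion", "BigGAN"]  # illustrative label set
n_cls, dim, hidden = len(generators), 32, 16          # assumed sizes

# Synthetic stand-ins for diffusion features: one noisy cluster per generator.
centers = rng.normal(size=(n_cls, dim))


def make_split(n_per_class):
    """Return (features, labels) with n_per_class samples per generator."""
    X = np.vstack([c + 0.3 * rng.normal(size=(n_per_class, dim)) for c in centers])
    y = np.repeat(np.arange(n_cls), n_per_class)
    return X, y


X_train, y_train = make_split(60)
X_test, y_test = make_split(20)

# One-hidden-layer MLP parameters.
W1 = rng.normal(scale=np.sqrt(2.0 / dim), size=(dim, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=np.sqrt(1.0 / hidden), size=(hidden, n_cls))
b2 = np.zeros(n_cls)


def forward(X):
    """Return hidden activations and softmax class probabilities."""
    h = np.maximum(X @ W1 + b1, 0.0)                  # ReLU
    logits = h @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return h, p / p.sum(axis=1, keepdims=True)


onehot = np.eye(n_cls)[y_train]
lr = 0.1
for _ in range(300):                                  # full-batch gradient descent
    h, p = forward(X_train)
    g_logits = (p - onehot) / len(X_train)            # d(cross-entropy)/d(logits)
    g_h = g_logits @ W2.T
    g_h[h <= 0.0] = 0.0                               # ReLU gradient mask
    W2 -= lr * (h.T @ g_logits)
    b2 -= lr * g_logits.sum(axis=0)
    W1 -= lr * (X_train.T @ g_h)
    b1 -= lr * g_h.sum(axis=0)

_, p_test = forward(X_test)
pred = p_test.argmax(axis=1)
test_acc = float((pred == y_test).mean())
print(f"held-out attribution accuracy (synthetic): {test_acc:.2f}")
print("first prediction:", generators[pred[0]])
```

Unlike the k-NN detector, this model has trainable weights, so supporting an additional generator would mean retraining with a larger output layer.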
🛡️ Threat Analysis
FRIDA is a deepfake detection and source attribution framework — directly addressing AI-generated content detection and content provenance, which are canonical ML09 concerns. The paper proposes a novel detection architecture (diffusion features + k-NN) rather than merely applying existing methods to a domain.