Is It Certainly a Deepfake? Reliability Analysis in Detection & Generation Ecosystem
Neslihan Kose 1, Anthony Rhodes 1, Umur Aybars Ciftci 2, Ilke Demir 3
Published on arXiv (2509.17550)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Uncertainty manifolds from deepfake detectors contain consistent information sufficient for generator source attribution, and pixel-level uncertainty maps reveal distinct artifact patterns correlated with specific generative models.
Bayesian Uncertainty Quantification for Deepfake Detection
Novel technique introduced
As generative models advance in quality and quantity for creating synthetic content, deepfakes are eroding online trust. Deepfake detectors are proposed to counter this effect; however, misuse of detectors, claiming fake content as real or vice versa, further fuels the misinformation problem. We present the first comprehensive uncertainty analysis of deepfake detectors, systematically investigating how generative artifacts influence prediction confidence. As reflected in detectors' responses, deepfake generators also contribute to this uncertainty because their generative residues vary, so we cross the uncertainty analysis of deepfake detectors with that of generators. Based on our observations, the uncertainty manifold holds enough consistent information to leverage uncertainty for deepfake source detection. Our approach leverages Bayesian Neural Networks and Monte Carlo dropout to quantify both aleatoric and epistemic uncertainties across diverse detector architectures. We evaluate uncertainty on two datasets with nine generators, using four blind and two biological detectors; compare different uncertainty methods; explore region- and pixel-based uncertainty; and conduct ablation studies. We run and analyze binary real/fake, multi-class real/fake, source detection, and leave-one-out experiments across generator/detector combinations to characterize their generalization capability, model calibration, uncertainty, and robustness against adversarial attacks. We further introduce uncertainty maps that localize prediction confidence at the pixel level, revealing distinct patterns correlated with generator-specific artifacts. Our analysis provides critical insights for deploying reliable deepfake detection systems and establishes uncertainty quantification as a fundamental requirement for trustworthy synthetic media detection.
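As a concrete sketch of the Monte Carlo dropout procedure the abstract describes, the snippet below keeps dropout active at inference, runs T stochastic passes of a toy binary real/fake classifier, and decomposes total predictive entropy into an aleatoric term (expected entropy across passes) and an epistemic term (the mutual-information gap). The tiny network, its weights, and the dropout rate are hypothetical stand-ins, not the paper's detectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy detector: one hidden layer, dropout kept on at inference.
W1 = rng.normal(size=(16, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1));  b2 = np.zeros(1)

def mc_pass(x, p_drop=0.5):
    """One stochastic forward pass with inference-time dropout."""
    h = np.maximum(x @ W1 + b1, 0.0)          # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop        # Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)              # inverted-dropout scaling
    logit = h @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logit))        # P(fake)

def mc_dropout_uncertainty(x, T=200):
    probs = np.stack([mc_pass(x) for _ in range(T)])  # (T, 1)
    p_mean = probs.mean(axis=0)
    eps = 1e-12
    H = lambda p: -(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))
    total = H(p_mean)                  # entropy of the mean prediction
    aleatoric = H(probs).mean(axis=0)  # expected entropy over passes
    epistemic = total - aleatoric      # mutual information (>= 0 by Jensen)
    return p_mean, aleatoric, epistemic

x = rng.normal(size=(16,))
p, alea, epi = mc_dropout_uncertainty(x)
```

High epistemic uncertainty flags inputs the detector has effectively not seen (e.g. an unfamiliar generator), while high aleatoric uncertainty flags inherently ambiguous content; the paper's source-attribution result rests on these signals varying consistently across generators.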
Key Contributions
- First comprehensive uncertainty analysis of deepfake detectors using Bayesian Neural Networks and Monte Carlo dropout to quantify aleatoric and epistemic uncertainties across diverse detector architectures
- Cross-analysis of nine generators with six detectors revealing how generative residues influence detector confidence, enabling uncertainty-based deepfake source attribution
- Pixel-level uncertainty maps that localize prediction confidence and expose generator-specific artifact patterns, with ablation studies on model calibration, generalization, and adversarial robustness
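The pixel-level uncertainty maps in the last contribution can be approximated, under simple assumptions, as the per-pixel spread of MC-dropout outputs. The synthetic (T, H, W) probability stack below is a hypothetical stand-in for T stochastic detector passes on one face crop; the noisier center block mimics a blending-boundary artifact region.

```python
import numpy as np

rng = np.random.default_rng(1)

def uncertainty_map(prob_maps):
    """prob_maps: (T, H, W) stack of per-pixel P(fake) from T stochastic
    MC-dropout passes. Returns the per-pixel predictive standard
    deviation, a simple epistemic-uncertainty map."""
    return prob_maps.std(axis=0)

# Hypothetical stand-in for T=50 passes on a 32x32 crop: the center
# 8x8 block varies far more across passes than the background.
T, H, W = 50, 32, 32
maps = np.full((T, H, W), 0.8) + rng.normal(0, 0.02, (T, H, W))
maps[:, 12:20, 12:20] += rng.normal(0, 0.15, (T, 8, 8))
u = uncertainty_map(np.clip(maps, 0.0, 1.0))
```

Thresholding or colorizing `u` localizes where the detector is unsure, which is how such maps can expose generator-specific artifact patterns.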
🛡️ Threat Analysis
The core contribution is a reliability analysis of deepfake detection (AI-generated content authenticity), introducing uncertainty quantification as a new dimension for evaluating synthetic media detectors; this directly addresses output integrity and content provenance.