Conditional Uncertainty-Aware Political Deepfake Detection with Stochastic Convolutional Neural Networks
Published on arXiv
2602.10343
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Calibrated probabilistic outputs and MC dropout uncertainty estimates enable risk-aware moderation policies, with confidence-band analysis showing uncertainty adds operational value primarily in ambiguous, mid-confidence prediction regions.
Recent advances in generative image models have enabled the creation of highly realistic political deepfakes, posing risks to information integrity, public trust, and democratic processes. While automated deepfake detectors are increasingly deployed in moderation and investigative pipelines, most existing systems provide only point predictions and fail to indicate when outputs are unreliable, an operationally critical limitation in high-stakes political contexts. This work investigates conditional, uncertainty-aware political deepfake detection using stochastic convolutional neural networks within an empirical, decision-oriented reliability framework. Rather than treating uncertainty as a purely Bayesian construct, it is evaluated through observable criteria, including calibration quality, proper scoring rules, and alignment with prediction errors under both global and confidence-conditioned analyses. A politically focused binary image dataset is constructed via deterministic metadata filtering from a large public real-synthetic corpus. Two pretrained CNN backbones (ResNet-18 and EfficientNet-B4) are fully fine-tuned for classification. Deterministic inference is compared with single-pass stochastic prediction, Monte Carlo dropout with multiple forward passes, temperature scaling, and ensemble-based uncertainty surrogates. Evaluation reports ROC-AUC, thresholded confusion matrices, calibration metrics, and generator-disjoint out-of-distribution performance. Results demonstrate that calibrated probabilistic outputs and uncertainty estimates enable risk-aware moderation policies. A systematic confidence-band analysis further clarifies when uncertainty provides operational value beyond predicted confidence, delineating both the benefits and limitations of uncertainty-aware deepfake detection in political settings.
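The core mechanism, Monte Carlo dropout, keeps dropout active at inference and averages several stochastic forward passes; the spread across passes serves as the uncertainty estimate. A minimal, self-contained sketch of the idea (the toy logistic scorer, weights, and drop rate below are illustrative, not the paper's CNN models):

```python
import math
import random

def predict_proba(x, w, b, drop_rate=0.0, rng=None):
    """Toy logistic 'fake vs. real' scorer with optional input dropout.

    With drop_rate > 0, each feature is randomly zeroed (survivors rescaled
    by 1/keep), mimicking dropout kept active at inference time.
    """
    if drop_rate > 0.0:
        keep = 1.0 - drop_rate
        x = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def mc_dropout_predict(x, w, b, n_passes=50, drop_rate=0.3, seed=0):
    """MC dropout: average n stochastic passes; report the per-sample
    standard deviation across passes as the uncertainty estimate."""
    rng = random.Random(seed)
    samples = [predict_proba(x, w, b, drop_rate, rng) for _ in range(n_passes)]
    mean = sum(samples) / n_passes
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / n_passes)
    return mean, std

# Hypothetical input: predictive mean plus a nonzero dropout-induced spread.
mean, std = mc_dropout_predict([0.8, -0.2, 1.1], w=[1.5, 0.7, -0.4], b=0.1)
```

In a real detector the same pattern applies per image: leave dropout layers in train mode, run the network T times, and threshold or escalate based on both the mean probability and the spread.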
Key Contributions
- Politically focused binary real/synthetic image dataset constructed via deterministic metadata filtering from a large public real-synthetic corpus
- Comparative evaluation of deterministic, MC dropout, single-pass stochastic, temperature scaling, and ensemble uncertainty methods on fine-tuned ResNet-18 and EfficientNet-B4 backbones
- Systematic confidence-band analysis delineating when uncertainty estimates provide operational value beyond predicted confidence for risk-aware moderation policies
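The confidence-band analysis in the last contribution can be sketched as follows: bucket predictions by confidence (the max class probability), then compare per-band accuracy against per-band mean uncertainty. The function and data below are hypothetical, not the paper's evaluation code; the intent is to show where high-uncertainty cases in the mid-confidence band become candidates for human review.

```python
def confidence_band_report(probs, labels, stds,
                           bands=((0.5, 0.7), (0.7, 0.9), (0.9, 1.0001))):
    """Per-band accuracy and mean MC-dropout std for binary fake/real scores.

    probs:  predicted probability of 'fake' per image
    labels: ground-truth (1 = fake, 0 = real)
    stds:   per-image uncertainty (e.g. spread over MC dropout passes)
    Returns a list of (band_lo, band_hi, accuracy, mean_std) tuples.
    """
    report = []
    for lo, hi in bands:
        # Confidence is the probability of the predicted class.
        idx = [i for i, p in enumerate(probs) if lo <= max(p, 1.0 - p) < hi]
        if not idx:
            report.append((lo, min(hi, 1.0), None, None))
            continue
        acc = sum(int((probs[i] >= 0.5) == bool(labels[i])) for i in idx) / len(idx)
        avg_std = sum(stds[i] for i in idx) / len(idx)
        report.append((lo, min(hi, 1.0), acc, avg_std))
    return report

# Illustrative toy data: mid-confidence band shows lower accuracy and
# higher uncertainty than the high-confidence band.
report = confidence_band_report(
    probs=[0.55, 0.60, 0.95, 0.92, 0.05, 0.75],
    labels=[1, 0, 1, 1, 0, 1],
    stds=[0.20, 0.25, 0.02, 0.03, 0.01, 0.10],
)
```

A moderation policy can then be conditioned on band membership, for example auto-acting only on high-confidence, low-uncertainty predictions and routing the rest to reviewers.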
🛡️ Threat Analysis
The primary contribution is detecting AI-generated synthetic political imagery (deepfakes), a direct instantiation of AI-generated content detection under ML09 (Output Integrity Attack). The uncertainty quantification framework is evaluated specifically to improve the reliability of deepfake detection in moderation pipelines.