Conditional Uncertainty-Aware Political Deepfake Detection with Stochastic Convolutional Neural Networks
Published on arXiv
2602.10343
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Calibrated probabilistic outputs and MC dropout uncertainty estimates enable risk-aware moderation policies, with confidence-band analysis showing uncertainty adds operational value primarily in ambiguous, mid-confidence prediction regions.
Recent advances in generative image models have enabled the creation of highly realistic political deepfakes, posing risks to information integrity, public trust, and democratic processes. While automated deepfake detectors are increasingly deployed in moderation and investigative pipelines, most existing systems provide only point predictions and fail to indicate when outputs are unreliable, an operationally critical limitation in high-stakes political contexts. This work investigates conditional, uncertainty-aware political deepfake detection using stochastic convolutional neural networks within an empirical, decision-oriented reliability framework. Rather than treating uncertainty as a purely Bayesian construct, it is evaluated through observable criteria, including calibration quality, proper scoring rules, and alignment with prediction errors under both global and confidence-conditioned analyses. A politically focused binary image dataset is constructed via deterministic metadata filtering from a large public real-synthetic corpus. Two pretrained CNN backbones (ResNet-18 and EfficientNet-B4) are fully fine-tuned for classification. Deterministic inference is compared with single-pass stochastic prediction, Monte Carlo dropout with multiple forward passes, temperature scaling, and ensemble-based uncertainty surrogates. Evaluation reports ROC-AUC, thresholded confusion matrices, calibration metrics, and generator-disjoint out-of-distribution performance. Results demonstrate that calibrated probabilistic outputs and uncertainty estimates enable risk-aware moderation policies. A systematic confidence-band analysis further clarifies when uncertainty provides operational value beyond predicted confidence, delineating both the benefits and limitations of uncertainty-aware deepfake detection in political settings.
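The core mechanism, Monte Carlo dropout, keeps dropout active at inference and averages several stochastic forward passes; the spread across passes serves as the uncertainty estimate. A minimal, self-contained sketch of the idea (the toy logistic scorer, weights, and drop rate below are illustrative, not the paper's CNN models):

```python
import math
import random

def predict_proba(x, w, b, drop_rate=0.0, rng=None):
    """Toy logistic 'fake vs. real' scorer with optional input dropout.

    With drop_rate > 0, each feature is randomly zeroed (survivors rescaled
    by 1/keep), mimicking dropout kept active at inference time.
    """
    if drop_rate > 0.0:
        keep = 1.0 - drop_rate
        x = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def mc_dropout_predict(x, w, b, n_passes=50, drop_rate=0.3, seed=0):
    """MC dropout: average n stochastic passes; report the per-sample
    standard deviation across passes as the uncertainty estimate."""
    rng = random.Random(seed)
    samples = [predict_proba(x, w, b, drop_rate, rng) for _ in range(n_passes)]
    mean = sum(samples) / n_passes
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / n_passes)
    return mean, std

# Hypothetical input: predictive mean plus a nonzero dropout-induced spread.
mean, std = mc_dropout_predict([0.8, -0.2, 1.1], w=[1.5, 0.7, -0.4], b=0.1)
```

In a real detector the same pattern applies per image: leave dropout layers in train mode, run the network T times, and threshold or escalate based on both the mean probability and the spread.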
Key Contributions
- Politically focused binary real/synthetic image dataset constructed via deterministic metadata filtering from a large public real-synthetic corpus
- Comparative evaluation of deterministic, MC dropout, single-pass stochastic, temperature scaling, and ensemble uncertainty methods on fine-tuned ResNet-18 and EfficientNet-B4 backbones
- Systematic confidence-band analysis delineating when uncertainty estimates provide operational value beyond predicted confidence for risk-aware moderation policies
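The confidence-band analysis in the last contribution can be sketched as follows: bucket predictions by confidence (the max class probability), then compare per-band accuracy against per-band mean uncertainty. The function and data below are hypothetical, not the paper's evaluation code; the intent is to show where high-uncertainty cases in the mid-confidence band become candidates for human review.

```python
def confidence_band_report(probs, labels, stds,
                           bands=((0.5, 0.7), (0.7, 0.9), (0.9, 1.0001))):
    """Per-band accuracy and mean MC-dropout std for binary fake/real scores.

    probs:  predicted probability of 'fake' per image
    labels: ground-truth (1 = fake, 0 = real)
    stds:   per-image uncertainty (e.g. spread over MC dropout passes)
    Returns a list of (band_lo, band_hi, accuracy, mean_std) tuples.
    """
    report = []
    for lo, hi in bands:
        # Confidence is the probability of the predicted class.
        idx = [i for i, p in enumerate(probs) if lo <= max(p, 1.0 - p) < hi]
        if not idx:
            report.append((lo, min(hi, 1.0), None, None))
            continue
        acc = sum(int((probs[i] >= 0.5) == bool(labels[i])) for i in idx) / len(idx)
        avg_std = sum(stds[i] for i in idx) / len(idx)
        report.append((lo, min(hi, 1.0), acc, avg_std))
    return report

# Illustrative toy data: mid-confidence band shows lower accuracy and
# higher uncertainty than the high-confidence band.
report = confidence_band_report(
    probs=[0.55, 0.60, 0.95, 0.92, 0.05, 0.75],
    labels=[1, 0, 1, 1, 0, 1],
    stds=[0.20, 0.25, 0.02, 0.03, 0.01, 0.10],
)
```

A moderation policy can then be conditioned on band membership, for example auto-acting only on high-confidence, low-uncertainty predictions and routing the rest to reviewers.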
🛡️ Threat Analysis
The primary contribution is detecting AI-generated synthetic political imagery (deepfakes), a direct instantiation of AI-generated content detection under ML09 (Output Integrity Attack). The uncertainty quantification framework is evaluated specifically to improve the reliability of deepfake detection in moderation pipelines.