Attack-Aware Deepfake Detection under Counter-Forensic Manipulations
Noor Fatima 1, Hasan Faraz Khan 1, Muzammil Behzad 1,2
1 King Fahd University of Petroleum and Minerals
2 SDAIA-KFUPM Joint Research Center for Artificial Intelligence
Published on arXiv
2512.22303
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
The detector achieves near-perfect ranking across counter-forensic attack families with consistently low calibration error and minimal abstention risk; regrain emerges as the hardest stressor but remains controlled by the combined training-and-defense regimen.
This work presents an attack-aware deepfake and image-forensics detector designed for robustness, well-calibrated probabilities, and transparent evidence under realistic deployment conditions. The method combines red-team training with a randomized test-time defense in a two-stream architecture: one stream encodes semantic content using a pretrained backbone, the other extracts forensic residuals, and the two are fused via a lightweight residual adapter for classification, while a shallow Feature Pyramid Network (FPN)-style head produces tamper heatmaps under weak supervision. Red-team training applies worst-of-K counter-forensics per batch, including JPEG realign-and-recompress, resampling warps, denoise-to-regrain operations, seam smoothing, small color and gamma shifts, and social-app transcodes. The test-time defense injects low-cost jitters, such as resize and crop phase changes, mild gamma variation, and JPEG phase shifts, and aggregates the resulting predictions. Heatmaps are guided to concentrate within face regions using face-box masks, without requiring strict pixel-level annotations. Evaluation on existing benchmarks, including standard deepfake datasets and a surveillance-style split with low light and heavy compression, reports clean and attacked performance, including AUC, worst-case accuracy, reliability, abstention quality, and weak-localization scores. Results demonstrate near-perfect ranking across attacks, low calibration error, minimal abstention risk, and controlled degradation under regrain, establishing a modular, data-efficient, and practically deployable baseline for attack-aware detection with calibrated probabilities and actionable heatmaps.
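The worst-of-K red-team scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the transform functions (`jpeg_like_quantize`, `gamma_shift`, `resample_warp`) are simplified stand-ins we introduce for the real counter-forensic operations, and `loss_fn` stands for the detector's per-sample training loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simplified stand-ins for the paper's counter-forensic families (names are ours).
def jpeg_like_quantize(img, step=8.0):
    """Coarse value quantization as a crude proxy for JPEG recompression."""
    return np.round(img / step) * step

def gamma_shift(img, gamma=1.1):
    """Small gamma perturbation (a mild tone-curve shift)."""
    return 255.0 * (img / 255.0) ** gamma

def resample_warp(img, shift=1):
    """Tiny horizontal shift as a stand-in for resampling warps."""
    return np.roll(img, shift, axis=1)

ATTACKS = [jpeg_like_quantize, gamma_shift, resample_warp]

def worst_of_k(img, loss_fn, k=3):
    """Apply k randomly sampled counter-forensic ops and return the variant
    with the highest detector loss -- the 'worst case' used for training."""
    candidates = [ATTACKS[rng.integers(len(ATTACKS))](img) for _ in range(k)]
    losses = [loss_fn(c) for c in candidates]
    return candidates[int(np.argmax(losses))]
```

In training, the selected worst-case variant replaces (or accompanies) the clean sample in the batch, so the detector is always optimized against the most damaging manipulation found among the K candidates.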
Key Contributions
- Two-stream architecture fusing semantic content (pretrained backbone) and forensic residuals via a lightweight residual adapter, with an FPN-style head producing weakly supervised tamper heatmaps guided by face-box masks
- Red-team training with worst-of-K counter-forensic augmentation (JPEG realign/recompress, resampling warps, denoise-to-regrain, seam smoothing, color/gamma shifts, social-app transcodes) to harden the detector against realistic field manipulations
- Randomized test-time defense with low-cost jitters (resize/crop phase, mild gamma, JPEG phase shifts) and prediction aggregation, paired with calibration and abstention diagnostics for reliable deployment
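The randomized test-time defense in the last bullet can be sketched as below. The jitter set and the `detector` callable are illustrative assumptions on our part; the key idea from the paper is simply that several cheap, randomized views of the input are scored and the predictions aggregated, which averages out manipulations tuned to one exact pixel grid or compression phase.

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter(img):
    """One cheap test-time jitter: a random crop-phase shift plus a mild
    gamma change (an illustrative subset, not the paper's exact jitter set)."""
    dx = int(rng.integers(0, 2))                 # random phase shift of 0 or 1 px
    gamma = float(rng.uniform(0.95, 1.05))       # mild gamma variation
    shifted = np.roll(img, dx, axis=1)
    return 255.0 * (shifted / 255.0) ** gamma

def defended_score(img, detector, n=8):
    """Average the detector's fake-probability over n jittered views."""
    probs = [detector(jitter(img)) for _ in range(n)]
    return float(np.mean(probs))
```

Averaging probabilities over views is one simple aggregation choice; a median or a trimmed mean would make the defense more robust to a single badly perturbed view at slightly higher variance on clean inputs.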
🛡️ Threat Analysis
The primary contribution is a novel deepfake detection system (AI-generated content detection) explicitly designed to withstand counter-forensic manipulations (JPEG recompression, resampling, denoise-to-regrain, social-app transcodes) that would otherwise degrade or fool the detector, along with weakly supervised tamper heatmaps for output interpretability. This places the work squarely within output integrity and AI-generated content detection.