Robust and Calibrated Detection of Authentic Multimedia Content
Sarim Hashmi, Abdelrahman Elsayed, Mohammed Talha Alam, Samuele Poppi, Nils Lukas
Published on arXiv: 2512.15182
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Resynthesis-based detection achieves the lowest controllable false positive rate among compared methods while remaining robust against adaptive adversarial evasion attacks under matched compute budgets.
Calibrated Resynthesis Framework
Novel technique introduced
Generative models can synthesize highly realistic content, so-called deepfakes, that are already being misused at scale to undermine digital media authenticity. Current deepfake detection methods are unreliable for two reasons: (i) distinguishing inauthentic content post-hoc is often impossible (e.g., with memorized samples), leading to an unbounded false positive rate (FPR); and (ii) detection lacks robustness, as adversaries can adapt to known detectors with near-perfect accuracy using minimal computational resources. To address these limitations, we propose a resynthesis framework to determine if a sample is authentic or if its authenticity can be plausibly denied. We make two key contributions focusing on the high-precision, low-recall setting against efficient (i.e., compute-restricted) adversaries. First, we demonstrate that our calibrated resynthesis method is the most reliable approach for verifying authentic samples while maintaining controllable, low FPRs. Second, we show that our method achieves adversarial robustness against efficient adversaries, whereas prior methods are easily evaded under identical compute budgets. Our approach supports multiple modalities and leverages state-of-the-art inversion techniques.
Key Contributions
- Calibrated resynthesis framework that determines if a sample is authentic or its authenticity is plausibly deniable, with controllable and low false positive rates
- Demonstrated adversarial robustness against compute-restricted (efficient) adaptive adversaries, outperforming prior detectors under identical compute budgets
- Multi-modal support leveraging state-of-the-art inversion techniques for content authenticity verification
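The paper does not publish its calibration procedure here, but the idea of a controllable FPR can be sketched in a minimal, hypothetical form: score known-authentic samples with a resynthesis (reconstruction) error, pick the threshold as a quantile of those scores so that at most the target fraction of authentic samples is rejected, then accept new samples below that threshold. All names (`calibrate_threshold`, `is_authentic`) and the Gaussian toy errors are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def calibrate_threshold(authentic_errors, target_fpr=0.01):
    # Choose the threshold as the (1 - target_fpr) quantile of
    # resynthesis errors observed on known-authentic samples, so that
    # at most ~target_fpr of authentic samples exceed it.
    return float(np.quantile(authentic_errors, 1.0 - target_fpr))

def is_authentic(sample_error, threshold):
    # Accept a sample as authentic if its resynthesis error stays
    # below the calibrated threshold; otherwise its authenticity is
    # treated as plausibly deniable.
    return sample_error <= threshold

# Toy calibration set: synthetic resynthesis errors standing in for
# scores of known-authentic samples (assumption for illustration).
rng = np.random.default_rng(0)
authentic_errors = rng.normal(loc=1.0, scale=0.1, size=10_000)

tau = calibrate_threshold(authentic_errors, target_fpr=0.05)

# Empirical FPR on the calibration set lands near the 5% target.
empirical_fpr = float(np.mean(authentic_errors > tau))
```

The quantile rule is what makes the FPR "controllable": it is set directly by the operator rather than emerging from a classifier's decision boundary, which matches the paper's high-precision, low-recall framing.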
🛡️ Threat Analysis
The core contribution is a novel framework for detecting AI-generated content (deepfake detection across modalities), focused on verifying content authenticity. The resynthesis method establishes whether a sample's authenticity can be plausibly denied, directly addressing output integrity. The demonstrated adversarial robustness (resisting adversaries who craft deepfakes to evade detection) is a property of this detection system, not a separate ML01 contribution.