Performance Decay in Deepfake Detection: The Limitations of Training on Outdated Data
Jack Richings, Margaux Leblanc, Ian Groves, Victoria Nockles
Published on arXiv: 2511.07009
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Deepfake detectors trained on contemporary data suffer over 30% recall degradation within six months when evaluated against newer-generation deepfakes, with frame-level artifacts identified as the dominant signal.
The continually advancing quality of deepfake technology exacerbates the threats of disinformation, fraud, and harassment by making maliciously generated synthetic content increasingly difficult to distinguish from reality. We introduce a simple yet effective two-stage detection method that achieves an AUROC of over 99.8% on contemporary deepfakes. However, this high performance is short-lived. We show that models trained on this data suffer a recall drop of over 30% when evaluated on deepfakes created with generation techniques from just six months later, demonstrating significant decay as threats evolve. Our analysis reveals two key insights for robust detection. First, sustained performance requires the ongoing curation of large, diverse datasets. Second, predictive power comes primarily from static, frame-level artifacts, not temporal inconsistencies. The future of effective deepfake detection therefore depends on rapid data collection and the development of advanced frame-level feature detectors.
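AUROC, the headline metric above, is threshold-free: it measures how often a randomly chosen fake outranks a randomly chosen real sample, which is why a near-perfect AUROC on contemporary data can still coexist with a steep recall drop at a fixed threshold later. A minimal sketch of the pairwise-comparison definition, using small hypothetical score lists rather than anything from the paper:

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive (fake, label 1)
    scores higher than a random negative (real, label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical detector scores: fakes score high, reals score low.
scores = [0.99, 0.95, 0.92, 0.40, 0.10, 0.05]
labels = [1, 1, 1, 0, 0, 0]
print(auroc(scores, labels))  # perfect separation -> 1.0
```

The same ranking-based quantity is what `sklearn.metrics.roc_auc_score` computes; the point here is only that AUROC says nothing about where the operating threshold sits, so it can stay high while threshold-dependent recall decays.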
Key Contributions
- Two-stage deepfake detection method achieving >99.8% AUROC on contemporary deepfakes
- Empirical demonstration that detection models suffer >30% recall drop within six months when evaluated on deepfakes created with newer generation techniques
- Finding that frame-level static artifacts carry more predictive power than temporal inconsistencies for robust deepfake detection
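The recall-drop finding above can be made concrete with a small sketch: hold a detector and its decision threshold fixed, then compare recall on fakes from two time periods. All scores below are synthetic stand-ins, not the paper's model outputs; only the threshold-and-recall logic is the point.

```python
def recall_at_threshold(scores, labels, threshold=0.5):
    """Recall = fakes flagged / total fakes, at a fixed decision threshold."""
    fakes = [s for s, y in zip(scores, labels) if y == 1]
    if not fakes:
        return 0.0
    return sum(1 for s in fakes if s >= threshold) / len(fakes)

# Hypothetical scores: the detector is confident on contemporary fakes...
now_scores = [0.95, 0.91, 0.88, 0.97, 0.90, 0.12, 0.08]
now_labels = [1, 1, 1, 1, 1, 0, 0]

# ...but fakes from newer generators six months later erode its confidence.
later_scores = [0.93, 0.45, 0.38, 0.89, 0.41, 0.10, 0.07]
later_labels = [1, 1, 1, 1, 1, 0, 0]

r_now = recall_at_threshold(now_scores, now_labels)
r_later = recall_at_threshold(later_scores, later_labels)
drop = (r_now - r_later) / r_now
print(f"recall now: {r_now:.2f}, later: {r_later:.2f}, drop: {drop:.0%}")
```

In this toy setup the newer-generation fakes drift below the fixed threshold, so recall collapses even though nothing about the detector changed, which is the decay mechanism the paper measures at scale.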
🛡️ Threat Analysis
Proposes a two-stage detection system for AI-generated video (deepfakes) and analyzes how its output-integrity verification degrades as generative techniques evolve, placing the work squarely within AI-generated content detection and output authenticity.