Fair and Interpretable Deepfake Detection in Videos
Akihito Yoshii, Ryosuke Sonoda, Ramya Srinivasan
Published on arXiv: 2510.17264
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
The proposed framework achieves the best fairness-accuracy tradeoff among the compared state-of-the-art methods on four deepfake video benchmarks (FaceForensics++, DFD, Celeb-DF, DFDC) with Xception and ResNet architectures.
Existing deepfake detection methods often exhibit bias, lack transparency, and fail to capture temporal information, which leads to unreliable results across different demographic groups. In this paper, we propose a fairness-aware deepfake detection framework that integrates temporal feature learning and demographic-aware data augmentation to enhance fairness and interpretability. Our method leverages sequence-based clustering for temporal modeling of deepfake videos and concept extraction to improve detection reliability while also facilitating interpretable decisions for non-expert users. Additionally, we introduce a demographic-aware data augmentation method that balances underrepresented groups and applies frequency-domain transformations to preserve deepfake artifacts, thereby mitigating bias and improving generalization. Extensive experiments on the FaceForensics++, DFD, Celeb-DF, and DFDC datasets using state-of-the-art (SoTA) architectures (Xception, ResNet) demonstrate that the proposed method obtains the best tradeoff between fairness and accuracy compared with SoTA methods.
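The fairness-accuracy tradeoff mentioned in the abstract can be made concrete with a small sketch. The abstract does not say which fairness metric the paper reports, so the example below assumes a simple one: the gap between the best and worst per-group accuracy. The function name `group_accuracy_gap` and the group labels are hypothetical and serve only as an illustration.

```python
import numpy as np

def group_accuracy_gap(y_true, y_pred, groups):
    """Return overall accuracy, per-group accuracy, and the max-min accuracy gap.

    Assumption: fairness is summarized as the accuracy gap between
    demographic groups; the paper may use a different metric.
    """
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    correct = y_true == y_pred
    # Accuracy restricted to the samples of each demographic group.
    per_group = {
        str(g): float(correct[groups == g].mean()) for g in np.unique(groups)
    }
    gap = max(per_group.values()) - min(per_group.values())
    return float(correct.mean()), per_group, gap
```

A detector with a smaller gap at the same overall accuracy would count as the fairer one under this assumed metric.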
Key Contributions
- Sequence-based clustering for temporal modeling of deepfake artifacts across video frames
- Concept extraction module providing human-interpretable explanations (Concept Sensitivity Scores) for deepfake decisions
- Frequency-aware demographic data augmentation that preserves deepfake artifacts while balancing underrepresented demographic groups to reduce bias
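The third contribution combines group balancing with a frequency-domain transformation. The summary gives no implementation details, so the following is only a minimal sketch under two stated assumptions: balancing is done by oversampling smaller groups, and the transformation rescales low-frequency content while leaving higher frequencies, where manipulation artifacts are often assumed to reside, untouched. The names `balanced_indices` and `low_freq_gain_jitter`, and the parameters `radius` and `scale`, are hypothetical.

```python
import numpy as np

def balanced_indices(groups, rng=None):
    """Oversample so every demographic group matches the largest group's size."""
    rng = np.random.default_rng() if rng is None else rng
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    target = counts.max()
    picked = []
    for label, count in zip(labels, counts):
        idx = np.flatnonzero(groups == label)
        # Draw the missing samples with replacement from the same group.
        extra = rng.choice(idx, size=target - count, replace=True)
        picked.append(np.concatenate([idx, extra]))
    return np.concatenate(picked)

def low_freq_gain_jitter(image, radius=4, scale=0.1, rng=None):
    """Randomly rescale the low-frequency band of a 2D grayscale image.

    Assumption: frequencies outside the disk of the given radius are left
    unchanged so that high-frequency artifacts survive the augmentation.
    """
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    # Disk around the DC component, which sits at (h // 2, w // 2) after fftshift.
    low = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    gain = 1.0 + scale * rng.uniform(-1.0, 1.0)
    spectrum[low] *= gain
    # A single real gain keeps the spectrum conjugate-symmetric, so the
    # inverse transform is real up to floating-point error.
    return np.fft.ifft2(np.fft.ifftshift(spectrum)).real
```

In use, one would resample frame indices with `balanced_indices` and apply `low_freq_gain_jitter` to the duplicated frames so that oversampled groups are not exact copies.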
🛡️ Threat Analysis
Proposes a novel detection architecture for AI-generated content (deepfake video) with fairness and interpretability improvements. The contribution lies in new detection techniques (temporal clustering, concept extraction, frequency-domain augmentation) rather than in applying existing methods to a new domain.