Fair and Interpretable Deepfake Detection in Videos

Existing deepfake detection methods often exhibit bias, lack transparency, and fail to capture temporal information, leading to unfair decisions and unreliable results across demographic groups. In this paper, we propose a fairness-aware deepfake detection framework that integrates temporal feature learning and demography-aware data augmentation to enhance fairness and interpretability. Our method uses sequence-based clustering for temporal modeling of deepfake videos and concept extraction to improve detection reliability while also making decisions interpretable for non-expert users. Additionally, we introduce a demography-aware data augmentation method that balances underrepresented groups and applies frequency-domain transformations to preserve deepfake artifacts, thereby mitigating bias and improving generalization. Extensive experiments on the FaceForensics++, DFD, Celeb-DF, and DFDC datasets using state-of-the-art (SoTA) architectures (Xception, ResNet) show that the proposed method achieves the best tradeoff between fairness and accuracy compared with SoTA methods.
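
The abstract frames results as a tradeoff between fairness and accuracy but this summary does not say which fairness metric is used. As a minimal sketch, assuming fairness is measured as the largest accuracy gap between demographic groups (an illustrative choice, not necessarily the paper's metric), the two quantities can be computed as follows. The function name `group_accuracy_gap` and the toy labels are hypothetical.

```python
import numpy as np

def group_accuracy_gap(y_true, y_pred, groups):
    """Return per-group accuracy and the largest gap between any two groups.

    Assumption: fairness is summarized by the max-min accuracy gap across
    demographic groups; the paper may use a different metric.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    per_group = {}
    for g in np.unique(groups):
        mask = groups == g
        per_group[str(g)] = float(np.mean(y_true[mask] == y_pred[mask]))
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap

# Toy data: 1 = fake, 0 = real; "a" and "b" are placeholder group labels.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

per_group, gap = group_accuracy_gap(y_true, y_pred, groups)
print(per_group, gap)
```

A detector with high overall accuracy can still show a large gap here, which is the failure mode the abstract describes.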

Key Contributions

Sequence-based clustering for temporal modeling of deepfake artifacts across video frames
Concept extraction module providing human-interpretable explanations (Concept Sensitivity Scores) for deepfake decisions
Frequency-aware demographic data augmentation that preserves deepfake artifacts while balancing underrepresented demographic groups to reduce bias
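
The third contribution combines group balancing with frequency-domain transformations. The sketch below illustrates one way such a combination could look; it is not the paper's algorithm. It assumes grayscale frames, assumes that deepfake artifacts sit mainly in high spatial frequencies (so only low frequencies are perturbed), and balances groups by simple oversampling. The names `low_freq_perturb` and `balance_groups` and the parameters `radius` and `strength` are hypothetical.

```python
import numpy as np

def low_freq_perturb(img, rng, radius=4, strength=0.1):
    """Rescale low-frequency amplitudes only; high frequencies stay untouched.

    Assumption: artifacts live in high frequencies, so leaving them
    unchanged preserves the forensic signal.
    """
    spec = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    # Disc around the DC component, which fftshift moves to the center.
    low = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    spec[low] *= 1.0 + strength * rng.uniform(-1.0, 1.0)
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec)))

def balance_groups(images, groups, seed=0):
    """Oversample each group to the size of the largest one,
    using perturbed copies instead of exact duplicates."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    target = counts.max()
    out_imgs, out_groups = list(images), list(groups)
    for g, n in zip(labels, counts):
        idx = np.flatnonzero(groups == g)
        for i in rng.choice(idx, size=target - n, replace=True):
            out_imgs.append(low_freq_perturb(images[i], rng))
            out_groups.append(g)
    return np.stack(out_imgs), np.asarray(out_groups)

# Toy data: five random 16x16 "frames", group "b" is underrepresented.
images = np.random.default_rng(1).random((5, 16, 16))
groups = ["a", "a", "a", "a", "b"]
bal_imgs, bal_groups = balance_groups(images, groups)
print(bal_imgs.shape, np.unique(bal_groups, return_counts=True))
```

In practice the balancing would also need to respect the real/fake label within each group, which this sketch omits for brevity.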

🛡️ Threat Analysis

Output Integrity Attack

Proposes a novel detection architecture for AI-generated content (deepfake video) with fairness and interpretability improvements. The contribution lies in new detection techniques (temporal clustering, concept extraction, frequency-domain augmentation) rather than in applying existing methods to a new domain.

Details

Domains

vision

Model Types

cnn, transformer

Threat Tags

inference_time

Datasets

FaceForensics++, DFD, Celeb-DF, DFDC

Applications

2025 0 cit.
