Age-Diverse Deepfake Dataset: Bridging the Age Gap in Deepfake Detection
Published on arXiv
2508.06552
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Models trained on the age-diverse dataset demonstrated fairer performance across all age groups and higher cross-dataset generalization compared to models trained on the original demographically skewed datasets.
Deepfake detection is becoming harder as generation technology advances and deepfake videos and images grow in popularity. Despite the presence of numerous detection models, demographic bias in deepfake datasets remains largely unaddressed. This paper focuses on mitigating age-specific bias by introducing an age-diverse deepfake dataset intended to improve fairness across age groups. The dataset is constructed through a modular pipeline that incorporates the existing Celeb-DF, FaceForensics++, and UTKFace datasets and creates synthetic data to fill gaps in the age distribution. The effectiveness and generalizability of this dataset are evaluated using three deepfake detection models: XceptionNet, EfficientNet, and LipForensics. Evaluation with AUC, pAUC, and EER showed that models trained on the age-diverse dataset achieved fairer performance across age groups, improved overall accuracy, and higher generalization across datasets. This study contributes a reproducible, fairness-aware deepfake dataset and model pipeline that can serve as a foundation for future research in fairer deepfake detection. The complete dataset and implementation code are available at https://github.com/unishajoshi/age-diverse-deepfake-detection.
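The idea of filling gaps in the age distribution can be illustrated with a short sketch. It is not the authors' pipeline: the age bins, the `synthetic_needed` helper, and the choice to balance every bin up to the size of the largest one are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical age bins (lower bound inclusive, upper bound exclusive).
# The paper's actual age groups may differ.
AGE_BINS = [(0, 18), (18, 30), (30, 45), (45, 60), (60, 120)]


def bin_label(age, bins=AGE_BINS):
    """Map an age in years to a label such as '18-29'."""
    for lo, hi in bins:
        if lo <= age < hi:
            return f"{lo}-{hi - 1}"
    raise ValueError(f"age {age} is outside all bins")


def synthetic_needed(ages, bins=AGE_BINS, target=None):
    """Return how many synthetic samples each bin needs to reach `target`.

    Assumption: when `target` is None, every bin is topped up to the
    size of the largest bin.
    """
    counts = Counter(bin_label(a, bins) for a in ages)
    labels = [f"{lo}-{hi - 1}" for lo, hi in bins]
    if target is None:
        target = max(counts.values(), default=0)
    return {label: max(0, target - counts.get(label, 0)) for label in labels}


if __name__ == "__main__":
    # Toy age annotations, not real dataset statistics.
    print(synthetic_needed([5, 22, 25, 27, 40, 41, 70]))
```

A per-bin deficit like this could then drive how many face-swapped samples are generated for each underrepresented group.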
Key Contributions
- Age-level annotations for FaceForensics++ and Celeb-DF, plus synthetic data generation (InsightFace + SimSwap) to fill underrepresented age groups
- Modular pipeline combining Celeb-DF, FaceForensics++, and UTKFace into a reproducible, fairness-aware age-diverse deepfake dataset
- Empirical evaluation showing that models trained on the age-diverse dataset achieve fairer AUC/pAUC/EER across age cohorts than models trained on the source datasets
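A per-cohort evaluation of the kind described in the last contribution might look like the sketch below. It is a minimal pure-Python illustration, not the authors' evaluation code: the record layout `(age_group, label, score)`, the helper names, the `max_fpr=0.1` cutoff for pAUC, and the nearest-point EER approximation are assumptions.

```python
from collections import defaultdict


def roc_points(labels, scores):
    """Return ROC points (fpr, tpr) by increasing fpr; label 1 means fake."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both real and fake samples")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points, max_fpr=1.0):
    """Trapezoidal ROC area up to max_fpr (an unnormalized pAUC if < 1)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def eer(points):
    """Approximate EER: the ROC point where FPR and FNR are closest."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1 - p[1])))
    return (fpr + (1 - tpr)) / 2


def metrics_by_group(records, max_fpr=0.1):
    """records: iterable of (age_group, label, score) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    out = {}
    for group, (labels, scores) in grouped.items():
        pts = roc_points(labels, scores)
        out[group] = {"auc": auc(pts), "pauc": auc(pts, max_fpr), "eer": eer(pts)}
    return out
```

Comparing the resulting values across cohorts, for example the gap between the best and worst per-group AUC, gives one simple view of fairness. The pAUC here is the raw area up to `max_fpr`; some libraries report a standardized variant instead, so values are not directly comparable without checking the convention.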
🛡️ Threat Analysis
Deepfake detection is the central task: the paper constructs a dataset and evaluates detectors of AI-generated content (XceptionNet, EfficientNet, LipForensics), directly targeting the output integrity and authenticity problem of AI-synthesized faces.