MFFI: Multi-Dimensional Face Forgery Image Dataset for Real-World Scenarios
Changtao Miao 1, Yi Zhang 1, Man Luo 1, Weiwei Feng 1, Kaiyuan Zheng 1, Qi Chu 1,2, Tao Gong 1, Jianshu Li 1, Yunfeng Diao 1,3, Wei Zhou 4, Joey Tianyi Zhou 5, Xiaoshuai Hao 5,6
2 Anhui Province Key Laboratory of Digital Security
3 Hefei University of Technology
Published on arXiv
2509.05592
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
MFFI outperforms existing public datasets on scene complexity, cross-domain generalization, and detection difficulty gradients across benchmark evaluations.
MFFI
Novel technique introduced
Rapid advances in Artificial Intelligence Generated Content (AIGC) have enabled increasingly sophisticated face forgeries, posing a significant threat to social security. However, current deepfake detection methods are limited by constraints in existing datasets, which lack the diversity found in real-world scenarios. Specifically, these datasets fall short in four key areas: coverage of advanced forgery techniques, variability of facial scenes, richness of real data, and degradation introduced by real-world propagation. To address these challenges, we propose the Multi-dimensional Face Forgery Image (MFFI) dataset, tailored for real-world scenarios. MFFI enhances realism along four strategic dimensions: 1) Wider Forgery Methods; 2) Varied Facial Scenes; 3) Diversified Authentic Data; 4) Multi-level Degradation Operations. MFFI integrates 50 different forgery methods and contains 1,024K image samples. Benchmark evaluations show that MFFI outperforms existing public datasets in terms of scene complexity, cross-domain generalization capability, and detection difficulty gradients. These results validate the technical advance and practical utility of MFFI in simulating real-world conditions. The dataset and additional details are publicly available at https://github.com/inclusionConf/MFFI.
Key Contributions
- MFFI dataset with 1,024K images covering 50 distinct face forgery methods across 6 major categories (face swapping, reenactment, synthesis, editing, super-resolution, manual Photoshop)
- Four realism dimensions: wider forgery methods, varied facial scenes, diversified authentic data, and multi-level degradation simulating real-world propagation artifacts
- Benchmark evaluations demonstrating superior scene complexity, cross-domain generalization capability, and detection difficulty gradients compared to existing public datasets
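To make the idea of multi-level degradation concrete, here is a minimal Python sketch. The summary does not list the actual operations or levels used in MFFI, so everything below is an assumption for illustration: the three stand-in operations (resolution loss, coarse quantization as a crude proxy for lossy compression, and Gaussian noise), the parameter schedule, and the names `degrade`, `downscale_upscale`, `quantize`, and `add_noise` are all hypothetical. Images are plain grayscale lists of rows with integer values in 0..255.

```python
import random

def downscale_upscale(img, factor):
    """Nearest-neighbour downscale then upscale, mimicking resolution loss."""
    h, w = len(img), len(img[0])
    return [[img[(y // factor) * factor][(x // factor) * factor]
             for x in range(w)] for y in range(h)]

def quantize(img, step):
    """Coarse intensity quantization (assumed stand-in for lossy compression)."""
    return [[min(255, (p // step) * step + step // 2) for p in row]
            for row in img]

def add_noise(img, sigma, rng):
    """Additive Gaussian noise, clipped back into the 0..255 range."""
    return [[max(0, min(255, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in img]

def degrade(img, level, seed=0):
    """Apply a degradation chain whose strength grows with `level`.

    Level 0 returns an unmodified copy. The schedule (factor 2**level,
    step 8*level, sigma 2*level) is hypothetical, not taken from MFFI.
    """
    if level < 0:
        raise ValueError("level must be non-negative")
    if level == 0:
        return [row[:] for row in img]
    rng = random.Random(seed)  # seeded so each variant is reproducible
    out = downscale_upscale(img, 2 ** level)
    out = quantize(out, 8 * level)
    return add_noise(out, 2.0 * level, rng)

# Usage: build one variant per level from a small synthetic gradient image.
image = [[(x * 32 + y * 4) % 256 for x in range(8)] for y in range(8)]
variants = {level: degrade(image, level) for level in range(4)}
```

Seeding the noise generator per call keeps every degraded variant reproducible, which matters if detectors are to be compared on the same degraded samples.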
🛡️ Threat Analysis
The paper directly addresses deepfake face forgery detection, an AI-generated content detection problem explicitly covered by ML09. The MFFI dataset is purpose-built to benchmark and improve detectors that verify output authenticity and detect AI-generated faces.
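Cross-domain generalization is usually measured by scoring a detector trained on one dataset against test sets from others. The sketch below shows one way to tabulate such results in plain Python. It assumes AUC as the metric and a label convention of 1 for fake and 0 for real; the summary above states neither, and the names `roc_auc` and `cross_domain_auc` are hypothetical helpers, not part of any MFFI tooling.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of fake vs. real scores; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]  # fake samples
    neg = [s for l, s in zip(labels, scores) if l == 0]  # real samples
    if not pos or not neg:
        raise ValueError("need at least one fake and one real sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_domain_auc(detectors, test_sets):
    """Score every detector on every test set.

    detectors: dict mapping training-set name -> callable(sample) -> score
    test_sets: dict mapping test-set name -> list of (sample, label) pairs
    Returns a dict keyed by (train_name, test_name).
    """
    results = {}
    for train_name, detector in detectors.items():
        for test_name, samples in test_sets.items():
            labels = [label for _, label in samples]
            scores = [detector(sample) for sample, _ in samples]
            results[(train_name, test_name)] = roc_auc(labels, scores)
    return results
```

Off-diagonal entries, where the training and test names differ, are the cross-domain numbers; a large drop from the diagonal suggests the detector has overfit to its training distribution.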