benchmark 2025

MFFI: Multi-Dimensional Face Forgery Image Dataset for Real-World Scenarios

Changtao Miao ¹, Yi Zhang ¹, Man Luo ¹, Weiwei Feng ¹, Kaiyuan Zheng ¹, Qi Chu ¹,², Tao Gong ¹, Jianshu Li ¹, Yunfeng Diao ¹,³, Wei Zhou ⁴, Joey Tianyi Zhou ⁵, Xiaoshuai Hao ⁵,⁶

¹ Ant Group

² Anhui Province Key Laboratory of Digital Security

³ Hefei University of Technology

⁴ Cardiff University

⁵ Agency for Science, Technology and Research

⁶ Beijing Academy of Artificial Intelligence

0 citations

Published on arXiv

2509.05592

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

MFFI outperforms existing public datasets on scene complexity, cross-domain generalization, and detection difficulty gradients across benchmark evaluations.

MFFI

Novel technique introduced

Rapid advances in Artificial Intelligence Generated Content (AIGC) have enabled increasingly sophisticated face forgeries, posing a significant threat to social security. However, current deepfake detection methods are limited by existing datasets, which lack the diversity found in real-world scenarios. Specifically, these datasets fall short in four key areas: coverage of advanced, previously unseen forgery techniques; variability of facial scenes; richness of real data; and degradation introduced by real-world propagation. To address these challenges, we propose the Multi-dimensional Face Forgery Image (MFFI) dataset, tailored for real-world scenarios. MFFI enhances realism along four strategic dimensions: 1) Wider Forgery Methods; 2) Varied Facial Scenes; 3) Diversified Authentic Data; 4) Multi-level Degradation Operations. MFFI integrates 50 different forgery methods and contains 1024K image samples. Benchmark evaluations show that MFFI outperforms existing public datasets in terms of scene complexity, cross-domain generalization capability, and detection difficulty gradients. These results validate the technical advance and practical utility of MFFI in simulating real-world conditions. The dataset and additional details are publicly available at https://github.com/inclusionConf/MFFI.

Key Contributions

MFFI dataset with 1,024K images covering 50 distinct face forgery methods across 6 major categories (face swapping, reenactment, synthesis, editing, super-resolution, manual photoshop)
Four realism dimensions: wider forgery methods, varied facial scenes, diversified authentic data, and multi-level degradation simulating real-world propagation artifacts
Benchmark evaluations demonstrating superior scene complexity, cross-domain generalization capability, and detection difficulty gradients compared to existing public datasets
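The multi-level degradation dimension can be illustrated with a minimal sketch. This summary does not state which operations or strengths MFFI actually applies, so everything below is an assumption: the three operations (blur, resampling, coarse quantization as a crude stand-in for lossy compression), the `LEVELS` table, and all function names are hypothetical and chosen only to show the idea of applying progressively stronger degradation to the same image.

```python
from typing import Dict, List

Image = List[List[int]]  # grayscale image as rows of pixel values in 0..255


def box_blur(img: Image, radius: int) -> Image:
    """Mean filter over a (2*radius+1)^2 window, clipped at the borders."""
    if radius <= 0:
        return [row[:] for row in img]
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        ys = range(max(0, y - radius), min(h, y + radius + 1))
        row = []
        for x in range(w):
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(img[j][i] for j in ys for i in xs)
            row.append(total // (len(ys) * len(xs)))
        out.append(row)
    return out


def resample(img: Image, factor: int) -> Image:
    """Nearest-neighbour downsample by `factor`, then upsample to the original size."""
    if factor <= 1:
        return [row[:] for row in img]
    h, w = len(img), len(img[0])
    return [[img[(y // factor) * factor][(x // factor) * factor]
             for x in range(w)] for y in range(h)]


def quantize(img: Image, step: int) -> Image:
    """Round each pixel down to a multiple of `step` (crude compression proxy)."""
    if step <= 1:
        return [row[:] for row in img]
    return [[(p // step) * step for p in row] for row in img]


# Hypothetical degradation levels; NOT the settings used by MFFI.
LEVELS: Dict[int, Dict[str, int]] = {
    0: {"radius": 0, "factor": 1, "step": 1},   # no degradation
    1: {"radius": 1, "factor": 2, "step": 8},   # mild
    2: {"radius": 2, "factor": 4, "step": 32},  # heavy
}


def degrade(img: Image, level: int) -> Image:
    """Apply blur, resampling and quantization at the given (assumed) level."""
    p = LEVELS[level]
    return quantize(resample(box_blur(img, p["radius"]), p["factor"]), p["step"])
```

A detector could then be evaluated on `degrade(img, level)` for each level to see how its accuracy changes as the image moves further from its pristine form. A real pipeline would more likely use an imaging library with genuine JPEG compression and resizing.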

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses deepfake face forgery detection — an AI-generated content detection problem explicitly covered by ML09. The MFFI dataset is purpose-built to benchmark and improve detectors that verify output authenticity and detect AI-generated faces.

Details

Domains

vision, generative

Model Types

GAN, diffusion

Threat Tags

digital, inference_time

Datasets

MFFI, FaceForensics++, CelebDF, DF40

Applications

face forgery detection, deepfake detection


Similar Papers

RealHD: A High-Quality Dataset for Robust Detection of State-of-the-Art AI-Generated Images

FakeParts: a New Family of AI-Generated DeepFakes

The Orthogonal Vulnerabilities of Generative AI Watermarks: A Comparative Empirical Benchmark of Spatial and Latent Provenance

Your One-Stop Solution for AI-Generated Video Detection

UniAIDet: A Unified and Universal Benchmark for AI-Generated Image Content Detection and Localization

DiffFace-Edit: A Diffusion-Based Facial Dataset for Forgery-Semantic Driven Deepfake Detection Analysis

Deepfake Synthesis vs. Detection: An Uneven Contest

Além do Desempenho: Um Estudo da Confiabilidade de Detectores de Deepfakes