
Face2Parts: Exploring Coarse-to-Fine Inter-Regional Facial Dependencies for Generalized Deepfake Detection

Kutub Uddin, Nusrat Tasnim, Byung Tae Oh


Published on arXiv

2603.26036

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves 98.42% AUC on FF++ and demonstrates strong cross-dataset generalization with 89.41% on DFD and 84.07% on DFDC

Face2Parts

Novel technique introduced


Multimedia data, particularly images and videos, are integral to various applications, including surveillance, visual interaction, biometrics, evidence gathering, and advertising. However, amateur or skilled counterfeiters can manipulate such content to create deepfakes, often for slanderous purposes. To address this challenge, several forensic methods have been developed to verify the authenticity of the content. The effectiveness of these methods depends on their focus, with challenges arising from the diverse nature of manipulations. In this article, we analyze existing forensic methods and observe that each has unique strengths in detecting deepfake traces by focusing on specific regions, such as the frame, face, lips, eyes, or nose. Building on these insights, we propose a novel hybrid approach called Face2Parts, based on hierarchical feature representation (HFR), that leverages coarse-to-fine information to improve deepfake detection. The proposed method extracts features from the frame, face, and key facial regions (i.e., lips, eyes, and nose) separately to explore their coarse-to-fine relationships. This design enables us to capture inter-dependencies among facial regions using a channel-attention mechanism and deep triplet learning. We evaluated the proposed method on benchmark deepfake datasets in intra-dataset, inter-dataset, and inter-manipulation settings. It achieves an average AUC of 98.42% on FF++, 79.80% on CDF1, 85.34% on CDF2, 89.41% on DFD, 84.07% on DFDC, 95.62% on DTIM, 80.76% on PDD, and 100% on WLDR. The results demonstrate that our approach generalizes effectively and outperforms existing methods.
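The channel-attention step described in the abstract can be sketched as a squeeze-and-excitation-style gate applied to stacked region features. This is a minimal NumPy illustration, not the paper's actual architecture: the `channel_attention` helper, layer sizes, random weights, and the "3 regions x 4 channels" setup are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_attention(features, w1, w2):
    """SE-style gate: reweight channels of stacked region features."""
    # Squeeze: global average over spatial dims -> per-channel descriptor
    z = features.mean(axis=(1, 2))                 # (C,)
    # Excitation: bottleneck layer with ReLU, then sigmoid gating
    h = np.maximum(w1 @ z, 0.0)                    # (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))            # (C,) gates in (0, 1)
    # Reweight each channel; attended channels are kept, others suppressed
    return features * s[:, None, None]

# Hypothetical setup: 3 regions (frame/face/parts) x 4 channels = 12 channels
C, H, W, r = 12, 8, 8, 4
stacked = rng.standard_normal((C, H, W))           # stacked region feature maps
w1 = rng.standard_normal((C // r, C)) * 0.1        # squeeze-bottleneck weights
w2 = rng.standard_normal((C, C // r)) * 0.1        # excitation weights

out = channel_attention(stacked, w1, w2)
print(out.shape)                                   # same shape, gated channels
```

Because the sigmoid gates lie strictly in (0, 1), each output channel is a scaled-down copy of its input channel; a learned gate would emphasize the regions whose features carry manipulation traces.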


Key Contributions

  • Hierarchical feature representation (HFR) approach extracting coarse-to-fine features from frame, face, and facial parts (eyes, lips, nose)
  • Channel-attention mechanism with deep triplet learning to capture inter-dependencies among facial regions
  • Cross-dataset generalization achieving 98.42% AUC on FF++, 89.41% on DFD, 84.07% on DFDC across 8 benchmark datasets
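The deep-triplet-learning component listed above can be illustrated with the standard margin-based triplet loss, which pulls same-class embeddings together and pushes real/fake embeddings apart. A minimal sketch with hypothetical toy embeddings — the 3-dimensional vectors and margin value are illustrative, not taken from the paper:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull anchor toward the positive (same-class) embedding and push it
    # away from the negative by at least `margin` in squared L2 distance.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + margin, 0.0)

# Toy embeddings: real-face anchor/positive, deepfake negative (illustrative)
anchor   = np.array([1.0, 0.0, 0.0])
positive = np.array([0.9, 0.1, 0.0])
negative = np.array([0.8, 0.3, 0.0])

loss = triplet_loss(anchor, positive, negative)
print(loss)  # positive: the fake embedding is still too close to the anchor
```

Here d_pos = 0.02 and d_neg = 0.13, so the loss is 0.02 - 0.13 + 0.2 = 0.09; training drives it to zero by separating real and fake embeddings beyond the margin.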

🛡️ Threat Analysis

Output Integrity Attack

Detects AI-generated deepfake videos by analyzing hierarchical facial features — this is AI-generated content detection, a core ML09 use case for output integrity and content authenticity.


Details

Domains
vision, multimodal
Model Types
cnn, gan
Threat Tags
inference_time
Datasets
FF++, CDF1, CDF2, DFD, DFDC, DTIM, PDD, WLDR
Applications
deepfake detection, video forensics, facial manipulation detection