Deepfake Forensics Adapter: A Dual-Stream Network for Generalizable Deepfake Detection
Jianfeng Liao¹, Yichen Wei¹, Raymond Chan Ching Bon², Shulan Wang¹, Kam-Pui Chow³, Kwok-Yan Lam⁴
Published on arXiv
2603.01450
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Achieves video-level AUC/EER of 0.836/0.251 on DFDC, a 4.8% video AUC improvement over prior SOTA methods.
Deepfake Forensics Adapter (DFA)
Novel technique introduced
The rapid advancement of deepfake generation techniques poses significant threats to public safety, as highly realistic synthetic facial media can inflict real societal harm. Existing detection methods struggle to generalize to emerging forgery patterns. This paper presents the Deepfake Forensics Adapter (DFA), a novel dual-stream framework that synergizes vision-language foundation models with targeted forensics analysis. Our approach integrates a pre-trained CLIP model with three core components, leveraging CLIP's powerful general capabilities for specialized deepfake detection while keeping its parameters frozen: 1) a Global Feature Adapter identifies global inconsistencies in image content that may indicate forgery; 2) a Local Anomaly Stream enhances the model's ability to perceive local facial forgery cues by explicitly exploiting facial structure priors; and 3) an Interactive Fusion Classifier promotes deep interaction and fusion between global and local features through a transformer encoder. Extensive evaluations on frame-level and video-level benchmarks demonstrate DFA's superior generalization, notably achieving state-of-the-art performance on the challenging DFDC dataset with frame-level AUC/EER of 0.816/0.256 and video-level AUC/EER of 0.836/0.251, a 4.8% video-level AUC improvement over previous methods. Beyond state-of-the-art performance, our framework points to a feasible and effective direction for building robust deepfake detection systems with enhanced generalization against evolving deepfake threats. Our code is available at https://github.com/Liao330/DFA.git
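The frozen-backbone-plus-adapter pattern the abstract describes can be sketched generically. The shapes, bottleneck width, and residual MLP form below are illustrative assumptions about the general adapter idiom, not the paper's actual Global Feature Adapter implementation; the frozen CLIP encoder is stubbed with a fixed random projection.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_backbone(images: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen CLIP image encoder: maps a batch of
    flattened inputs to 512-d global features. Weights are never trained."""
    W = rng.standard_normal((images.shape[1], 512)) * 0.02  # frozen projection
    return images @ W

class FeatureAdapter:
    """Illustrative bottleneck adapter: a small trainable residual MLP
    on top of frozen backbone features (an assumption about the adapter's
    general form, not the paper's exact design)."""
    def __init__(self, dim: int = 512, bottleneck: int = 64):
        self.W_down = rng.standard_normal((dim, bottleneck)) * 0.02
        self.W_up = rng.standard_normal((bottleneck, dim)) * 0.02

    def __call__(self, feats: np.ndarray) -> np.ndarray:
        hidden = np.maximum(feats @ self.W_down, 0.0)  # ReLU bottleneck
        return feats + hidden @ self.W_up              # residual: backbone features pass through intact

images = rng.standard_normal((4, 768))    # 4 toy "images" as flat vectors
global_feats = frozen_backbone(images)    # frozen CLIP-like features
adapted = FeatureAdapter()(global_feats)  # only the adapter would receive gradients
print(adapted.shape)                      # (4, 512)
```

The residual form means the adapter can only add a learned forensics-specific correction on top of CLIP's general-purpose features, which is why the backbone's pre-trained capabilities are preserved.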
Key Contributions
- Dual-stream DFA framework that freezes CLIP parameters and injects forensics specialization via a Global Feature Adapter, generating attention biases toward discriminative forgery regions
- Local Anomaly Stream that exploits facial structure priors to extract fine-grained local cues (eyes, mouth) missed by global models
- Interactive Fusion Classifier using a transformer encoder to deeply integrate global and local features, achieving SOTA on DFDC with 4.8% video-level AUC improvement
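The global/local fusion step in the third contribution can be sketched as a single transformer-encoder-style self-attention pass over one global token and several local region tokens. The token layout, single-head attention, and mean-pool classifier head below are illustrative assumptions; only the region choice (eyes, mouth) follows the summary.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 64  # toy feature width (the real model's dimension is not specified here)

def self_attention(tokens: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention with a residual
    connection: the core interaction op of a transformer encoder."""
    Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.05 for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(D)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return tokens + weights @ V                      # every token attends to every other

# One global token (whole-face features) plus local tokens for facial
# regions such as the eyes and mouth; the exact token set is hypothetical.
global_token = rng.standard_normal((1, D))
local_tokens = rng.standard_normal((3, D))   # e.g. left eye, right eye, mouth
fused = self_attention(np.vstack([global_token, local_tokens]))

# Pool the interacted tokens and score the "fake" probability.
w_cls = rng.standard_normal(D) * 0.05
p_fake = 1.0 / (1.0 + np.exp(-(fused.mean(axis=0) @ w_cls)))
print(fused.shape)  # (4, 64)
```

Attention lets the global token condition on localized anomaly evidence (and vice versa) before classification, which is the "deep interaction" the bullet describes, as opposed to simply concatenating the two feature sets.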
🛡️ Threat Analysis
The paper's primary contribution is a novel detection architecture for AI-generated content (synthetic facial media) that addresses output integrity by distinguishing authentic faces from synthetically generated ones, a canonical ML09 use case.