OmniFD: A Unified Model for Versatile Face Forgery Detection
Haotian Liu, Haoyu Chen, Chenhui Pan, You Hu, Guoying Zhao, Xiaobai Li
Published on arXiv (2512.01128)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
OmniFD reduces model parameters by 63% and training time by 50% versus task-specific baselines, and video classification accuracy improves by 4.63% when image data are incorporated via multi-task learning
OmniFD
Novel technique introduced
Face forgery detection encompasses multiple critical tasks, including identifying forged images and videos and localizing manipulated regions and temporal segments. Current approaches typically employ task-specific models with independent architectures, leading to computational redundancy and ignoring potential correlations across related tasks. We introduce OmniFD, a unified framework that jointly addresses four core face forgery detection tasks within a single model: image classification, video classification, spatial localization, and temporal localization. Our architecture consists of three principal components: (1) a shared Swin Transformer encoder that extracts unified 4D spatiotemporal representations from both image and video inputs, (2) a cross-task interaction module with learnable queries that dynamically captures inter-task dependencies through attention-based reasoning, and (3) lightweight decoding heads that transform the refined representations into predictions for all FFD tasks. Extensive experiments demonstrate OmniFD's advantage over task-specific models. Its unified design leverages multi-task learning to capture generalized representations across tasks, enabling fine-grained knowledge transfer that benefits each task. For example, video classification accuracy improves by 4.63% when image data are incorporated. Furthermore, by unifying images, videos, and the four tasks within one framework, OmniFD achieves superior performance across diverse benchmarks with high efficiency and scalability, e.g., reducing model parameters by 63% and training time by 50%. It establishes a practical and generalizable solution for comprehensive face forgery detection in real-world applications. The source code is made available at https://github.com/haotianll/OmniFD.
Key Contributions
- Unified OmniFD framework addressing four face forgery detection tasks (image classification, video classification, spatial localization, temporal localization) within a single model using a shared Swin Transformer encoder
- Cross-task interaction module with learnable queries that dynamically captures inter-task dependencies via attention-based reasoning, enabling knowledge transfer across tasks
- 63% reduction in model parameters and 50% reduction in training time compared to task-specific models, while achieving superior or competitive performance across diverse FFD benchmarks
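The cross-task interaction described above can be illustrated with a minimal sketch: learnable per-task queries attend over the shared encoder's spatiotemporal tokens, so each task head reads its own summary of the same features. This is an assumption-based illustration of query-based cross-attention, not the authors' implementation; the query count, token count, and feature dimension below are hypothetical, and real models would add learned projections and multiple heads.

```python
# Sketch of learnable-query cross-attention over shared features
# (illustrative only; not the OmniFD source code).
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_task_attention(task_queries, features):
    """task_queries: (num_tasks, d) learnable vectors, one per task.
    features: (num_tokens, d) flattened spatiotemporal tokens from the
    shared encoder. Returns one refined d-dim vector per task
    (single-head, no learned projections, for clarity)."""
    d = features.shape[-1]
    scores = task_queries @ features.T / np.sqrt(d)  # (num_tasks, num_tokens)
    attn = softmax(scores, axis=-1)                  # attention weights
    return attn @ features                           # (num_tasks, d)

rng = np.random.default_rng(0)
d = 32
# Four hypothetical queries: image cls, video cls, spatial loc, temporal loc.
queries = rng.normal(size=(4, d))
tokens = rng.normal(size=(196, d))  # assumed flattened token grid
refined = cross_task_attention(queries, tokens)
print(refined.shape)  # (4, 32)
```

Each row of `refined` would then feed the corresponding lightweight decoding head; because all four queries attend over the same shared features, gradients from one task shape representations used by the others, which is the mechanism behind the reported cross-task knowledge transfer.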
🛡️ Threat Analysis
Deepfake and face forgery detection is explicitly an output integrity / AI-generated content detection task. OmniFD proposes a novel unified architecture for detecting manipulated faces across four tasks (image classification, video classification, spatial localization, temporal localization), making it a novel forensic detection contribution rather than a straightforward application of existing methods to a new domain.