DeepShield: Fortifying Deepfake Video Detection with Local and Global Forgery Analysis

Recent advances in deep generative models have made it easier to manipulate face videos, raising significant concerns about their potential misuse for fraud and misinformation. Existing detectors often perform well in in-domain scenarios but fail to generalize across diverse manipulation techniques due to their reliance on forgery-specific artifacts. In this work, we introduce DeepShield, a novel deepfake detection framework that balances local sensitivity and global generalization to improve robustness across unseen forgeries. DeepShield enhances the CLIP-ViT encoder through two key components: Local Patch Guidance (LPG) and Global Forgery Diversification (GFD). LPG applies spatiotemporal artifact modeling and patch-wise supervision to capture fine-grained inconsistencies often overlooked by global models. GFD introduces domain feature augmentation, leveraging domain-bridging and boundary-expanding feature generation to synthesize diverse forgeries, mitigating overfitting and enhancing cross-domain adaptability. Through the integration of novel local and global analysis for deepfake detection, DeepShield outperforms state-of-the-art methods in cross-dataset and cross-manipulation evaluations, achieving superior robustness against unseen deepfake attacks. Code is available at https://github.com/lijichang/DeepShield.
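The patch-wise supervision mentioned for LPG can be illustrated with a minimal sketch. The abstract does not specify how patch labels or the loss are computed, so everything here is an assumption: that a binary pixel mask of the manipulated region is available, that each patch's soft label is the fraction of manipulated pixels it contains, and that the loss is a mean binary cross-entropy over patches. The function names `patch_labels` and `patch_bce` are hypothetical and are not taken from the DeepShield code.

```python
import math


def patch_labels(mask, patch_size):
    """Average a binary pixel mask over non-overlapping square patches.

    mask: list of rows, each a list of 0/1 values (1 = manipulated pixel).
    Returns a grid of soft labels in [0, 1], one per patch.
    This labeling scheme is an assumption for illustration only.
    """
    height, width = len(mask), len(mask[0])
    if height % patch_size or width % patch_size:
        raise ValueError("mask dimensions must be divisible by patch_size")
    labels = []
    for top in range(0, height, patch_size):
        row = []
        for left in range(0, width, patch_size):
            total = sum(
                mask[y][x]
                for y in range(top, top + patch_size)
                for x in range(left, left + patch_size)
            )
            row.append(total / (patch_size * patch_size))
        labels.append(row)
    return labels


def patch_bce(scores, labels, eps=1e-7):
    """Mean binary cross-entropy between per-patch scores and soft labels."""
    losses = []
    for score_row, label_row in zip(scores, labels):
        for p, t in zip(score_row, label_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            losses.append(-(t * math.log(p) + (1.0 - t) * math.log(1.0 - p)))
    return sum(losses) / len(losses)


# A 4x4 mask whose top-left 2x2 block is manipulated.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
labels = patch_labels(mask, 2)
uninformed = [[0.5, 0.5], [0.5, 0.5]]
loss = patch_bce(uninformed, labels)
```

In this toy setup a global real/fake label would say only "fake", while the patch grid localizes the forgery to one quadrant, which is the kind of fine-grained signal the abstract attributes to LPG.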

Key Contributions

DeepShield framework enhancing CLIP-ViT for deepfake detection with dual local/global analysis components
Local Patch Guidance (LPG): spatiotemporal artifact modeling with patch-wise supervision to capture fine-grained forgery inconsistencies
Global Forgery Diversification (GFD): domain-bridging and boundary-expanding feature augmentation to improve cross-domain generalization
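The two GFD operations named above can be sketched in feature space. This summary does not give the paper's formulas, so the sketch assumes one plausible reading: domain-bridging as linear interpolation between forged features from two manipulation domains, and boundary-expanding as extrapolation of a forged feature away from its domain centroid. The helper names `bridge`, `expand`, and `centroid`, and the parameters `lam` and `gamma`, are hypothetical.

```python
def centroid(features):
    """Component-wise mean of a list of equal-length feature vectors."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


def bridge(feat_a, feat_b, lam):
    """Interpolate between forged features of two domains (assumed form).

    lam = 0 returns feat_a, lam = 1 returns feat_b.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [(1.0 - lam) * a + lam * b for a, b in zip(feat_a, feat_b)]


def expand(feat, domain_centroid, gamma):
    """Push a forged feature outward from its domain centroid (assumed form).

    gamma = 0 leaves the feature unchanged; larger gamma moves it further
    beyond the region covered by the original domain.
    """
    if gamma < 0.0:
        raise ValueError("gamma must be non-negative")
    return [f + gamma * (f - c) for f, c in zip(feat, domain_centroid)]


# Toy 2-D features standing in for encoder embeddings of two forgery domains.
domain_a = [[0.0, 0.0], [2.0, 4.0]]
domain_b_sample = [4.0, 0.0]

center_a = centroid(domain_a)
bridged = bridge(domain_a[1], domain_b_sample, 0.5)
expanded = expand(domain_a[1], center_a, 0.5)
```

Both operations yield synthetic forged features that lie between or beyond the training domains, which matches the stated goal of reducing overfitting to any single manipulation type.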

🛡️ Threat Analysis

Output Integrity Attack

DeepShield is an AI-generated content detection system that specifically targets deepfake videos; deepfake detection is explicitly listed under ML09 (output integrity / content provenance). The paper proposes a novel detection architecture rather than merely applying existing methods to a new domain.

Details

Domains

vision, generative

Model Types

transformer, vlm

Threat Tags

inference_time

Datasets

FaceForensics++, Celeb-DF, DFDC

Applications

2025, 0 citations
