A Novel Unified Approach to Deepfake Detection

Advancements in AI are increasingly giving rise to new threats. One of the most prominent is the synthesis and misuse of deepfakes. To sustain trust in the digital age, detecting and tagging deepfakes is essential. This paper presents a novel architecture for deepfake detection in images and videos. The architecture uses cross-attention between spatial and frequency domain features, along with a blood detection module, to classify an image as real or fake. The paper aims to develop a unified architecture and provide insights into each step. Through this approach, we achieve results better than the state of the art (SOTA): 99.80% and 99.88% AUC on FF++ and Celeb-DF, respectively, when using Swin Transformer and BERT, and 99.55% and 99.38% when using EfficientNet-B4 and BERT. The approach also generalizes well, achieving strong cross-dataset results.

Key Contributions

Unified architecture fusing spatial and frequency domain features via cross-attention (DFT magnitude/phase + bandpass energy, entropy, PSD statistics) for deepfake detection
Blood detection module that analyzes subcutaneous blood signals as a liveness cue, combined with the main classification stream
Achieves state-of-the-art AUC of 99.80%/99.88% on FF++ and Celeb-DF with strong cross-dataset generalization
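
The frequency-domain features named in the first contribution (DFT magnitude and phase, bandpass energy, entropy, PSD statistics) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `frequency_features`, the number of bands, the use of concentric radial bands, and the choice of PSD mean and standard deviation as the "statistics" are all assumptions made for illustration.

```python
import numpy as np

def frequency_features(image, n_bands=4, eps=1e-12):
    """Hypothetical helper: DFT magnitude, phase, and a few spectral statistics."""
    image = np.asarray(image, dtype=np.float64)

    # 2D DFT with the zero-frequency bin shifted to the center.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # Power spectral density, normalized so it sums to (almost exactly) 1.
    psd = magnitude ** 2
    psd_norm = psd / (psd.sum() + eps)

    # Distance of every frequency bin from the center, scaled to [0, 1].
    h, w = image.shape
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - h // 2, xx - w // 2)
    radius = radius / radius.max()

    # Bandpass energy: share of total power inside each concentric ring
    # (assumed band layout; the source does not specify one).
    edges = np.linspace(0.0, 1.0, n_bands + 1)
    band_energy = np.empty(n_bands)
    for i in range(n_bands):
        if i == n_bands - 1:
            mask = (radius >= edges[i]) & (radius <= edges[i + 1])
        else:
            mask = (radius >= edges[i]) & (radius < edges[i + 1])
        band_energy[i] = psd_norm[mask].sum()

    # Spectral entropy of the normalized PSD, in nats.
    entropy = -np.sum(psd_norm * np.log(psd_norm + eps))

    stats = np.concatenate([band_energy, [entropy, psd.mean(), psd.std()]])
    return magnitude, phase, stats

# Usage on a random grayscale patch (stand-in for a face crop).
rng = np.random.default_rng(0)
patch = rng.random((32, 32))
magnitude, phase, stats = frequency_features(patch)
```

With these defaults, `stats` holds four band energies followed by the entropy and two PSD statistics. In a full pipeline such a vector would be one of the inputs to the fusion stage rather than a classifier input on its own.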

🛡️ Threat Analysis

Output Integrity Attack

Proposes a novel AI-generated content detection architecture specifically targeting deepfake images and videos — deepfake detection falls squarely within ML09 (output integrity and content authenticity). The paper introduces new forensic techniques (cross-attention over DFT frequency bands, blood detection module) rather than merely applying existing methods to a domain.
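
To make the cross-attention idea concrete, the following is a minimal single-head sketch in NumPy. It assumes that spatial tokens act as queries and frequency-band tokens as keys and values; the summary does not state which stream plays which role, and the token counts, dimensions, and random projection matrices here are hypothetical placeholders for learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(spatial_tokens, freq_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: spatial tokens attend over frequency tokens."""
    q = spatial_tokens @ w_q              # (n_spatial, d)
    k = freq_tokens @ w_k                 # (n_freq, d)
    v = freq_tokens @ w_v                 # (n_freq, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ v, weights

# Hypothetical sizes: 49 spatial patch tokens, 4 frequency-band tokens.
rng = np.random.default_rng(0)
d_spatial, d_freq, d = 16, 8, 12
spatial = rng.standard_normal((49, d_spatial))
freq = rng.standard_normal((4, d_freq))
w_q = rng.standard_normal((d_spatial, d)) / np.sqrt(d_spatial)
w_k = rng.standard_normal((d_freq, d)) / np.sqrt(d_freq)
w_v = rng.standard_normal((d_freq, d)) / np.sqrt(d_freq)

fused, weights = cross_attention(spatial, freq, w_q, w_k, w_v)
```

Each row of `weights` shows how strongly one spatial token draws on each frequency band, and `fused` carries the frequency information re-expressed per spatial token. A trained model would learn the projections and typically use multiple heads.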

Details

Domains

vision

Model Types

transformer, cnn

Threat Tags

inference_time, digital

Datasets

FaceForensics++, Celeb-DF

Applications

2025, 0 citations
