defense 2026

A Novel Unified Approach to Deepfake Detection

Lord Sen, Shyamapada Mukherjee

0 citations · 26 references · arXiv

Published on arXiv

2601.03382

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves 99.80% and 99.88% AUC on FaceForensics++ and Celeb-DF respectively using Swin Transformer + BERT, outperforming prior SOTA with strong cross-dataset generalization.
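
For readers unfamiliar with the metric, the AUC figures above can be read as the probability that a randomly chosen fake sample receives a higher score than a randomly chosen real one. The following is a minimal sketch of that pairwise definition; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (fake, real) pairs ranked correctly; ties count 0.5."""
    fake_scores = [s for y, s in zip(labels, scores) if y == 1]
    real_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not fake_scores or not real_scores:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for f in fake_scores:
        for r in real_scores:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(fake_scores) * len(real_scores))


# Toy example (made-up values): label 1 = fake, label 0 = real.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

This quadratic-time form is only meant to make the definition concrete; library implementations compute the same quantity more efficiently from the sorted scores.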


Advances in AI are increasingly giving rise to new threats, and one of the most prominent is the synthesis and misuse of deepfakes. To sustain trust in the digital age, detecting and tagging deepfakes is essential. This paper presents a novel architecture for deepfake detection in images and videos. The architecture uses cross-attention between spatial and frequency-domain features, together with a blood detection module, to classify an image as real or fake. The paper aims to develop a unified architecture and provide insight into each step. Through this approach we achieve results better than SOTA, specifically 99.80% and 99.88% AUC on FF++ and Celeb-DF, respectively, with Swin Transformer and BERT, and 99.55% and 99.38% with EfficientNet-B4 and BERT. The approach also generalizes well, achieving strong cross-dataset results.


Key Contributions

  • Unified architecture fusing spatial and frequency domain features via cross-attention (DFT magnitude/phase + bandpass energy, entropy, PSD statistics) for deepfake detection
  • Blood detection module that analyzes subcutaneous blood signals as a liveness cue, combined with the main classification stream
  • Achieves state-of-the-art AUC of 99.80%/99.88% on FF++ and Celeb-DF with strong cross-dataset generalization
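
The frequency-domain features named in the first contribution (DFT magnitude and phase, bandpass energy, entropy, PSD statistics) can be illustrated with a short NumPy sketch. This is not the authors' implementation: the function name `frequency_features`, the choice of four concentric radial bands, and the use of mean and standard deviation of the log-power spectrum as "PSD statistics" are all assumptions made for illustration.

```python
import numpy as np


def frequency_features(gray, n_bands=4, eps=1e-12):
    """Compute a small frequency-domain descriptor for a 2-D grayscale image."""
    gray = np.asarray(gray, dtype=np.float64)
    spectrum = np.fft.fftshift(np.fft.fft2(gray))  # move the DC term to the center
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    power = magnitude ** 2  # unnormalized power spectral density estimate

    # Radial distance of each frequency bin from DC, scaled to [0, 1].
    h, w = gray.shape
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - h // 2, xx - w // 2)
    radius = radius / max(radius.max(), eps)

    # Bandpass energy: share of total power inside each concentric ring (assumed banding).
    band_index = np.minimum((radius * n_bands).astype(int), n_bands - 1)
    total = power.sum() + eps
    band_energy = np.bincount(
        band_index.ravel(), weights=power.ravel(), minlength=n_bands
    ) / total

    # Spectral entropy (in bits) of the normalized power distribution.
    p = power.ravel() / total
    entropy = float(-np.sum(p * np.log2(p + eps)))

    # Simple PSD statistics on the log-power spectrum (assumed choice of statistics).
    log_psd = np.log(power + eps)
    psd_stats = np.array([log_psd.mean(), log_psd.std()])

    return {
        "magnitude": magnitude,
        "phase": phase,
        "band_energy": band_energy,
        "entropy": entropy,
        "psd_stats": psd_stats,
    }


# Usage on a random stand-in image (not real face data).
rng = np.random.default_rng(0)
image = rng.random((32, 32))
features = frequency_features(image)
```

In a detector, descriptors like these would typically be computed per face crop and embedded as tokens for the fusion stage; how the paper tokenizes them is not stated in this summary.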

🛡️ Threat Analysis

Output Integrity Attack

Proposes a novel AI-generated content detection architecture specifically targeting deepfake images and videos — deepfake detection falls squarely within ML09 (output integrity and content authenticity). The paper introduces new forensic techniques (cross-attention over DFT frequency bands, blood detection module) rather than merely applying existing methods to a domain.
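
The cross-attention fusion mentioned above can be sketched as single-head scaled dot-product attention in NumPy. The summary does not say which stream supplies the queries, so this sketch assumes spatial tokens act as queries and frequency tokens as keys and values; the token counts, the embedding size, and the random projection matrices standing in for learned weights are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(spatial_tokens, freq_tokens, w_q, w_k, w_v):
    """Let spatial tokens (queries) attend over frequency tokens (keys and values)."""
    q = spatial_tokens @ w_q  # (n_spatial, d)
    k = freq_tokens @ w_k     # (n_freq, d)
    v = freq_tokens @ w_v     # (n_freq, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product similarities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


# Illustrative shapes only: 49 spatial patch tokens and 8 frequency tokens.
rng = np.random.default_rng(0)
d_model = 16
spatial = rng.standard_normal((49, d_model))
freq = rng.standard_normal((8, d_model))
# Random matrices stand in for learned projection weights.
w_q, w_k, w_v = (
    rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(3)
)
fused, attn = cross_attention(spatial, freq, w_q, w_k, w_v)
```

Each row of `attn` shows how strongly one spatial token draws on each frequency token, and `fused` has one frequency-informed vector per spatial token. A full model would likely add multiple heads, residual connections, and a classification head, none of which are shown here.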


Details

Domains
vision
Model Types
transformer, cnn
Threat Tags
inference_time, digital
Datasets
FaceForensics++, Celeb-DF
Applications
deepfake detection, image authentication, video forensics