defense 2025

Vulnerability-Aware Spatio-Temporal Learning for Generalizable Deepfake Video Detection

Dat Nguyen 1, Marcella Astrid 1, Anis Kacem 1, Enjie Ghorbel 1,2, Djamila Aouada 1

2 citations · 72 references · arXiv (Cornell University)

Published on arXiv: 2501.01184

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

FakeSTormer outperforms recent state-of-the-art methods on multiple deepfake video detection benchmarks by explicitly modeling subtle spatio-temporal inconsistencies while reducing overfitting.

FakeSTormer

Novel technique introduced


Detecting deepfake videos is highly challenging given the complexity of characterizing spatio-temporal artifacts. Most existing methods rely on binary classifiers trained on real and fake image sequences, which hinders their generalization to unseen generation methods. Moreover, with the constant progress in generative Artificial Intelligence (AI), deepfake artifacts are becoming imperceptible at both the spatial and the temporal levels, making them extremely difficult to capture. To address these issues, we propose a fine-grained deepfake video detection approach called FakeSTormer that enforces the modeling of subtle spatio-temporal inconsistencies while avoiding overfitting. Specifically, we introduce a multi-task learning framework that incorporates two auxiliary branches for explicitly attending to artifact-prone spatial and temporal regions. Additionally, we propose a video-level data synthesis strategy that generates pseudo-fake videos with subtle spatio-temporal artifacts, providing high-quality samples and hand-free annotations for our auxiliary branches. Extensive experiments on several challenging benchmarks demonstrate the superiority of our approach compared to recent state-of-the-art methods. The code is available at https://github.com/10Ring/FakeSTormer.
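
The multi-task framework described above pairs a real/fake classifier with two auxiliary branches. A minimal sketch of how such an objective could be combined is shown here. The abstract does not state the loss functions or weights, so the choice of binary cross-entropy for classification, mean squared error for both auxiliary branches, and the names `multi_task_loss`, `w_spatial`, and `w_temporal` are all assumptions made for illustration, not the authors' formulation.

```python
import math


def binary_cross_entropy(prob, label, eps=1e-7):
    """Cross-entropy for one predicted fake-probability and a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def mean_squared_error(pred, target):
    """Mean squared error between two equal-length lists of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multi_task_loss(cls_prob, label,
                    spatial_pred, spatial_target,
                    temporal_pred, temporal_target,
                    w_spatial=1.0, w_temporal=1.0):
    """Weighted sum of one main loss and two auxiliary losses.

    Assumption: the auxiliary branches regress per-region scores
    (flattened here to plain lists). The weights are hypothetical.
    """
    loss_cls = binary_cross_entropy(cls_prob, label)
    loss_spatial = mean_squared_error(spatial_pred, spatial_target)
    loss_temporal = mean_squared_error(temporal_pred, temporal_target)
    return loss_cls + w_spatial * loss_spatial + w_temporal * loss_temporal
```

In this sketch the auxiliary terms vanish when the branches match their targets exactly, so the total reduces to the classification loss. Raising `w_spatial` or `w_temporal` shifts the emphasis toward localizing artifact-prone regions.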


Key Contributions

  • FakeSTormer: a multi-task learning framework with auxiliary branches that explicitly attend to artifact-prone spatial and temporal regions for fine-grained deepfake video detection
  • Video-level data synthesis strategy that generates pseudo-fake videos with subtle spatio-temporal artifacts to provide high-quality training samples with automatic annotations
  • Improved cross-dataset generalization over state-of-the-art deepfake video detection methods on multiple challenging benchmarks
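
The video-level synthesis strategy in the second contribution can be illustrated with a toy example. This page does not describe the actual procedure, so the sketch below assumes a simple blending scheme: a brightened copy of a fixed rectangular region is mixed back into each frame with a per-frame strength, which yields a spatial mask and a temporal change signal as labels at no annotation cost. The function name `synthesize_pseudo_fake` and the parameters `box`, `max_alpha`, and `shift` are hypothetical.

```python
import random


def synthesize_pseudo_fake(video, box, max_alpha=0.3, shift=0.1, seed=0):
    """Build a pseudo-fake clip plus automatic annotations from a real clip.

    video: list of frames, each a list of rows of floats in [0, 1].
    box:   (top, left, bottom, right), half-open region to alter.
    Returns (fake_video, spatial_masks, temporal_signal).
    """
    rng = random.Random(seed)
    top, left, bottom, right = box
    fake_video, spatial_masks, alphas = [], [], []

    for frame in video:
        # A different blend strength per frame creates subtle flicker.
        alpha = rng.uniform(0.0, max_alpha)
        new_frame = [row[:] for row in frame]  # copy; input stays untouched
        mask = [[0.0] * len(row) for row in frame]
        for r in range(top, bottom):
            for c in range(left, right):
                altered = min(1.0, frame[r][c] + shift)
                new_frame[r][c] = (1.0 - alpha) * frame[r][c] + alpha * altered
                mask[r][c] = alpha  # spatial label: where and how strongly
        fake_video.append(new_frame)
        spatial_masks.append(mask)
        alphas.append(alpha)

    # Temporal label: how much the blend strength changed from the previous frame.
    temporal_signal = [0.0] + [
        abs(alphas[t] - alphas[t - 1]) for t in range(1, len(alphas))
    ]
    return fake_video, spatial_masks, temporal_signal
```

Because the annotations come directly from the blending parameters, every synthesized clip arrives with labels for both auxiliary branches, which is the property the contribution highlights.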

🛡️ Threat Analysis

Output Integrity Attack

FakeSTormer is a deepfake video detector: it assesses the authenticity and integrity of video content by detecting AI-generated manipulation artifacts. Deepfake detection is classified here as an Output Integrity concern (OWASP ML Top 10, ML09).


Details

Domains: vision
Model Types: transformer
Threat Tags: inference_time
Applications: deepfake video detection