Defense · 2025

Deepfake Detection that Generalizes Across Benchmarks

Andrii Yermakov 1, Jan Cech 1, Jiri Matas 1, Mario Fritz 2



Published on arXiv: 2508.06248

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves state-of-the-art average cross-dataset AUROC across 14 benchmarks while modifying only 0.03% of model parameters, outperforming architecturally complex recent approaches

GenD

Novel technique introduced


The generalization of deepfake detectors to unseen manipulation techniques remains a challenge for practical deployment. Although many approaches adapt foundation models by introducing significant architectural complexity, this work demonstrates that robust generalization is achievable through parameter-efficient adaptation of a pre-trained foundation vision encoder. The proposed method, GenD, fine-tunes only the Layer Normalization parameters (0.03% of the total) and improves generalization by enforcing a hyperspherical feature manifold via L2 normalization and applying metric learning on that manifold. We conducted an extensive evaluation on 14 benchmark datasets spanning 2019 to 2025. The proposed method achieves state-of-the-art performance, outperforming more complex, recent approaches in average cross-dataset AUROC. Our analysis yields two primary findings for the field: 1) training on paired real-fake data from the same source video is essential for mitigating shortcut learning and improving generalization, and 2) detection difficulty on academic datasets has not strictly increased over time, with models trained on older, diverse datasets showing strong generalization capabilities. This work delivers a computationally efficient and reproducible method, proving that state-of-the-art generalization is attainable through targeted, minimal changes to a pre-trained foundation image encoder. The code is at: https://github.com/yermandy/GenD
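The LayerNorm-only fine-tuning described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' code: the parameter names mimic a ViT-style encoder and the sizes are toy values, but it shows how selecting only normalization parameters yields a tiny trainable fraction, echoing the ~0.03% figure.

```python
# Hypothetical sketch: select only LayerNorm parameters of a frozen
# vision encoder for fine-tuning; all other weights stay frozen.
# Parameter names and sizes below are illustrative, not from GenD.

def select_layernorm_params(named_params):
    """Return the names of LayerNorm weights/biases (ViT convention:
    'norm1'/'norm2' inside each transformer block)."""
    return [name for name in named_params if ".norm" in name]

# Toy parameter table: {name: number of elements}.
params = {
    "patch_embed.proj.weight": 590_592,
    "blocks.0.attn.qkv.weight": 1_769_472,
    "blocks.0.norm1.weight": 768,
    "blocks.0.norm1.bias": 768,
    "blocks.0.mlp.fc1.weight": 2_359_296,
    "blocks.0.norm2.weight": 768,
    "blocks.0.norm2.bias": 768,
}

trainable = select_layernorm_params(params)
frac = sum(params[n] for n in trainable) / sum(params.values())
print(f"trainable fraction: {frac:.5f}")
```

On a real ~100M-parameter encoder the same selection rule leaves only the per-layer normalization scales and shifts trainable, which is where the paper's 0.03% figure comes from.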


Key Contributions

  • Parameter-efficient deepfake detection by fine-tuning only Layer Normalization parameters (0.03% of weights) of a foundational vision encoder
  • Hyperspherical feature manifold via L2 normalization combined with metric learning to enforce generalizable representations
  • Large-scale evaluation across 14 benchmarks (2019–2025) revealing that paired real-fake training from the same source video is critical for mitigating shortcut learning
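The hyperspherical-manifold idea in the second contribution can be illustrated with a minimal sketch: embeddings are L2-normalized onto the unit hypersphere, and a metric-learning objective operates on cosine distances there. The contrastive formulation and the margin value below are assumptions for illustration, not the paper's exact loss.

```python
import math

def l2_normalize(v, eps=1e-12):
    """Project an embedding onto the unit hypersphere."""
    n = math.sqrt(sum(x * x for x in v)) + eps
    return [x / n for x in v]

def pair_loss(a, b, same_class, margin=0.5):
    """Toy metric-learning objective on normalized features:
    pull same-class pairs together, push different-class pairs
    beyond a cosine-distance margin (assumed loss, not GenD's)."""
    a, b = l2_normalize(a), l2_normalize(b)
    cos = sum(x * y for x, y in zip(a, b))
    dist = 1.0 - cos  # cosine distance, in [0, 2] on the unit sphere
    if same_class:
        return dist
    return max(0.0, margin - dist)

print(l2_normalize([3.0, 4.0]))          # unit-length vector
print(pair_loss([1, 0], [1, 0], True))   # identical positives: near-zero loss
```

Constraining features to the sphere makes the metric-learning distances depend only on direction, which is one plausible reason such a manifold generalizes better across manipulation types.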

🛡️ Threat Analysis

Output Integrity Attack

The core contribution is detecting AI-generated or manipulated face content (deepfakes): a novel detection method for output integrity and content authenticity. It directly addresses the ML09 threat of verifying whether media outputs are genuine or AI-fabricated.


Details

Domains
vision
Model Types
transformer
Threat Tags
inference_time
Datasets
FaceForensics++, Celeb-DF, DFDC, DFD, WildDeepfake, FaceShifter, UADFV, DeepFakeMNIST+
Applications
deepfake detection, face manipulation detection