defense 2025

Pindrop it! Audio and Visual Deepfake Countermeasures for Robust Detection and Fine Grained-Localization

Nicholas Klein , Hemlata Tak , James Fullwood , Krishna Regmi , Leonidas Spinoulas , Ganesh Sivaraman , Tianxiang Chen , Elie Khoury

Published on arXiv

2508.08141

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves best performance in temporal localization and top-4 ranking in classification on the ACM 1M Deepfakes Detection Challenge TestA evaluation split.

Pindrop Audio-Visual Deepfake Countermeasure

Novel technique introduced


Visual and audio generation is advancing rapidly, with new state-of-the-art methods appearing in quick succession. This proliferation underscores the need for robust ways to detect synthetic content in videos. Fine-grained, localized manipulations of the visual stream, the audio stream, or both are especially hard to detect because only a small part of the content is altered. This paper presents solutions for deepfake video classification and temporal localization. The methods were submitted to the ACM 1M Deepfakes Detection Challenge, where they achieved the best performance in the temporal localization task and a top-four ranking in the classification task on the TestA split of the evaluation dataset.


Key Contributions

  • Cross-architecture ensemble approach for robust deepfake video classification across audio and visual domains
  • Fine-grained temporal localization of deepfake manipulations (identifying which segments are synthetic)
  • Audio-visual fusion countermeasure achieving #1 in temporal localization and top-4 in classification on the ACM 1M Deepfakes Detection Challenge TestA split
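
To make the two ideas in the list above concrete, here is a minimal Python sketch, not the authors' implementation. It assumes each model emits one fake-probability score per frame, fuses models by a plain mean, and turns the fused scores into time segments by thresholding. The function names, the 0.5 threshold, and the frame duration are all hypothetical choices for illustration; the paper's actual architectures and fusion scheme are not described in this summary.

```python
from typing import List, Sequence, Tuple


def ensemble_scores(per_model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Fuse frame-level fake scores from several models by simple averaging.

    Assumption: every model scores the same frames in the same order.
    """
    if not per_model_scores:
        raise ValueError("need scores from at least one model")
    n_frames = len(per_model_scores[0])
    if any(len(s) != n_frames for s in per_model_scores):
        raise ValueError("all models must score the same number of frames")
    n_models = len(per_model_scores)
    return [sum(frame) / n_models for frame in zip(*per_model_scores)]


def localize_segments(
    scores: Sequence[float],
    frame_sec: float = 0.04,  # hypothetical: 25 frames per second
    threshold: float = 0.5,   # hypothetical decision threshold
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of runs of frames scored as fake."""
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame in the current fake run
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            segments.append((start * frame_sec, i * frame_sec))
            start = None
    if start is not None:  # the run extends to the end of the clip
        segments.append((start * frame_sec, len(scores) * frame_sec))
    return segments


if __name__ == "__main__":
    model_a = [0.1, 0.9, 0.8, 0.2, 0.7]  # made-up scores, illustration only
    model_b = [0.3, 0.7, 0.6, 0.0, 0.9]
    fused = ensemble_scores([model_a, model_b])
    print(localize_segments(fused, frame_sec=1.0))
```

A real system would likely smooth the scores and merge nearby segments before reporting them; the sketch omits that to keep the thresholding step visible.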

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses detection of AI-generated synthetic content (deepfakes) across audio and visual modalities, including fine-grained localization of manipulated segments — core output integrity and content authenticity work.


Details

Domains
audio, vision, multimodal
Model Types
transformer, multimodal
Threat Tags
inference_time, digital
Datasets
ACM 1M Deepfakes Detection Challenge dataset
Applications
deepfake video detection, audio deepfake detection, temporal deepfake localization