
On Deepfake Voice Detection -- It's All in the Presentation

Héctor Delgado, Giorgio Ramondetti, Emanuele Dalmasso, Gennady Karvitsky, Daniele Colibro, Haydar Talib

1 citation · 30 references · arXiv


Published on arXiv (2509.26471)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Training on presentation-aware datasets improved real-world deepfake detection accuracy by 57%; dataset quality matters more than SOTA model scale for countermeasure generalization.


While the technologies empowering malicious audio deepfakes have evolved dramatically in recent years due to advances in generative AI, the same cannot be said of global research into spoofing (deepfake) countermeasures. This paper highlights how current deepfake datasets and research methodologies have led to systems that fail to generalize to real-world applications. The main reason is the difference between raw deepfake audio and deepfake audio that has been presented through a communication channel, e.g., over the phone. We propose a new framework for data creation and research methodology, enabling the development of spoofing countermeasures that are more effective in real-world scenarios. By following the guidelines outlined here, we improved deepfake detection accuracy by 39% in more robust and realistic lab setups, and by 57% on a real-world benchmark. We also demonstrate that improvements in datasets have a bigger impact on deepfake detection accuracy than the choice of larger SOTA models over smaller ones; that is, it is more important for the scientific community to invest in comprehensive data collection programs than to simply train larger models with higher computational demands.


Key Contributions

  • First holistic framework for realistic audio deepfake attack scenarios incorporating phone injection and loudspeaker playback presentation phases into a unified evaluation methodology
  • New presented deepfake datasets (SWB-Synth, MLS-Synth) simulating real-world telephony communication channel distortions
  • Empirical demonstration that dataset realism has greater impact on deepfake detection accuracy than model scale, improving detection by 39% in lab setups and 57% on real-world benchmarks
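The "presentation" the paper emphasizes is the distortion an audio deepfake picks up when it passes through a real communication channel before reaching a detector. A minimal sketch of that idea, assuming a classic narrowband telephony channel (roughly a 300–3400 Hz passband at an 8 kHz sample rate) as the presentation step; the function name and parameters below are illustrative, not from the paper:

```python
import numpy as np
from scipy.signal import butter, sosfilt, resample_poly

def simulate_phone_channel(audio, sr=16000, target_sr=8000):
    """Apply a crude telephony-style presentation to raw audio.

    Band-limits to the classic ~300-3400 Hz telephone passband,
    then downsamples to the narrowband 8 kHz telephony rate.
    """
    sos = butter(4, [300, 3400], btype="bandpass", fs=sr, output="sos")
    band_limited = sosfilt(sos, audio)
    # resample_poly(x, up, down): 16 kHz -> 8 kHz
    narrowband = resample_poly(band_limited, target_sr, sr)
    return narrowband

# Example: one second of synthetic wideband audio (noise stand-in for speech)
rng = np.random.default_rng(0)
raw = rng.standard_normal(16000)
presented = simulate_phone_channel(raw)
```

A detector trained only on `raw`-style audio never sees the band-limiting and rate reduction that `presented` audio carries, which is the generalization gap the presented datasets (SWB-Synth, MLS-Synth) are built to close.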

🛡️ Threat Analysis

Output Integrity Attack

Directly advances audio deepfake (AI-generated content) detection by proposing a new data collection and evaluation methodology that accounts for real-world communication channel presentation, improving countermeasure performance.


Details

Domains
audio
Model Types
transformer
Threat Tags
inference_time
Datasets
ASVspoof 2019 LA · ASVspoof 2021 LA · ASVspoof 2021 DF · ASVspoof 5 · SWB-Synth · MLS-Synth · Fraud Academy
Applications
audio deepfake detection · voice spoofing countermeasures · telephone fraud detection