tool 2026

A General Model for Deepfake Speech Detection: Diverse Bonafide Resources or Diverse AI-Based Generators

Lam Pham 1, Khoi Vu 2, Dat Tran 2, David Fischinger 1, Simon Freitter 1, Marcel Hasenbalg 1, Davide Antonutti 1, Alexander Schindler 1, Martin Boyer 1, Ian McLoughlin 3

0 citations

Published on arXiv

2603.27557

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Cross-dataset evaluation shows that balancing bonafide resources and AI-based generators in the training data is key to achieving a general deepfake speech detection model


In this paper, we analyze two main factors, Bonafide Resource (BR) and AI-based Generator (AG), which affect the performance and generality of a Deepfake Speech Detection (DSD) model. To this end, we first propose a deep-learning-based model, referred to as the baseline. We then conduct experiments on the baseline to show how the BR and AG factors affect the threshold score used to classify input audio as fake or bonafide at inference time. Based on the experimental results, we propose a dataset that re-uses public DSD datasets and balances BR and AG. We then train various deep-learning-based models on the proposed dataset and conduct cross-dataset evaluation on different benchmark datasets. The cross-dataset evaluation results show that the balance of BR and AG is the key factor in training a general DSD model.
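
The summary does not describe how the balanced dataset is assembled, so the following is only a minimal sketch of one way to balance sources, assuming each utterance is tagged with a label and the bonafide resource or generator it came from. The function name `balance_by_source`, the tuple layout, and the `per_source` cap are all hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def balance_by_source(samples, per_source, seed=0):
    """Draw at most `per_source` utterances from every source.

    `samples` is a list of (utterance_id, label, source) tuples, where
    `source` names a bonafide resource (label "bonafide") or an AI-based
    generator (label "fake"). This layout is an assumption for illustration.
    """
    rng = random.Random(seed)

    # Group utterances by (label, source) so each source is capped separately.
    by_source = defaultdict(list)
    for sample in samples:
        _, label, source = sample
        by_source[(label, source)].append(sample)

    balanced = []
    for key in sorted(by_source):  # sorted keys keep the result reproducible
        group = by_source[key]
        k = min(per_source, len(group))  # small sources contribute everything
        balanced.extend(rng.sample(group, k))
    return balanced
```

Capping every source at the same size keeps one large corpus or one prolific generator from dominating training, which is the kind of imbalance the abstract links to shifts in the detection threshold.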


Key Contributions

  • Analysis of how bonafide resource (BR) and AI-based generator (AG) diversity affects deepfake speech detection model generalization
  • Proposed balanced dataset combining multiple public DSD datasets for improved cross-dataset performance
  • Cross-dataset evaluation framework using fixed threshold scores to measure model generality
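
As a rough illustration of fixed-threshold cross-dataset evaluation, the sketch below applies one shared threshold to the detector scores from several benchmarks and reports accuracy per benchmark. It assumes that higher scores mean "fake" and that accuracy is the metric of interest. Neither assumption is confirmed by this summary, and the function names are hypothetical.

```python
def accuracy_at_threshold(scores, labels, threshold):
    """Fraction of utterances classified correctly at a fixed threshold.

    Assumption: a score >= threshold is predicted "fake" (label 1),
    anything lower is predicted "bonafide" (label 0).
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")
    correct = sum(
        int(score >= threshold) == label
        for score, label in zip(scores, labels)
    )
    return correct / len(scores)


def cross_dataset_report(datasets, threshold=0.5):
    """Apply the same threshold to every benchmark.

    `datasets` maps a benchmark name to a (scores, labels) pair.
    The threshold is never re-tuned per benchmark.
    """
    return {
        name: accuracy_at_threshold(scores, labels, threshold)
        for name, (scores, labels) in datasets.items()
    }
```

Because the threshold is not re-tuned per benchmark, a large accuracy gap between benchmarks suggests that the score distribution shifts with the data source, which is the lack of generality this kind of evaluation is meant to expose.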

🛡️ Threat Analysis

Output Integrity Attack

The paper proposes a detection system for AI-generated speech (deepfakes). This is AI-generated content detection for audio, which falls under output integrity and authenticity verification.


Details

Domains
audio
Model Types
transformer
Threat Tags
inference_time
Datasets
ASVspoof 2015, ASVspoof 2019, FoR, DFDC
Applications
deepfake speech detection, audio forensics