benchmark 2025

SCDF: A Speaker Characteristics DeepFake Speech Dataset for Bias Analysis

Vojtěch Staněk, Karel Srna, Anton Firc, Kamil Malinka

Published on arXiv

2508.07944

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Speaker demographics (sex, language, age) significantly influence deepfake detector performance: existing state-of-the-art detectors show measurable disparities along every speaker characteristic evaluated.

SCDF

Novel technique introduced


Despite growing attention to deepfake speech detection, the aspects of bias and fairness remain underexplored in the speech domain. To address this gap, we introduce the Speaker Characteristics Deepfake (SCDF) dataset: a novel, richly annotated resource enabling systematic evaluation of demographic biases in deepfake speech detection. SCDF contains over 237,000 utterances in a balanced representation of both male and female speakers spanning five languages and a wide age range. We evaluate several state-of-the-art detectors and show that speaker characteristics significantly influence detection performance, revealing disparities across sex, language, age, and synthesizer type. These findings highlight the need for bias-aware development and provide a foundation for building non-discriminatory deepfake detection systems aligned with ethical and regulatory standards.


Key Contributions

  • SCDF dataset: 237K+ utterances with rich demographic metadata (sex, age, language) spanning 5 languages and 4 TTS/VC synthesizers, balanced across 25 male and 25 female speakers
  • Systematic evaluation of state-of-the-art deepfake speech detectors revealing performance disparities across sex, language, age group, and synthesizer type
  • Empirical evidence for the need for bias-aware training data and development practices in deepfake detection, with alignment to EU AI Act requirements
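
To make the per-group evaluation concrete, here is a minimal sketch of how disparities across a speaker attribute could be measured. It assumes the equal error rate (EER), a common anti-spoofing metric, as the performance measure; this summary does not state which metric the authors use. The function names, the score convention (higher means more likely bona fide), and the toy values are illustrative assumptions, not part of SCDF or any detector's API.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Approximate EER from detector scores.

    Assumed convention: higher score = more likely bona fide;
    labels are 1 for bona fide and 0 for deepfake.
    Both classes must be present.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    bona = scores[labels == 1]
    spoof = scores[labels == 0]

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = np.unique(scores)
    thresholds = np.concatenate([thresholds, [thresholds[-1] + 1.0]])

    # A sample is accepted as bona fide when score >= threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])  # deepfakes accepted
    frr = np.array([(bona < t).mean() for t in thresholds])    # bona fide rejected

    # EER is taken where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2)


def eer_by_group(scores, labels, groups):
    """Compute the EER separately for each value of a speaker attribute."""
    scores, labels, groups = map(np.asarray, (scores, labels, groups))
    return {
        str(g): equal_error_rate(scores[groups == g], labels[groups == g])
        for g in np.unique(groups)
    }


# Synthetic toy scores (not SCDF data): one group is separated perfectly,
# the other overlaps heavily.
toy_scores = [0.9, 0.8, 0.1, 0.2, 0.2, 0.8, 0.3, 0.7]
toy_labels = [1, 1, 0, 0, 1, 1, 0, 0]
toy_groups = ["female"] * 4 + ["male"] * 4

per_group = eer_by_group(toy_scores, toy_labels, toy_groups)
eer_gap = max(per_group.values()) - min(per_group.values())
```

The same grouping can be repeated with language, age group, or synthesizer type as the attribute. A large gap between the best and worst group is the kind of disparity the evaluation reports, although a real analysis would also need to check statistical significance.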

🛡️ Threat Analysis

Output Integrity Attack

The paper directly targets deepfake speech detection — verifying whether AI-generated audio can be distinguished from real speech — and provides a benchmark to expose systematic failures (demographic bias) in these output integrity systems.


Details

Domains
audio
Model Types
transformer, generative
Datasets
SCDF, XTTSv2, F5-TTS, Open Voice v2, DDDM-VC
Applications
deepfake speech detection, voice anti-spoofing