tool 2025

Multilingual Dataset Integration Strategies for Robust Audio Deepfake Detection: A SAFE Challenge System

Hashim Ali, Surya Subramani, Lekha Bollinani, Nithin Sai Adupa, Sali El-Loh, Hafiz Malik

0 citations

Published on arXiv

2508.20983

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieved 2nd place in both SAFE Challenge Task 1 (unmodified audio) and Task 3 (laundered audio) using a multilingual 256,600-sample training corpus spanning 9 languages and 70+ TTS systems.

AASIST + WavLM Large + RawBoost

Novel technique introduced


The SAFE Challenge evaluates synthetic speech detection across three tasks: unmodified audio, processed audio with compression artifacts, and laundered audio designed to evade detection. We systematically explore self-supervised learning (SSL) front-ends, training data compositions, and audio length configurations for robust deepfake detection. Our AASIST-based approach incorporates a WavLM Large front-end with RawBoost augmentation and is trained on a multilingual dataset of 256,600 samples spanning 9 languages and over 70 TTS systems, drawn from CodecFake, MLAAD v5, SpoofCeleb, Famous Figures, and MAILABS. Through extensive experimentation with different SSL front-ends, three training data versions, and two audio lengths, we achieved second place in both Task 1 (unmodified audio detection) and Task 3 (laundered audio detection), demonstrating strong generalization and robustness.
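
The abstract mentions comparing two audio lengths, which implies that every clip is brought to a fixed number of samples before it reaches the model. The sketch below shows one common way to do this: random-crop clips that are too long and tile clips that are too short. It is only an illustration; the function name `fit_to_length`, the tiling strategy, and the example lengths are assumptions and are not taken from the paper.

```python
import random


def fit_to_length(samples, target_len, rng=None):
    """Return exactly target_len samples (hypothetical helper, not from the paper).

    Long clips are cropped at a random offset; short clips are repeated
    (tiled) until they reach target_len and then truncated.
    """
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    if not samples:
        raise ValueError("samples must be non-empty")

    n = len(samples)
    if n >= target_len:
        rng = rng or random
        start = rng.randint(0, n - target_len)  # inclusive on both ends
        return list(samples[start:start + target_len])

    repeats = -(-target_len // n)  # ceiling division
    return (list(samples) * repeats)[:target_len]


# Usage: a 3-sample "clip" tiled up to 7 samples.
short_clip = [1, 2, 3]
padded = fit_to_length(short_clip, 7)
```

Tiling rather than zero-padding is a design choice often seen in anti-spoofing pipelines because it avoids long silent stretches, but whether the authors used it is not stated here.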


Key Contributions

  • Systematic empirical evaluation of multilingual dataset integration strategies (CodecFake, MLAAD v5, SpoofCeleb, Famous Figures, MAILABS) for training robust audio deepfake detectors
  • Comparison of SSL front-ends (WavLM Large and others), audio length configurations, and training data compositions across three SAFE Challenge tasks
  • Source-level vulnerability analysis revealing failure patterns for specific TTS systems and laundering techniques
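
A source-level analysis of the kind described in the last contribution can be sketched as a per-source miss rate: for each generating system, the fraction of its fake samples that the detector scores below the decision threshold. This is a minimal illustration under stated assumptions; the record layout, the function name `miss_rate_by_source`, the 0.5 threshold, and the source labels are all hypothetical and do not come from the paper.

```python
from collections import defaultdict


def miss_rate_by_source(records, threshold=0.5):
    """Fraction of fake samples per source that are missed (hypothetical helper).

    records: iterable of (source, is_fake, score) tuples, where a higher
    score means the detector considers the clip more likely to be fake.
    Real (bona fide) samples are skipped.
    """
    totals = defaultdict(int)
    misses = defaultdict(int)
    for source, is_fake, score in records:
        if not is_fake:
            continue
        totals[source] += 1
        if score < threshold:
            misses[source] += 1
    return {source: misses[source] / totals[source] for source in totals}


# Usage with made-up scores for two illustrative TTS sources.
example_records = [
    ("tts_a", True, 0.9),
    ("tts_a", True, 0.2),
    ("tts_b", True, 0.8),
    ("real", False, 0.1),
]
rates = miss_rate_by_source(example_records)
```

Sorting the resulting dictionary by miss rate would surface the sources the detector handles worst.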

🛡️ Threat Analysis

Output Integrity Attack

The paper directly addresses the detection of AI-generated synthetic audio (audio deepfakes), including laundered audio designed to evade detection. This is a core output integrity and content authenticity problem under ML09.


Details

Domains
audio
Model Types
transformer
Threat Tags
inference_time
Datasets
CodecFake, MLAAD v5, SpoofCeleb, MAILABS, In-The-Wild (ITW), SAFE Challenge
Applications
synthetic speech detection, audio deepfake detection, audio anti-spoofing