WildSpoof Challenge Evaluation Plan
Yihan Wu, Jee-weon Jung, Hye-jin Shim, Xin Cheng, Xin Wang
Published on arXiv: 2508.16858
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Establishes an evaluation framework for in-the-wild speech spoofing and spoof detection. The TTS track is evaluated without a ranking, while the SASV track is scored over three trial types: bonafide target, bonafide non-target, and spoof.
WildSpoof Challenge
Novel technique introduced
The WildSpoof Challenge aims to advance the use of in-the-wild data in two intertwined speech processing tasks. It consists of two parallel tracks: (1) Text-to-Speech (TTS) synthesis for generating spoofed speech, and (2) Spoofing-robust Automatic Speaker Verification (SASV) for detecting spoofed speech. While the organizers coordinate both tracks and define the data protocols, participants treat them as separate and independent tasks. The primary objectives of the challenge are: (i) to promote the use of in-the-wild data for both TTS and SASV, moving beyond conventional clean and controlled datasets and considering real-world scenarios; and (ii) to encourage interdisciplinary collaboration between the spoofing generation (TTS) and spoofing detection (SASV) communities, thereby fostering the development of more integrated, robust, and realistic systems.
Key Contributions
- Defines parallel TTS (spoofed speech generation) and SASV (spoofed speech detection) tracks with shared in-the-wild data protocols
- Provides evaluation protocols (TITW-KSKT, TITW-KSUT) and standardized metrics (MCD, UTMOS, DNSMOS, WER, SPK-sim) for TTS, and a target/non-target/spoof trial structure for SASV
- Encourages interdisciplinary collaboration between the spoofing generation and anti-spoofing detection research communities
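The target/non-target/spoof trial structure mentioned above can be illustrated with a minimal sketch. This summary does not state which metric the challenge uses to score SASV systems, so the per-type error rates at a fixed threshold shown here are an assumption chosen for illustration, not the official scoring. The names `Trial` and `error_rates`, the label strings, and the convention that a higher score means "more likely a bonafide target" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Labels mirror the three trial types named in the text (strings are hypothetical).
TARGET, NONTARGET, SPOOF = "target", "nontarget", "spoof"


@dataclass
class Trial:
    label: str    # one of TARGET, NONTARGET, SPOOF
    score: float  # assumed convention: higher = more likely a bonafide target


def error_rates(trials: List[Trial], threshold: float) -> Dict[str, float]:
    """Per-trial-type error rates at a fixed threshold (illustrative only)."""

    def rate(label, is_error):
        scores = [t.score for t in trials if t.label == label]
        if not scores:
            return float("nan")  # no trials of this type
        return sum(is_error(s) for s in scores) / len(scores)

    return {
        # A bonafide target trial should be accepted; rejecting it is a miss.
        "miss_target": rate(TARGET, lambda s: s < threshold),
        # Bonafide non-target and spoof trials should both be rejected.
        "fa_nontarget": rate(NONTARGET, lambda s: s >= threshold),
        "fa_spoof": rate(SPOOF, lambda s: s >= threshold),
    }


# Toy scores, invented for this example.
trials = [
    Trial(TARGET, 0.9), Trial(TARGET, 0.8), Trial(TARGET, 0.4), Trial(TARGET, 0.7),
    Trial(NONTARGET, 0.2), Trial(NONTARGET, 0.6),
    Trial(SPOOF, 0.1), Trial(SPOOF, 0.3), Trial(SPOOF, 0.7), Trial(SPOOF, 0.2),
]
rates = error_rates(trials, threshold=0.5)
print(rates)
```

Keeping the two kinds of false acceptance separate shows why a spoofing-robust verifier has to reject both impostor speakers and synthesized speech, which an ordinary speaker verification score alone does not guarantee.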
🛡️ Threat Analysis
The SASV track explicitly targets the detection of AI-synthesized (spoofed) speech, that is, audio deepfake detection, while the TTS track generates the spoofed audio. Together, the two tracks benchmark both the creation and the detection of AI-generated audio content, which falls squarely under ML09 (output integrity and content authentication).