WildSpoof Challenge Evaluation Plan

The WildSpoof Challenge aims to advance the use of in-the-wild data in two intertwined speech processing tasks. It consists of two parallel tracks: (1) Text-to-Speech (TTS) synthesis for generating spoofed speech, and (2) Spoofing-robust Automatic Speaker Verification (SASV) for detecting spoofed speech. While the organizers coordinate both tracks and define the data protocols, participants treat them as separate and independent tasks. The primary objectives of the challenge are: (i) to promote the use of in-the-wild data for both TTS and SASV, moving beyond conventional clean and controlled datasets and considering real-world scenarios; and (ii) to encourage interdisciplinary collaboration between the spoofing generation (TTS) and spoofing detection (SASV) communities, thereby fostering the development of more integrated, robust, and realistic systems.

Key Contributions

Defines parallel TTS (spoofed speech generation) and SASV (spoofed speech detection) tracks with shared in-the-wild data protocols
Provides evaluation protocols (TITW-KSKT, TITW-KSUT) and standardized metrics (MCD, UTMOS, DNSMOS, WER, SPK-sim) for TTS, and a target/non-target/spoof trial structure for SASV
Encourages interdisciplinary collaboration between the spoofing generation (TTS) and spoofing detection (SASV) research communities
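
As a rough illustration of two of the TTS metrics and the SASV trial types named above, the following Python sketch computes a word error rate (WER) from word-level edit distance and a speaker similarity score as the cosine similarity of two embedding vectors. It is a minimal sketch under stated assumptions, not the challenge's official scoring code: the function names, the plain-list embeddings, and the `EXPECTED_DECISION` table are all hypothetical, and treating SPK-sim as cosine similarity between speaker embeddings is an assumption about how that metric is computed. The summary does not say which ASR system or speaker encoder produces the transcripts and embeddings.

```python
import math


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical table of the three SASV trial types and the decision an ideal
# system would make: only genuine speech from the claimed speaker is accepted.
EXPECTED_DECISION = {
    "target": "accept",
    "nontarget": "reject",
    "spoof": "reject",
}
```

In this sketch, a lower WER means the synthesized speech is more intelligible to the recognizer, while a higher cosine similarity means the synthesized voice is closer to the target speaker. For SASV, both non-target and spoof trials count as rejections, which is what distinguishes the task from conventional speaker verification.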

🛡️ Threat Analysis

Output Integrity Attack

The SASV track is explicitly about detecting AI-synthesized (spoofed) speech, that is, audio deepfake detection, while the TTS track generates the spoofed audio. Together, the two tracks benchmark both the creation and the detection of AI-generated audio content, which falls squarely under ML09 (output integrity and content authentication).