
Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits

Sung-Feng Huang 1,2,3, Heng-Cheng Kuo 2,3, Zhehuai Chen 1, Xuesong Yang 1, Chao-Han Huck Yang 1, Yu Tsao 3, Yu-Chiang Frank Wang 1, Hung-yi Lee 2, Szu-Wei Fu 1


Published on arXiv: 2501.03805

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Self-supervised-based detectors achieve strong performance in detecting and localizing seamless Voicebox edits, even though human listeners find these edits significantly harder to spot than cut-and-paste edits.

SINE dataset

Novel technique introduced


Advances in neural speech editing have raised concerns about misuse in spoofing attacks. Traditional partially edited speech corpora focus primarily on cut-and-paste edits, which, while maintaining speaker consistency, often introduce detectable discontinuities. Recent methods, such as A³T and Voicebox, improve transitions by leveraging contextual information. To foster spoofing detection research, we introduce the Speech INfilling Edit (SINE) dataset, created with Voicebox. We detail the process of re-implementing Voicebox training and creating the dataset. Subjective evaluations confirm that speech edited with this novel technique is more challenging to detect than conventional cut-and-paste edits. Despite the difficulty for human listeners, experimental results show that self-supervised-based detectors achieve remarkable performance in detection, localization, and generalization across different edit methods. The dataset and related models will be made publicly available.


Key Contributions

  • Introduces the SINE (Speech INfilling Edit) dataset, the first corpus specifically designed for seamless speech editing detection using Voicebox neural infilling
  • Provides a detailed re-implementation of Voicebox training and a four-category audio generation pipeline (two edited types, two genuine types)
  • Evaluates four SOTA spoof detectors on SINE, showing self-supervised-based models generalize well despite human difficulty in detecting seamless edits
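Spoof detectors such as those evaluated here are typically scored with the equal error rate (EER), the operating point where the false-acceptance and false-rejection rates coincide. A minimal sketch of that metric (the function name and implementation are illustrative, not taken from the paper):

```python
def equal_error_rate(scores, labels):
    """Approximate EER from detector scores (higher = more spoof-like)
    and binary labels (1 = edited/spoofed, 0 = genuine).

    Sweeps each observed score as a decision threshold and returns the
    mean of FAR and FRR at the threshold where they are closest."""
    pairs = sorted(zip(scores, labels))
    n_spoof = sum(labels)
    n_genuine = len(labels) - n_spoof
    best_gap, eer = float("inf"), 1.0
    for threshold, _ in pairs:
        # classify a score >= threshold as "spoofed"
        far = sum(1 for s, y in pairs if y == 0 and s >= threshold) / n_genuine
        frr = sum(1 for s, y in pairs if y == 1 and s < threshold) / n_spoof
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With perfectly separated scores the EER is 0; overlap between genuine and edited scores pushes it toward 0.5. Production evaluations usually interpolate the ROC curve rather than sweeping raw thresholds, but the idea is the same.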

🛡️ Threat Analysis

Output Integrity Attack

The paper directly addresses detection of AI-generated/edited audio content (speech deepfakes created with Voicebox infilling). Creating the SINE dataset and evaluating existing detectors against seamless speech edits is a contribution to the AI-generated content detection research area — a canonical ML09 concern around output integrity and content authenticity.


Details

Domains
audio
Model Types
transformer, diffusion
Threat Tags
inference_time
Datasets
SINE (proposed), HAD dataset
Applications
audio deepfake detection, speech spoofing detection, partial speech edit detection