Self Voice Conversion as an Attack against Neural Audio Watermarking
Yigitcan Özer, Wanying Ge, Zhe Zhang, Xin Wang, Junichi Yamagishi
Published on arXiv
2601.20432
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Self voice conversion severely degrades the reliability of state-of-the-art audio watermarking methods while preserving perceptual quality and speaker identity, exposing a critical gap in current robustness evaluations.
Self Voice Conversion Attack
Novel technique introduced
Audio watermarking embeds auxiliary information into speech while maintaining speaker identity, linguistic content, and perceptual quality. Although recent advances in neural and digital signal processing-based watermarking methods have improved imperceptibility and embedding capacity, robustness is still primarily assessed against conventional distortions such as compression, additive noise, and resampling. However, the rise of deep learning-based attacks introduces novel and significant threats to watermark security. In this work, we investigate self voice conversion as a universal, content-preserving attack against audio watermarking systems. Self voice conversion remaps a speaker's voice to the same identity while altering acoustic characteristics through a voice conversion model. We demonstrate that this attack severely degrades the reliability of state-of-the-art watermarking approaches and highlight its implications for the security of modern audio watermarking techniques.
Key Contributions
- Introduces self voice conversion (self VC) as a novel, universal, content-preserving attack against audio watermarking systems
- Demonstrates that self VC severely degrades watermark detectability across state-of-the-art neural watermarking approaches (e.g., AudioSeal, TimbreWatermarking, WMCodec, and WavMark)
- Exposes a systematic overestimation of watermark robustness in current evaluations, which overlook deep learning-based adversarial transformations
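The evaluation implied by these contributions can be sketched as a small harness: embed a payload, decode it once from the watermarked audio and once from the attacked audio, and compare bit accuracy. The sketch below is only an illustration under stated assumptions. `evaluate_attack`, `toy_embed`, `toy_decode`, and `toy_resynthesis` are hypothetical names, the toy watermark is a simple spread-spectrum scheme rather than any of the systems named above, and `toy_resynthesis` is a crude stand-in that keeps only a loudness envelope; it is not a voice conversion model. The numbers it prints say nothing about real watermarking systems.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Audio = List[float]
Bits = List[int]


def bit_accuracy(sent: Sequence[int], decoded: Sequence[int]) -> float:
    """Fraction of payload bits recovered correctly; 0.5 is chance level."""
    if len(sent) != len(decoded):
        raise ValueError("payload lengths differ")
    return sum(s == d for s, d in zip(sent, decoded)) / len(sent)


def evaluate_attack(
    embed: Callable[[Audio, Bits], Audio],
    decode: Callable[[Audio], Bits],
    attack: Callable[[Audio], Audio],
    audio: Audio,
    payload: Bits,
) -> Dict[str, float]:
    """Compare payload recovery before and after an attack on watermarked audio."""
    watermarked = embed(audio, payload)
    return {
        "clean": bit_accuracy(payload, decode(watermarked)),
        "attacked": bit_accuracy(payload, decode(attack(watermarked))),
    }


# --- Toy stand-ins (hypothetical; NOT a real watermarker or VC model) ---
SEG = 1000    # samples carrying one payload bit
ALPHA = 0.1   # watermark amplitude
N_BITS = 32   # payload length
_key_rng = random.Random(0)
CARRIER = [_key_rng.choice((-1.0, 1.0)) for _ in range(SEG)]  # secret key


def toy_embed(audio: Audio, payload: Bits) -> Audio:
    """Add +/- ALPHA * CARRIER to each segment; the sign encodes the bit."""
    out = list(audio)
    for k, bit in enumerate(payload):
        sign = 1.0 if bit else -1.0
        for i in range(SEG):
            out[k * SEG + i] += ALPHA * sign * CARRIER[i]
    return out


def toy_decode(audio: Audio) -> Bits:
    """Correlate each segment with CARRIER and read the sign as one bit."""
    bits = []
    for k in range(len(audio) // SEG):
        corr = sum(audio[k * SEG + i] * CARRIER[i] for i in range(SEG))
        bits.append(1 if corr > 0 else 0)
    return bits


def toy_resynthesis(audio: Audio, frame: int = 200, seed: int = 1) -> Audio:
    """Keep the per-frame RMS envelope only; regenerate samples from noise."""
    rng = random.Random(seed)
    out: Audio = []
    for start in range(0, len(audio), frame):
        chunk = audio[start:start + frame]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        out.extend(rms * rng.gauss(0.0, 1.0) for _ in chunk)
    return out


# Host signal: a 220 Hz tone at an assumed 16 kHz sampling rate.
host = [0.5 * math.sin(2 * math.pi * 220 * n / 16000) for n in range(SEG * N_BITS)]
_payload_rng = random.Random(2)
payload = [_payload_rng.randint(0, 1) for _ in range(N_BITS)]

result = evaluate_attack(toy_embed, toy_decode, toy_resynthesis, host, payload)
print(result)
```

Because `evaluate_attack` only takes callables, the same harness could wrap a real watermarking model and a real voice conversion system in place of the toy functions. Conventional distortions such as resampling or additive noise can be passed as `attack` in the same way, which makes the robustness gap described above directly comparable.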
🛡️ Threat Analysis
Self voice conversion acts as a watermark removal attack: it defeats content watermarks embedded in audio outputs, undermining provenance verification and content authentication. It directly targets output integrity and content watermarking schemes, matching ML09's 'watermark removal attacks' criterion.