attack 2026

Self Voice Conversion as an Attack against Neural Audio Watermarking

Yigitcan Özer , Wanying Ge , Zhe Zhang , Xin Wang , Junichi Yamagishi

1 citation · 45 references · arXiv


Published on arXiv

2601.20432

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Self voice conversion severely degrades the reliability of state-of-the-art audio watermarking methods while preserving perceptual quality and speaker identity, exposing a critical gap in current robustness evaluations.

Self Voice Conversion Attack

Novel technique introduced


Audio watermarking embeds auxiliary information into speech while maintaining speaker identity, linguistic content, and perceptual quality. Although recent advances in neural and digital signal processing-based watermarking methods have improved imperceptibility and embedding capacity, robustness is still primarily assessed against conventional distortions such as compression, additive noise, and resampling. However, the rise of deep learning-based attacks introduces novel and significant threats to watermark security. In this work, we investigate self voice conversion as a universal, content-preserving attack against audio watermarking systems. Self voice conversion remaps a speaker's voice to the same identity while altering acoustic characteristics through a voice conversion model. We demonstrate that this attack severely degrades the reliability of state-of-the-art watermarking approaches and highlight its implications for the security of modern audio watermarking techniques.
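
As a rough illustration of how such an attack could be scored, the sketch below measures the bit error rate (BER) of a decoded watermark message before and after a transformation. Everything in it is an assumption made for illustration: `toy_embed`, `toy_decode`, and `toy_resynthesis` are hypothetical stand-ins, not the watermarking systems or the voice conversion model studied in the paper. The toy attack only mimics one property of self voice conversion, namely that the waveform is regenerated from a compact description instead of being perturbed sample by sample.

```python
from typing import Callable, List, Sequence

BLOCK = 64    # samples carrying one message bit (toy value, an assumption)
DELTA = 0.05  # embedding strength (toy value, an assumption)


def bit_error_rate(sent: Sequence[int], received: Sequence[int]) -> float:
    """Fraction of message bits that differ after decoding."""
    if len(sent) != len(received) or not sent:
        raise ValueError("messages must be non-empty and of equal length")
    return sum(s != r for s, r in zip(sent, received)) / len(sent)


def toy_embed(clip: Sequence[float], message: Sequence[int]) -> List[float]:
    """Hypothetical embedder: shift each block up (bit 1) or down (bit 0)."""
    out = list(clip)
    for i, bit in enumerate(message):
        shift = DELTA if bit else -DELTA
        for n in range(i * BLOCK, (i + 1) * BLOCK):
            out[n] += shift
    return out


def toy_decode(signal: Sequence[float], n_bits: int) -> List[int]:
    """Hypothetical decoder: read each bit from the sign of the block sum."""
    return [
        1 if sum(signal[i * BLOCK:(i + 1) * BLOCK]) > 0 else 0
        for i in range(n_bits)
    ]


def toy_resynthesis(signal: Sequence[float]) -> List[float]:
    """Stand-in for a resynthesis attack (NOT voice conversion): keep only
    the average magnitude and regenerate an alternating waveform from it."""
    amp = sum(abs(x) for x in signal) / len(signal)
    return [amp if n % 2 == 0 else -amp for n in range(len(signal))]


def evaluate_attack(
    clips: Sequence[Sequence[float]],
    message: Sequence[int],
    attack: Callable[[Sequence[float]], Sequence[float]],
) -> float:
    """Mean BER of decode(attack(embed(clip, message))) over all clips."""
    rates = []
    for clip in clips:
        attacked = attack(toy_embed(clip, message))
        rates.append(bit_error_rate(message, toy_decode(attacked, len(message))))
    return sum(rates) / len(rates)


message = [1, 0, 1, 1, 0, 0, 1, 0]
# Toy "audio": an alternating waveform long enough to carry every bit.
clip = [0.1 if n % 2 == 0 else -0.1 for n in range(len(message) * BLOCK)]

ber_clean = evaluate_attack([clip], message, attack=lambda s: s)
ber_attacked = evaluate_attack([clip], message, attack=toy_resynthesis)
```

In this toy setup the untouched signal decodes without errors, while the regenerated signal carries no trace of the per-block shifts, so the decoder does no better than guessing. A real evaluation would replace the three toy functions with an actual embedder, detector, and voice conversion model, and would also check that speech quality and speaker identity survive the attack.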


Key Contributions

  • Introduces self voice conversion (self VC) as a novel, universal, content-preserving attack against audio watermarking systems
  • Demonstrates that self VC severely degrades watermark detectability across state-of-the-art neural watermarking approaches (AudioSeal, TimbreWatermarking, WMCodec, WavMark, etc.)
  • Exposes a systematic overestimation of watermark robustness in current evaluations, which overlook deep learning-based adversarial transformations

🛡️ Threat Analysis

Output Integrity Attack

Self voice conversion is used as a watermark removal attack — it defeats content watermarks embedded in audio outputs to undermine provenance verification and content authentication. This is a direct attack on output integrity/content watermarking schemes, matching ML09's 'watermark removal attacks' criterion.
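
To make the link between bit errors and failed provenance checks concrete, the sketch below assumes a hypothetical detector that declares a watermark present when at least `min_matches` of `n_bits` decoded bits agree with the expected message. It also assumes that bit errors are independent, which is a simplification. Neither the threshold rule nor the numbers come from the paper.

```python
from math import comb


def false_positive_rate(n_bits: int, min_matches: int) -> float:
    """Chance that uniformly random bits reach the match threshold."""
    hits = sum(comb(n_bits, k) for k in range(min_matches, n_bits + 1))
    return hits / 2 ** n_bits


def detection_rate(n_bits: int, min_matches: int, ber: float) -> float:
    """Chance that a watermarked clip is still detected when each bit is
    flipped independently with probability `ber` (an assumption)."""
    p = 1.0 - ber  # probability that a single bit survives
    return sum(
        comb(n_bits, k) * p ** k * (1.0 - p) ** (n_bits - k)
        for k in range(min_matches, n_bits + 1)
    )


# Hypothetical operating point: 16-bit message, at least 14 matching bits.
fpr = false_positive_rate(16, 14)
clean = detection_rate(16, 14, ber=0.0)
after_attack = detection_rate(16, 14, ber=0.5)
```

Under these assumptions, an attack that pushes the bit error rate to 0.5 makes the detection rate equal to the false positive rate, so a watermarked clip becomes statistically indistinguishable from unmarked audio.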


Details

Domains
audio
Model Types
transformer
Threat Tags
black_box, inference_time, digital
Applications
audio watermarking, speech content authentication, deepfake provenance tracking