Yours or Mine? Overwriting Attacks Against Neural Audio Watermarking
Lingfeng Yao 1, Chenpei Huang 1, Shengyao Wang 2, Junpei Xue 2, Hanqing Guo 3, Jiang Liu 2, Phone Lin 4, Tomoaki Ohtsuki 5, Miao Pan 1
Published on arXiv (arXiv:2509.05835)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
The proposed overwriting attacks achieve a nearly 100% attack success rate against AudioSeal, Timbre, and WavMark across white-box, gray-box, and black-box settings.
Overwriting Attack
Novel technique introduced
As generative audio models rapidly evolve, AI-generated audio increasingly raises concerns about copyright infringement and the spread of misinformation. Audio watermarking, as a proactive defense, can embed secret messages into audio for copyright protection and source verification. However, current neural audio watermarking methods focus primarily on the imperceptibility and robustness of the watermark, while ignoring its vulnerability to security attacks. In this paper, we develop a simple yet powerful attack: an overwriting attack that replaces the legitimate audio watermark with a forged one, rendering the original legitimate watermark undetectable. Based on how much watermarking information the adversary possesses, we propose three categories of overwriting attacks: white-box, gray-box, and black-box. We also thoroughly evaluate the proposed attacks on state-of-the-art neural audio watermarking methods. Experimental results demonstrate that the proposed overwriting attacks effectively compromise existing watermarking schemes across various settings and achieve a nearly 100% attack success rate. The practicality and effectiveness of the proposed overwriting attacks expose security flaws in existing neural audio watermarking systems, underscoring the need to enhance security in future audio watermarking designs.
Key Contributions
- First systematic study of overwriting attacks against neural audio watermarking, covering white-box, gray-box, and black-box threat models
- Three concrete attack procedures that embed a forged watermark to replace the legitimate one, effectively hijacking audio ownership
- Empirical demonstration of nearly 100% attack success rate against three state-of-the-art watermarking methods (AudioSeal, Timbre, WavMark), exposing a fundamental security gap
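The mechanism behind these contributions can be illustrated with a toy sketch. The snippet below is not the paper's attack pipeline or any real watermarking API: it uses a hypothetical additive, message-keyed watermark (`embed`/`detect` are invented stand-ins for a neural embedder and detector) purely to show how re-embedding a forged mark at higher strength can make the legitimate mark lose at detection time.

```python
import numpy as np

rng = np.random.default_rng(0)
FS = 16000  # one second of 16 kHz samples

# Hypothetical message-keyed carriers, a stand-in for a neural watermark encoder.
CARRIERS = {m: rng.standard_normal(FS) for m in ("legit-owner", "forged-owner")}

def embed(audio, message, strength=0.01):
    """Additively embed a message-keyed carrier (toy stand-in, not a real scheme)."""
    return audio + strength * CARRIERS[message]

def detect(audio):
    """Return the candidate message whose carrier correlates most strongly."""
    scores = {m: abs(np.dot(audio, c)) for m, c in CARRIERS.items()}
    return max(scores, key=scores.get)

host = 0.1 * rng.standard_normal(FS)               # stand-in for clean audio
marked = embed(host, "legit-owner")                # legitimate watermark
attacked = embed(marked, "forged-owner", 0.05)     # overwriting attack

print(detect(marked))    # legitimate mark detected
print(detect(attacked))  # forged mark dominates; legitimate mark is no longer returned
```

The design point the sketch makes: the attacker never needs to remove the legitimate watermark, only to out-compete it at the detector, which is why the paper frames overwriting as an ownership-hijacking attack rather than a removal attack.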
🛡️ Threat Analysis
The paper attacks content watermarks embedded in AI-generated audio outputs (AudioSeal, Timbre, WavMark) — these are output provenance/integrity watermarks, not model ownership watermarks. Overwriting attacks defeat the watermark's ability to verify copyright and source, fitting squarely within ML09 (Output Integrity Attack / content watermark attacks).