
Published on arXiv: 2509.05835

OWASP ML Top 10 — ML09 (Output Integrity Attack)

Key Finding

Proposed overwriting attacks achieve nearly 100% attack success rate against AudioSeal, Timbre, and WavMark across white-box, gray-box, and black-box settings.

Overwriting Attack: novel technique introduced


Abstract

As generative audio models rapidly evolve, AI-generated audio increasingly raises concerns about copyright infringement and the spread of misinformation. Audio watermarking, as a proactive defense, can embed secret messages into audio for copyright protection and source verification. However, current neural audio watermarking methods focus primarily on imperceptibility and robustness, while largely ignoring vulnerability to security attacks. In this paper, we develop a simple yet powerful attack: the overwriting attack, which overwrites the legitimate audio watermark with a forged one and makes the original legitimate watermark undetectable. Based on the watermarking information available to the adversary, we propose three categories of overwriting attacks, i.e., white-box, gray-box, and black-box attacks. We thoroughly evaluate the proposed attacks on state-of-the-art neural audio watermarking methods. Experimental results demonstrate that the proposed overwriting attacks can effectively compromise existing watermarking schemes across various settings and achieve a nearly 100% attack success rate. The practicality and effectiveness of the proposed overwriting attacks expose security flaws in existing neural audio watermarking systems, underscoring the need to enhance security in future audio watermarking designs.
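The core overwriting idea can be illustrated with a toy spread-spectrum watermark standing in for a neural embedder such as AudioSeal; the `embed` and `detect` functions below are hypothetical stand-ins, not the paper's implementation. In the white-box setting, the adversary has the embedder itself and simply re-embeds a forged payload at higher strength, so the detector decodes the forged message instead of the legitimate one:

```python
import numpy as np

def embed(audio, bits, key, strength=0.2):
    # Toy spread-spectrum embedder: add a key-derived pseudo-random
    # carrier to each segment, signed by the payload bit. A stand-in
    # for a neural watermarker; illustration only.
    rng = np.random.default_rng(key)
    out = audio.copy()
    n = len(audio) // len(bits)
    for i, b in enumerate(bits):
        carrier = rng.standard_normal(n)
        out[i * n:(i + 1) * n] += strength * (1 if b else -1) * carrier
    return out

def detect(audio, n_bits, key):
    # Decode by correlating each segment with the same carriers.
    rng = np.random.default_rng(key)
    n = len(audio) // n_bits
    return [int(audio[i * n:(i + 1) * n] @ rng.standard_normal(n) > 0)
            for i in range(n_bits)]

# Legitimate owner watermarks the host audio.
host = np.random.default_rng(0).standard_normal(16000)
legit = [1, 0, 1, 1, 0, 0, 1, 0]
watermarked = embed(host, legit, key=42)

# White-box overwriting attack: re-embed a forged payload at higher
# strength, drowning out the legitimate message.
forged = [0, 1, 1, 0, 1, 0, 0, 1]
attacked = embed(watermarked, forged, key=42, strength=0.5)

print(detect(watermarked, 8, key=42))  # decodes the legitimate payload
print(detect(attacked, 8, key=42))     # now decodes the forged payload
```

The gray- and black-box variants in the paper relax what the adversary knows about the embedder; the overwrite-at-higher-strength principle is the same.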


Key Contributions

  • First systematic study of overwriting attacks against neural audio watermarking, covering white-box, gray-box, and black-box threat models
  • Three concrete attack procedures that embed a forged watermark to replace the legitimate one, effectively hijacking audio ownership
  • Empirical demonstration of nearly 100% attack success rate against three state-of-the-art watermarking methods (AudioSeal, Timbre, WavMark), exposing a fundamental security gap
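As a sketch of how the reported attack success rate might be operationalized (the exact criterion and threshold below are assumptions, not taken from the paper): an attack on one sample counts as successful when the decoded message matches the forged payload but no longer matches the legitimate one.

```python
def bit_accuracy(decoded, reference):
    # Fraction of message bits decoded correctly.
    return sum(d == r for d, r in zip(decoded, reference)) / len(reference)

def attack_success(decoded, legit_bits, forged_bits, threshold=0.9):
    # Hypothetical success criterion: the forged payload is recovered
    # while the legitimate payload is destroyed. The 0.9 threshold is
    # an assumption for illustration.
    return (bit_accuracy(decoded, forged_bits) >= threshold
            and bit_accuracy(decoded, legit_bits) < threshold)

# Example: after an overwrite, the detector reads exactly the forged message.
legit = [1, 0, 1, 1, 0, 0, 1, 0]
forged = [0, 1, 1, 0, 1, 0, 0, 1]
print(attack_success(forged, legit, forged))  # True
```

Averaging this per-sample indicator over a test set would yield the attack success rate the paper reports as nearly 100%.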

🛡️ Threat Analysis

Output Integrity Attack

The paper attacks content watermarks embedded in AI-generated audio outputs (AudioSeal, Timbre, WavMark) — these are output provenance/integrity watermarks, not model ownership watermarks. Overwriting attacks defeat the watermark's ability to verify copyright and source, fitting squarely within ML09 (Output Integrity Attack / content watermark attacks).


Details

Domains
audio, generative
Model Types
transformer
Threat Tags
white_box, grey_box, black_box, inference_time, targeted
Applications
audio watermarking, ai-generated audio copyright protection, source verification