
Published on arXiv: 2509.05835

OWASP ML Top 10 — ML09 (Output Integrity Attack)

Key Finding

Proposed overwriting attacks achieve nearly 100% attack success rate against AudioSeal, Timbre, and WavMark across white-box, gray-box, and black-box settings.

Overwriting Attack: novel technique introduced


Abstract

As generative audio models rapidly evolve, AI-generated audio increasingly raises concerns about copyright infringement and the spread of misinformation. Audio watermarking, as a proactive defense, can embed secret messages into audio for copyright protection and source verification. However, current neural audio watermarking methods focus primarily on imperceptibility and robustness, while largely ignoring vulnerability to security attacks. In this paper, we develop a simple yet powerful attack: the overwriting attack, which overwrites the legitimate audio watermark with a forged one and makes the original legitimate watermark undetectable. Based on the watermarking information available to the adversary, we propose three categories of overwriting attacks, i.e., white-box, gray-box, and black-box attacks. We thoroughly evaluate the proposed attacks on state-of-the-art neural audio watermarking methods. Experimental results demonstrate that the proposed overwriting attacks can effectively compromise existing watermarking schemes across various settings and achieve a nearly 100% attack success rate. The practicality and effectiveness of the proposed overwriting attacks expose security flaws in existing neural audio watermarking systems, underscoring the need to enhance security in future audio watermarking designs.
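The core overwriting idea can be illustrated with a toy spread-spectrum watermark standing in for a neural embedder such as AudioSeal; the `embed` and `detect` functions below are hypothetical stand-ins, not the paper's implementation. In the white-box setting, the adversary has the embedder itself and simply re-embeds a forged payload at higher strength, so the detector decodes the forged message instead of the legitimate one:

```python
import numpy as np

def embed(audio, bits, key, strength=0.2):
    # Toy spread-spectrum embedder: add a key-derived pseudo-random
    # carrier to each segment, signed by the payload bit. A stand-in
    # for a neural watermarker; illustration only.
    rng = np.random.default_rng(key)
    out = audio.copy()
    n = len(audio) // len(bits)
    for i, b in enumerate(bits):
        carrier = rng.standard_normal(n)
        out[i * n:(i + 1) * n] += strength * (1 if b else -1) * carrier
    return out

def detect(audio, n_bits, key):
    # Decode by correlating each segment with the same carriers.
    rng = np.random.default_rng(key)
    n = len(audio) // n_bits
    return [int(audio[i * n:(i + 1) * n] @ rng.standard_normal(n) > 0)
            for i in range(n_bits)]

# Legitimate owner watermarks the host audio.
host = np.random.default_rng(0).standard_normal(16000)
legit = [1, 0, 1, 1, 0, 0, 1, 0]
watermarked = embed(host, legit, key=42)

# White-box overwriting attack: re-embed a forged payload at higher
# strength, drowning out the legitimate message.
forged = [0, 1, 1, 0, 1, 0, 0, 1]
attacked = embed(watermarked, forged, key=42, strength=0.5)

print(detect(watermarked, 8, key=42))  # decodes the legitimate payload
print(detect(attacked, 8, key=42))     # now decodes the forged payload
```

The gray- and black-box variants in the paper relax what the adversary knows about the embedder; the overwrite-at-higher-strength principle is the same.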


Key Contributions

  • First systematic study of overwriting attacks against neural audio watermarking, covering white-box, gray-box, and black-box threat models
  • Three concrete attack procedures that embed a forged watermark to replace the legitimate one, effectively hijacking audio ownership
  • Empirical demonstration of nearly 100% attack success rate against three state-of-the-art watermarking methods (AudioSeal, Timbre, WavMark), exposing a fundamental security gap
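As a sketch of how the reported attack success rate might be operationalized (the exact criterion and threshold below are assumptions, not taken from the paper): an attack on one sample counts as successful when the decoded message matches the forged payload but no longer matches the legitimate one.

```python
def bit_accuracy(decoded, reference):
    # Fraction of message bits decoded correctly.
    return sum(d == r for d, r in zip(decoded, reference)) / len(reference)

def attack_success(decoded, legit_bits, forged_bits, threshold=0.9):
    # Hypothetical success criterion: the forged payload is recovered
    # while the legitimate payload is destroyed. The 0.9 threshold is
    # an assumption for illustration.
    return (bit_accuracy(decoded, forged_bits) >= threshold
            and bit_accuracy(decoded, legit_bits) < threshold)

# Example: after an overwrite, the detector reads exactly the forged message.
legit = [1, 0, 1, 1, 0, 0, 1, 0]
forged = [0, 1, 1, 0, 1, 0, 0, 1]
print(attack_success(forged, legit, forged))  # True
```

Averaging this per-sample indicator over a test set would yield the attack success rate the paper reports as nearly 100%.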

🛡️ Threat Analysis

Output Integrity Attack

The paper attacks content watermarks embedded in AI-generated audio outputs (AudioSeal, Timbre, WavMark) — these are output provenance/integrity watermarks, not model ownership watermarks. Overwriting attacks defeat the watermark's ability to verify copyright and source, fitting squarely within ML09 (Output Integrity Attack / content watermark attacks).


Details

Domains
audio, generative
Model Types
transformer
Threat Tags
white_box, grey_box, black_box, inference_time, targeted
Applications
audio watermarking, ai-generated audio copyright protection, source verification