HarmonicAttack: An Adaptive Cross-Domain Audio Watermark Removal

The availability of high-quality, AI-generated audio raises security challenges such as misinformation campaigns and voice-cloning fraud. A key defense against the misuse of AI-generated audio is watermarking, which allows it to be easily distinguished from genuine audio. Because those seeking to misuse AI-generated audio may try to remove these watermarks, studying effective watermark removal techniques is critical for objectively evaluating the robustness of audio watermarks. Previous watermark removal schemes either assume impractical knowledge of the watermarks they are designed to remove or are computationally expensive, potentially creating a false sense of confidence in current watermark schemes. We introduce HarmonicAttack, an efficient audio watermark removal method that requires only the ability to generate watermarks with the targeted scheme. With this ability, we train a general watermark removal model that removes the targeted scheme's watermarks from any watermarked audio sample. HarmonicAttack employs a dual-path convolutional autoencoder that operates in both the temporal and frequency domains, along with GAN-style training, to separate the watermark from the original audio. When evaluated against the state-of-the-art watermark schemes AudioSeal, WavMark, and Silentcipher, HarmonicAttack demonstrates greater watermark removal ability than previous watermark removal methods while running in near real time. Moreover, although HarmonicAttack requires training, we find that it transfers to out-of-distribution samples with minimal degradation in performance.
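The access assumption in the abstract (the attacker can generate watermarks with the targeted scheme, and nothing more) can be illustrated with a minimal sketch of how training data might be assembled. Everything named here is hypothetical: `toy_embed` is a stand-in that adds a faint sinusoid, not the embedder of AudioSeal, WavMark, or Silentcipher, and pairing each watermarked clip with its clean source is an assumption about the training setup rather than a detail stated in the abstract.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Audio = List[float]


def toy_embed(clean: Sequence[float], strength: float = 0.01, freq_bin: int = 7) -> Audio:
    """Hypothetical stand-in for a black-box watermark embedder.

    Adds a low-amplitude sinusoid to the clip. Real schemes use learned
    embedders; this toy only keeps the example self-contained.
    """
    n = len(clean)
    return [x + strength * math.sin(2 * math.pi * freq_bin * i / n)
            for i, x in enumerate(clean)]


def make_training_pairs(
    clean_clips: Sequence[Sequence[float]],
    embed: Callable[[Sequence[float]], Audio],
) -> List[Tuple[Audio, Audio]]:
    """Build (watermarked, clean) pairs using only calls to `embed`.

    The embedder is treated as an opaque function: no weights, keys,
    or detector outputs are accessed.
    """
    return [(embed(clip), list(clip)) for clip in clean_clips]


# Four short random "clips" standing in for real audio (assumption).
rng = random.Random(0)
clips = [[rng.uniform(-1.0, 1.0) for _ in range(64)] for _ in range(4)]
pairs = make_training_pairs(clips, toy_embed)
```

A removal model would then be trained to map the first element of each pair back toward the second; the model and its losses are outside the scope of this sketch.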

Key Contributions

HarmonicAttack: an adaptive audio watermark removal method that requires only the ability to generate watermarks with the targeted scheme, with no knowledge of the scheme's internals
Dual-path convolutional autoencoder operating jointly in temporal and frequency domains with GAN-style training to separate watermark signal from original audio
Stronger watermark removal than prior methods against AudioSeal, WavMark, and Silentcipher, with near real-time performance and minimal degradation on out-of-distribution samples
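One way to see why looking at both domains can help is a toy comparison of the same perturbation in time and in frequency. The sketch below is an illustration under stated assumptions, not the paper's architecture: the "watermark" is a single hypothetical low-amplitude tone, and `magnitude_spectrum` is a naive DFT written for this example. Real watermarks are learned signals and are generally not single tones.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 of a real frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


n = 64
# Toy "clean" signal: a tone at bin 3 (assumption for illustration).
clean = [0.5 * math.sin(2 * math.pi * 3 * i / n) for i in range(n)]
# Toy "watermark": a much weaker tone at bin 20 (assumption).
mark = [0.01 * math.sin(2 * math.pi * 20 * i / n) for i in range(n)]
marked = [c + m for c, m in zip(clean, mark)]

# Temporal view: the perturbation is small and spread over the frame.
time_residual = [b - a for a, b in zip(clean, marked)]

# Frequency view: the same perturbation concentrates in one bin.
freq_residual = [b - a for a, b in
                 zip(magnitude_spectrum(clean), magnitude_spectrum(marked))]
peak_bin = max(range(len(freq_residual)), key=lambda k: abs(freq_residual[k]))
```

In this toy case the time-domain residual never exceeds the tone's amplitude of 0.01, while the frequency-domain residual peaks at the bin where the tone was added. A model with access to both views has two complementary descriptions of the same perturbation.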

🛡️ Threat Analysis

Output Integrity Attack

HarmonicAttack is a watermark removal attack targeting audio content watermarks used to trace AI-generated audio provenance. Per the ML09 definition, attacks that remove or defeat content watermarks (here: AudioSeal, WavMark, Silentcipher) are output integrity attacks. The goal is defeating content authentication, not creating adversarial examples.