defense · arXiv · Jan 30, 2026
Weizhi Liu, Yue Li, Zhaoxia Yin · East China Normal University · Huaqiao University
Injects adapter parameters into speech vocoders to embed robust, high-fidelity watermarks in AI-generated audio for provenance tracking
Output Integrity Attack · audio · generative
Generated speech now achieves human-level naturalness, which escalates the security risks of misuse. However, existing watermarking methods fail to reconcile fidelity with robustness: they rely either on simple superposition in the noise space or on intrusive alterations to model weights. To bridge this gap, we propose VocBulwark, an additional-parameter injection framework that freezes the generative model's parameters to preserve perceptual quality. Specifically, we design a Temporal Adapter that deeply entangles watermarks with acoustic attributes, working in synergy with a Coarse-to-Fine Gated Extractor to resist advanced attacks. Furthermore, we develop an Accuracy-Guided Optimization Curriculum that dynamically orchestrates gradient flow to resolve the optimization conflict between fidelity and robustness. Comprehensive experiments demonstrate that VocBulwark achieves high-capacity, high-fidelity watermarking and offers a robust defense in complex practical scenarios, with resilience to codec regeneration and variable-length manipulations.
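The core idea of additional-parameter injection can be illustrated with a minimal sketch. This is not VocBulwark's actual architecture (the paper's Temporal Adapter and extractor are not specified here): it is a hypothetical residual adapter bolted onto a frozen linear "vocoder" layer, with a zero-initialized up-projection so the watermark branch starts as an exact no-op, and only the adapter's weights are ever updated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen vocoder layer: these weights are never updated.
W_base = rng.standard_normal((8, 8))

# Injected adapter parameters -- the only trainable weights.
# Zero-initializing the up-projection makes the adapter an exact
# no-op at step 0, preserving the frozen model's output quality.
W_down = rng.standard_normal((8, 2)) * 0.01  # down-projection
W_up = np.zeros((2, 8))                      # up-projection (zero init)

def forward(x, bits):
    """Frozen generative path plus a residual watermark branch
    conditioned on the payload bits (illustrative conditioning)."""
    h = x @ W_base                       # frozen vocoder path
    cond = x + bits                      # naive bit conditioning
    return h + (cond @ W_down) @ W_up    # residual adapter branch

x = rng.standard_normal((4, 8))
bits = np.tile(rng.integers(0, 2, 8), (4, 1)).astype(float)

y0 = forward(x, bits)                    # identical to the frozen model

# A (hand-rolled) training step touches only adapter parameters;
# W_base stays frozen, so perceptual fidelity is decoupled from
# learning the watermark.
W_up += 0.1 * rng.standard_normal(W_up.shape)
y1 = forward(x, bits)                    # now carries the watermark signal
```

In a real framework the same effect is achieved by setting the base model's parameters to non-trainable and passing only the adapter's parameters to the optimizer.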
gan · diffusion