WaterVIB: Learning Minimal Sufficient Watermark Representations via Variational Information Bottleneck
Haoyuan He, Yu Zheng, Jie Zhou, Jiwen Lu
Published on arXiv
2602.21508
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
WaterVIB achieves superior zero-shot resilience against unknown diffusion-based editing attacks, significantly outperforming existing state-of-the-art robust watermarking methods.
WaterVIB
Novel technique introduced
Robust watermarking is critical for intellectual property protection, yet existing methods remain severely vulnerable to regeneration-based AIGC attacks. We identify that existing methods fail because they entangle the watermark with high-frequency cover texture, which is liable to be rewritten during generative purification. To address this, we propose WaterVIB, a theoretically grounded framework that reformulates the encoder as an information sieve via the Variational Information Bottleneck. Instead of overfitting to fragile cover details, our approach forces the model to learn a Minimal Sufficient Statistic of the message. This filters out redundant cover nuances prone to generative shifts and retains only the essential signal that is invariant to regeneration. We theoretically prove that optimizing this bottleneck is a necessary condition for robustness against distribution-shifting attacks. Extensive experiments demonstrate that WaterVIB significantly outperforms state-of-the-art methods, achieving superior zero-shot resilience against unknown diffusion-based editing.
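For orientation, the generic Variational Information Bottleneck objective can be written as below. This is the standard textbook form, not a formula taken from the paper: the symbols $c$ (cover image), $m$ (message), $z$ (latent watermark representation), the encoder $p_\theta$, the decoder $q_\phi$, the prior $r$, and the weight $\beta$ are all assumptions introduced here for illustration, and the exact loss used by WaterVIB may differ.

```latex
% Information Bottleneck: keep information about the message m,
% compress away everything else carried by the encoder input (c, m).
\max_{\theta}\; I(Z; M) \;-\; \beta\, I(Z; C, M)

% Variational bound that is minimized in practice
% (equal to an upper bound on the negated objective, up to the constant H(M)):
\mathcal{L}_{\mathrm{VIB}}
  = \mathbb{E}_{(c,m)}\Big[
      \mathbb{E}_{z \sim p_\theta(z \mid c, m)}\big[-\log q_\phi(m \mid z)\big]
      \;+\; \beta\, \mathrm{KL}\big(p_\theta(z \mid c, m)\,\|\, r(z)\big)
    \Big]
```

The first term rewards latents from which the message can be decoded (sufficiency); the KL term penalizes any extra information the latent carries about its input (minimality), which under the paper's argument would include fragile cover texture.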
Key Contributions
- WaterVIB framework that reformulates the watermark encoder as an information sieve using the Variational Information Bottleneck, forcing it to learn a Minimal Sufficient Statistic of the message rather than entangling it with fragile high-frequency cover texture
- Theoretical proof that optimizing the VIB bottleneck is a necessary condition for robustness against distribution-shifting (regeneration-based) attacks
- Zero-shot resilience against unknown diffusion-based editing, outperforming state-of-the-art watermarking methods across six AIGC editing benchmarks
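A minimal NumPy sketch of a VIB-style training loss may make the "information sieve" idea in the first contribution more concrete. It assumes a diagonal-Gaussian latent with a standard-normal prior and a bitwise cross-entropy on the decoded message; these choices, the function names, and the default `beta` are hypothetical and are not taken from the paper.

```python
import numpy as np


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims.

    This is the compression ("minimality") term: it is zero only when the
    latent carries no information about the encoder input.
    """
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def sample_latent(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def bit_cross_entropy(logits, bits):
    """Numerically stable binary cross-entropy, summed over message bits.

    This is the "sufficiency" term: the message must stay decodable.
    """
    per_bit = (np.maximum(logits, 0.0) - logits * bits
               + np.log1p(np.exp(-np.abs(logits))))
    return np.sum(per_bit, axis=-1)


def vib_loss(mu, log_var, logits, bits, beta=1e-3):
    """Batch mean of message loss + beta * KL (beta is an assumed default)."""
    return np.mean(bit_cross_entropy(logits, bits)
                   + beta * kl_to_standard_normal(mu, log_var))
```

In a real system `mu` and `log_var` would come from the watermark encoder and `logits` from a message decoder applied to the (possibly attacked) image; here they are plain arrays so the two terms can be inspected in isolation. A larger `beta` trades message capacity for a more compressed latent.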
🛡️ Threat Analysis
WaterVIB is a content watermarking defense: it embeds watermarks in image outputs for copyright protection and provenance. It specifically targets regeneration-based AIGC attacks, in which a diffusion model rewrites the image and erases the watermark. This is a classic output integrity problem: watermark robustness against removal attacks.