SemBind: Binding Diffusion Watermarks to Semantics Against Black-Box Forgery Attacks
Xin Zhang, Zijin Yang, Kejiang Chen, Linfeng Ma, Weiming Zhang, Nenghai Yu
Published on arXiv: 2601.20310
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
SemBind markedly reduces false acceptance under black-box forgery attacks across four mainstream latent-based watermark methods while preserving image quality and offering a controllable robustness-security trade-off.
SemBind
Novel technique introduced
Latent-based watermarks, integrated into the generation process of latent diffusion models (LDMs), simplify the detection and attribution of generated images. However, recent black-box forgery attacks, in which an attacker needs at least one watermarked image and black-box access to the provider's model, can embed the provider's watermark into images the provider did not produce, posing an outsized risk to provenance and trust. We propose SemBind, the first defense framework for latent-based watermarks that resists black-box forgery by binding latent signals to image semantics via a learned semantic masker. Trained with contrastive learning, the masker yields near-invariant codes for the same prompt and near-orthogonal codes across prompts; these codes are reshaped and permuted to modulate the target latent before a standard latent-based watermark is applied. SemBind is compatible with existing latent-based watermarking schemes in general and leaves image quality essentially unchanged, while a single mask-ratio parameter offers a tunable trade-off between anti-forgery strength and robustness. Across four mainstream latent-based watermark methods, our SemBind-enabled anti-forgery variants markedly reduce false acceptance under black-box forgery while providing a controllable robustness-security balance.
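The abstract does not spell out the modulation at the tensor level, so the following is a minimal toy sketch under stated assumptions, not SemBind's implementation. It reduces the base watermark to a ±1 sign pattern over a flattened latent, replaces the learned semantic masker with a hypothetical `semantic_code` that maps an integer semantic id to a ±1 code, and models "reshaped and permuted" as a secret-keyed permutation whose first `mask_ratio` fraction of positions are multiplied by that code. All function names (`semantic_code`, `masked_positions`, `modulate`, `verify`, `demo`), the key, and the 0.9 threshold are hypothetical. The point is to show why a watermark transplanted onto an image with different semantics stops verifying.

```python
import random

def semantic_code(semantic_id: int, length: int) -> list[int]:
    # HYPOTHETICAL stand-in for the learned semantic masker: one +/-1 code per
    # semantic identity. The same id always yields the same code (invariance);
    # different ids yield independent codes, near-orthogonal for large lengths.
    rng = random.Random(semantic_id)
    return [rng.choice((-1, 1)) for _ in range(length)]

def masked_positions(n: int, mask_ratio: float, key: int) -> list[int]:
    # Secret-keyed permutation of latent indices; the first mask_ratio * n are modulated.
    perm = list(range(n))
    random.Random(key).shuffle(perm)
    return perm[: int(mask_ratio * n)]

def modulate(signs: list[int], code: list[int], positions: list[int]) -> list[int]:
    # Multiply the selected latent signs by the code. Applying it twice with the
    # same code is the identity, so the same function also demodulates.
    out = list(signs)
    for c, pos in zip(code, positions):
        out[pos] *= c
    return out

def verify(signs, semantic_id, positions, wm_bits, threshold):
    # Demodulate with the code derived from the image's own semantics, then run a
    # plain bit-accuracy check (a stand-in for the base scheme's detector).
    recovered = modulate(signs, semantic_code(semantic_id, len(positions)), positions)
    acc = sum(r == w for r, w in zip(recovered, wm_bits)) / len(wm_bits)
    return acc, acc >= threshold

def demo(mask_ratio: float, n: int = 4096, threshold: float = 0.9):
    rng = random.Random(0)
    wm_bits = [rng.choice((-1, 1)) for _ in range(n)]  # base watermark sign pattern
    positions = masked_positions(n, mask_ratio, key=1234)  # hypothetical secret key
    # Provider: bind the watermark to the semantics of prompt 1 before generation.
    emitted = modulate(wm_bits, semantic_code(1, len(positions)), positions)
    genuine = verify(emitted, 1, positions, wm_bits, threshold)
    # Forger: transplants the same latent signs onto an image with other semantics (id 2).
    forged = verify(emitted, 2, positions, wm_bits, threshold)
    return genuine, forged

for r in (0.1, 0.5):
    (g_acc, g_ok), (f_acc, f_ok) = demo(r)
    print(f"mask_ratio={r}: genuine acc={g_acc:.3f} accepted={g_ok}; "
          f"forged acc={f_acc:.3f} accepted={f_ok}")
```

In this toy, a genuine image demodulates perfectly, while a forged image's code agrees with the provider's on only about half of the masked positions, so its bit accuracy falls to roughly 1 - mask_ratio/2 (about 0.95 at 0.1, which still passes, and about 0.75 at 0.5, which is rejected). A larger mask ratio pushes forgeries further below the threshold, but it also makes more of the watermark depend on the masker reproducing the same code after distortions, which is one way to read the robustness side of the trade-off.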
Key Contributions
- First defense framework that resists black-box watermark forgery attacks on latent diffusion models by binding watermark latent signals to image semantics via a learned semantic masker
- Contrastive learning training regime producing near-invariant codes per prompt and near-orthogonal codes across prompts, enabling semantic-conditioned latent modulation
- Plug-and-play compatibility with four mainstream latent-based watermarking schemes, with a tunable mask-ratio parameter balancing anti-forgery strength and robustness
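The contrastive objective in the second bullet can be illustrated with a standard in-batch InfoNCE loss. This is an assumption: the paper's exact loss, temperature, and code dimensionality are not given here, and `cosine`, `info_nce`, and the 4-bit Hadamard codes are illustrative only. Same-prompt pairs sit on the diagonal as positives and every other prompt in the batch acts as a negative, so minimizing the loss pulls codes for one prompt together and pushes codes for different prompts apart.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchors, positives, temperature=0.1):
    # Assumed objective: standard InfoNCE; SemBind's actual loss may differ.
    # anchors[i] and positives[i] are masker outputs for two images of the same
    # prompt; positives[j] for j != i are negatives from other prompts.
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # cross-entropy with target index i
    return total / n

# Rows of a 4x4 Hadamard matrix: mutually orthogonal +/-1 codes, one per prompt.
H = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
print(info_nce(H, H))                    # close to 0: invariant within, orthogonal across
print(info_nce([H[0]] * 4, [H[0]] * 4))  # log(4), about 1.386: collapsed codes
```

Mutually orthogonal codes that agree within each prompt already drive the loss close to zero, while codes collapsed onto a single vector score log(batch size), the value for a masker that cannot tell prompts apart.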
🛡️ Threat Analysis
SemBind protects content watermarks embedded in LDM-generated images. The threat is a forgery attack that causes false acceptance by embedding the provider's watermark into images the provider did not generate, directly attacking output integrity and provenance attribution.