
Breaking Semantic-Aware Watermarks via LLM-Guided Coherence-Preserving Semantic Injection

Zheng Gao 1, Xiaoyu Li 1, Zhicheng Bao 1, Xiaoyan Feng 2, Jiaojiao Jiang 1

Published on arXiv (Cornell University) · arXiv:2602.21593

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

CSI consistently outperforms prevailing attack baselines against content-aware semantic watermarking schemes, exposing a fundamental security weakness when confronted with LLM-driven semantic perturbations.

CSI (Coherence-Preserving Semantic Injection)

Novel technique introduced


AI-generated images have proliferated across Web platforms such as social media and online copyright-distribution services, and semantic watermarking has increasingly been integrated into diffusion models to support reliable provenance tracking and forgery prevention for web content. Traditional noise-layer-based watermarking, however, remains vulnerable to inversion attacks that can recover embedded signals. To mitigate this, recent content-aware semantic watermarking schemes bind watermark signals to high-level image semantics, constraining local edits that would otherwise disrupt global coherence. Yet large language models (LLMs) possess structured reasoning capabilities that enable targeted exploration of semantic spaces, allowing locally fine-grained but globally coherent semantic alterations that invalidate such bindings. To expose this overlooked vulnerability, we introduce a Coherence-Preserving Semantic Injection (CSI) attack that leverages LLM-guided semantic manipulation under embedding-space similarity constraints. The similarity constraint enforces visual-semantic consistency while selectively perturbing watermark-relevant semantics, ultimately inducing detector misclassification. Extensive empirical results show that CSI consistently outperforms prevailing attack baselines against content-aware semantic watermarking, revealing a fundamental security weakness of current semantic watermark designs when confronted with LLM-driven semantic perturbations.


Key Contributions

  • Identifies a fundamental vulnerability in content-aware semantic watermarking: LLMs can explore semantic spaces to find locally fine-grained yet globally coherent alterations that invalidate watermark-semantic bindings
  • Proposes CSI (Coherence-Preserving Semantic Injection), an LLM-guided attack using embedding-space similarity constraints to maintain visual-semantic consistency while disrupting watermark detection
  • Empirically demonstrates that CSI consistently outperforms existing attack baselines against content-aware semantic watermarking schemes in diffusion models

🛡️ Threat Analysis

Output Integrity Attack

The paper directly attacks content watermarks embedded in AI-generated image outputs for provenance tracking — the watermarks reside in model outputs (not model weights), making this a watermark removal/evasion attack under Output Integrity. The CSI attack causes detector misclassification, undermining the content authenticity and provenance tracking goals of semantic watermarking schemes.


Details

Domains
vision · generative · nlp
Model Types
diffusion · llm
Threat Tags
black_box · inference_time · targeted · digital
Applications
ai-generated image provenance tracking · content watermarking · digital rights management · deepfake/forgery prevention