defense 2026

Attack-Resistant Watermarking for AIGC Image Forensics via Diffusion-based Semantic Deflection

Qingyu Liu , Yitao Zhang , Zhongjie Ba , Chao Shuai , Peng Cheng , Tianhang Zheng , Zhibo Wang

0 citations · 33 references · arXiv


Published on arXiv: 2601.06639

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

PAI achieves 98.43% ownership verification accuracy across 12 real-world attack methods, outperforming state-of-the-art inherent watermarking methods by 37.25% on average.

PAI (key-conditioned semantic deflection)

Novel technique introduced


Protecting the copyright of user-generated AI images is an emerging challenge as AIGC becomes pervasive in creative workflows. Existing watermarking methods (1) remain vulnerable to real-world adversarial threats, often forced to trade off defenses against spoofing and removal attacks; and (2) cannot support semantic-level tamper localization. We introduce PAI, a training-free inherent watermarking framework for AIGC copyright protection that is plug-and-play with diffusion-based AIGC services. PAI simultaneously provides three key functionalities: robust ownership verification, attack detection, and semantic-level tampering localization. Unlike existing inherent watermark methods, which embed watermarks only at the noise initialization of diffusion models, we design a novel key-conditioned deflection mechanism that subtly steers the denoising trajectory according to the user key. This trajectory-level coupling strengthens the semantic entanglement of identity and content, enhancing robustness against real-world threats. We also provide a theoretical analysis proving that only the valid key can pass verification. Experiments across 12 attack methods show that PAI achieves 98.43% verification accuracy, improving over SOTA methods by 37.25% on average, and retains strong tampering localization performance even against advanced AIGC edits. Our code is available at https://github.com/QingyuLiu/PAI.
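The key-conditioned deflection idea can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the paper's actual construction: `denoise_step` is a placeholder for one diffusion update, the key is mapped to a random unit direction, and `eps` is a fixed deflection strength.

```python
import numpy as np

def key_to_direction(key: int, dim: int) -> np.ndarray:
    """Derive a deterministic unit 'deflection' direction from the user key.
    Hypothetical mapping: seed an RNG with the key and normalize a Gaussian draw."""
    rng = np.random.default_rng(key)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def denoise_with_deflection(x_T, denoise_step, key, steps=50, eps=0.02):
    """Toy denoising loop: after each ordinary update, nudge the state by a
    small key-dependent deflection, coupling identity to the whole trajectory
    rather than only to the initial noise."""
    d = key_to_direction(key, x_T.size).reshape(x_T.shape)
    x = x_T
    for t in range(steps, 0, -1):
        x = denoise_step(x, t)   # stand-in for one diffusion denoising update
        x = x + eps * d          # key-conditioned semantic deflection
    return x
```

Because the deflection is applied at every step, two runs with the same initial noise but different keys drift toward different outputs, which is the trajectory-level coupling the abstract describes.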


Key Contributions

  • Training-free, plug-and-play watermarking framework (PAI) that steers the diffusion denoising trajectory via a key-conditioned deflection mechanism, coupling identity to content at the trajectory level
  • Simultaneous support for robust ownership verification, spoofing/removal attack detection, and semantic-level tamper localization in a single framework
  • Theoretical proof that only the valid user key can pass verification, with empirical results showing 98.43% verification accuracy across 12 attack types — 37.25% above SOTA on average
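The verification side can be sketched as a key-conditioned hypothesis test. This is a minimal stand-in, assuming a simple correlation statistic against the key-derived direction with a hypothetical threshold `tau`; PAI's real test operates on inverted diffusion latents, not raw pixels.

```python
import numpy as np

def verify_ownership(image: np.ndarray, key: int, tau: float = 0.2):
    """Hypothetical verification: project the flattened image onto the
    key-derived unit direction and threshold the normalized correlation.
    A wrong key yields a near-zero score in high dimensions."""
    x = image.ravel().astype(float)
    rng = np.random.default_rng(key)       # same key -> same direction
    v = rng.standard_normal(x.size)
    v /= np.linalg.norm(v)
    score = float(x @ v) / (np.linalg.norm(x) + 1e-12)
    return score > tau, score
```

For random unit directions in dimension n, the correlation under a wrong key concentrates around O(1/sqrt(n)), which is the intuition behind proving that only the valid key passes verification.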

🛡️ Threat Analysis

Output Integrity Attack

PAI embeds watermarks in diffusion model OUTPUT images (not model weights) to verify content ownership, detect watermark removal/spoofing attacks, and localize semantic-level tampering — this is squarely content provenance and output integrity protection for AI-generated images.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
inference_time, digital
Applications
AIGC image copyright protection, image forensics, tamper localization