Secure and Robust Watermarking for AI-generated Images: A Comprehensive Survey
Jie Cao, Qi Li, Zelin Zhang, Jianbing Ni
Published on arXiv (arXiv:2510.02384)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Surveys the state of AI-generated image watermarking across five dimensions — formalization, techniques, evaluation, attack vulnerabilities, and future directions — providing a holistic reference for researchers.
The rapid advancement of generative artificial intelligence (Gen-AI) has facilitated the effortless creation of high-quality images, while simultaneously raising critical concerns regarding intellectual property protection, authenticity, and accountability. Watermarking has emerged as a promising solution to these challenges by distinguishing AI-generated images from natural content, ensuring provenance, and fostering trustworthy digital ecosystems. This paper presents a comprehensive survey of the current state of AI-generated image watermarking, addressing five key dimensions: (1) formalization of image watermarking systems; (2) an overview and comparison of diverse watermarking techniques; (3) evaluation methodologies with respect to visual quality, capacity, and detectability; (4) vulnerabilities to malicious attacks; and (5) prevailing challenges and future directions. The survey aims to equip researchers with a holistic understanding of AI-generated image watermarking technologies, thereby promoting their continued development.
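The survey's first dimension is a formalization of image watermarking systems. As an illustrative sketch only (the paper's own formalization is not reproduced here), such a system can be modeled as a keyed `embed` function that hides a bit string in an image and an `extract` function that recovers it; the toy scheme below writes the bits into the least-significant bits of key-selected pixels. All function names and the LSB scheme are assumptions for illustration, not the survey's notation.

```python
import numpy as np

def embed(image: np.ndarray, bits: np.ndarray, key: int) -> np.ndarray:
    """Toy keyed watermark: write bits into the LSBs of key-selected pixels."""
    rng = np.random.default_rng(key)                       # key determines pixel positions
    idx = rng.choice(image.size, size=bits.size, replace=False)
    out = image.copy().ravel()
    out[idx] = (out[idx] & 0xFE) | bits                    # clear LSB, then set it to the bit
    return out.reshape(image.shape)

def extract(image: np.ndarray, n_bits: int, key: int) -> np.ndarray:
    """Recover the bits from the same key-selected positions."""
    rng = np.random.default_rng(key)
    idx = rng.choice(image.size, size=n_bits, replace=False)
    return image.ravel()[idx] & 1
```

A holder of the key can then detect the watermark by comparing the extracted bits to the embedded message; without the key, the positions (and hence the message) are unknown.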
Key Contributions
- Comprehensive taxonomy and comparison of AI-generated image watermarking techniques across diverse approaches
- Structured evaluation methodology covering visual quality, capacity, and detectability dimensions
- Analysis of vulnerabilities to malicious attacks on watermarks and identification of open challenges and future research directions
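The evaluation dimensions above are commonly quantified with standard metrics; as a hedged sketch (these particular formulas are conventional in the watermarking literature, not taken from the survey), visual quality is often measured by PSNR between the original and watermarked image, and detectability by the fraction of watermark bits recovered correctly:

```python
import numpy as np

def psnr(original: np.ndarray, watermarked: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means less visible distortion."""
    mse = np.mean((original.astype(np.float64) - watermarked.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                      # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

def bit_accuracy(embedded: np.ndarray, extracted: np.ndarray) -> float:
    """Fraction of watermark bits recovered correctly (a detectability proxy)."""
    return float(np.mean(embedded == extracted))
```

Capacity, the third dimension, is simply the number of bits the scheme can carry per image at an acceptable PSNR and bit accuracy.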
🛡️ Threat Analysis
The survey is entirely focused on watermarking AI-generated image outputs for provenance, authenticity, and distinguishing AI-generated content from natural images, plus vulnerabilities to malicious attacks on those watermarks (removal and forgery). This is squarely an output-integrity and content-provenance concern, not model IP protection (ML05), since the watermarks reside in the generated image content rather than in the model weights.
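The removal attacks the survey catalogs can be illustrated with a self-contained toy experiment (an assumption-laden sketch, not an attack from the paper): a fragile LSB-style watermark survives extraction on the clean image but is erased by mild additive Gaussian noise, driving bit accuracy toward chance.

```python
import numpy as np

# Toy setup: random "image" and 128 watermark bits written into key-selected LSBs.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
bits = rng.integers(0, 2, size=128, dtype=np.uint8)
idx = np.random.default_rng(7).choice(image.size, size=bits.size, replace=False)

marked = image.copy().ravel()
marked[idx] = (marked[idx] & 0xFE) | bits        # embed bits in LSBs
marked = marked.reshape(image.shape)

def recover(img: np.ndarray) -> np.ndarray:
    """Read back the LSBs at the key-selected positions."""
    return img.ravel()[idx] & 1

clean_acc = float((recover(marked) == bits).mean())      # 1.0: watermark intact

# Removal attack: small zero-mean Gaussian noise, clipped back to uint8.
noise = rng.normal(0.0, 2.0, size=marked.shape)
attacked = np.clip(marked.astype(np.float64) + noise, 0, 255).astype(np.uint8)
attacked_acc = float((recover(attacked) == bits).mean())  # near 0.5 (chance level)
```

Robust schemes are designed so that bit accuracy stays high under such valuemetric distortions, which is exactly the attack-vulnerability dimension the survey evaluates.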