Responsible Diffusion: A Comprehensive Survey on Safety, Ethics, and Trust in Diffusion Models
Kang Wei, Xin Yuan, Fushuo Huo, Chuan Ma, Long Yuan, Songze Li, Ming Ding, Dacheng Tao
Published on arXiv (arXiv:2509.22723)
Input Manipulation Attack (OWASP ML Top 10: ML01)
Output Integrity Attack (OWASP ML Top 10: ML09)
Model Poisoning (OWASP ML Top 10: ML10)
Key Finding
Provides a comprehensive security framework for diffusion models covering six threat dimensions (privacy, robustness, safety, fairness, copyright, truthfulness) with systematically organized countermeasures and open research challenges.
Diffusion models (DMs) have attracted significant attention across a wide range of domains for their ability to generate high-quality data. However, like traditional deep learning systems, DMs are exposed to a number of potential threats. To provide advanced and comprehensive insights into safety, ethics, and trust in DMs, this survey elucidates their framework, threats, and countermeasures. Each threat and its countermeasures are systematically examined and categorized to facilitate thorough analysis. Furthermore, we present concrete examples of how DMs are used, the dangers they may pose, and ways to protect against those dangers. Finally, we discuss key lessons learned, highlight open challenges in DM security, and outline prospective research directions in this critical field. This work aims to accelerate progress not only in the technical capabilities of generative artificial intelligence but also in the maturity and wisdom of its application.
Key Contributions
- Systematic taxonomy of threats to diffusion models across privacy, robustness, safety, fairness, copyright, and truthfulness dimensions
- Comprehensive review of countermeasures against each identified threat category with concrete examples
- Discussion of open challenges and future research directions in diffusion model security
🛡️ Threat Analysis
The survey covers the adversarial robustness of diffusion models: attacks that manipulate inference-time inputs, and defenses against such evasion attacks on DMs (a minimal sketch follows).
The copyright, truthfulness, and safety sections address AI-generated content detection, deepfake threats, content watermarking, and output provenance, which are core output-integrity concerns for diffusion models.
The survey explicitly covers backdoor/trojan threats to diffusion models and corresponding countermeasures, such as backdoor detection and Neural Cleanse-style trigger inversion.