Universal Image Immunization against Diffusion-based Image Editing via Semantic Injection
Chanhui Lee¹, Seunghyun Shin², Donggyu Choi², Hae-gon Jeon³, Jeany Son¹
Published on arXiv
arXiv:2602.14679
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
The first universal image immunization approach significantly outperforms baselines in the UAP setting, achieves parity with image-specific methods under a more restricted perturbation budget, and exhibits strong black-box transferability across diffusion models.
Universal Image Immunization via Semantic Injection
Novel technique introduced
Recent advances in diffusion models have enabled powerful image editing capabilities guided by natural language prompts, unlocking new creative possibilities. However, they introduce significant ethical and legal risks, such as deepfakes and unauthorized use of copyrighted visual content. To address these risks, image immunization has emerged as a promising defense against AI-driven semantic manipulation. Yet, most existing approaches rely on image-specific adversarial perturbations that require individual optimization for each image, thereby limiting scalability and practicality. In this paper, we propose the first universal image immunization framework that generates a single, broadly applicable adversarial perturbation specifically designed for diffusion-based editing pipelines. Inspired by universal adversarial perturbation (UAP) techniques used in targeted attacks, our method generates a UAP that embeds a semantic target into images to be protected. Simultaneously, it suppresses original content to effectively misdirect the model's attention during editing. As a result, our approach effectively blocks malicious editing attempts by overwriting the original semantic content in the image via the UAP. Moreover, our method operates effectively even in data-free settings without requiring access to training data or domain knowledge, further enhancing its practicality and broad applicability in real-world scenarios. Extensive experiments show that our method, as the first universal immunization approach, significantly outperforms several baselines in the UAP setting. In addition, despite the inherent difficulty of universal perturbations, our method also achieves performance on par with image-specific methods under a more restricted perturbation budget, while also exhibiting strong black-box transferability across different diffusion models.
Key Contributions
- First universal image immunization framework producing a single broadly applicable adversarial perturbation (UAP) that blocks diffusion-based editing across diverse images without per-image optimization
- Semantic injection mechanism that overwrites original image content by embedding a semantic target while suppressing the original semantics to misdirect the diffusion model's attention during editing
- Data-free operation mode requiring no access to training data, combined with strong black-box transferability across different diffusion model architectures
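The defining property of the universal setting is that one fixed perturbation, optimized once, is applied unchanged to every image to be protected. The sketch below illustrates only that application step under an L∞ budget; the function name, budget value, and random inputs are illustrative assumptions, not the paper's implementation, which additionally optimizes the perturbation with the semantic-injection objective described above.

```python
import numpy as np

def immunize(image: np.ndarray, uap: np.ndarray, eps: float = 8 / 255) -> np.ndarray:
    """Apply one pre-computed universal perturbation to an image in [0, 1].

    `eps` is a hypothetical L-infinity budget; the paper's budget may differ.
    """
    delta = np.clip(uap, -eps, eps)          # enforce the perturbation budget
    return np.clip(image + delta, 0.0, 1.0)  # keep pixels in the valid range

# The same perturbation protects every image: no per-image optimization.
rng = np.random.default_rng(0)
uap = rng.uniform(-0.1, 0.1, size=(3, 64, 64))   # stand-in for a trained UAP
images = [rng.uniform(0.0, 1.0, size=(3, 64, 64)) for _ in range(4)]
protected = [immunize(img, uap) for img in images]
```

Because the clamp to [0, 1] can only move a perturbed pixel back toward its original value, the distortion of every protected image stays within the stated budget.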
🛡️ Threat Analysis
The paper's primary contribution is protecting image content integrity from AI-driven manipulation (deepfakes, unauthorized diffusion-based editing) — a direct ML09 concern. The ML09 description explicitly includes 'anti-deepfake perturbations' as subject matter in this category. The paper creates image immunization protections analogous to PhotoGuard/Glaze-style defenses, which sit squarely in content integrity/output integrity protection. The universal adversarial perturbation is the implementation mechanism, not the category — the defended threat is unauthorized AI-generated content alteration.