Universal Image Immunization against Diffusion-based Image Editing via Semantic Injection
Chanhui Lee¹, Seunghyun Shin², Donggyu Choi², Hae-gon Jeon³, Jeany Son¹
Published on arXiv
arXiv:2602.14679
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
The first universal image immunization approach significantly outperforms baselines in the UAP setting, achieves parity with image-specific methods under a more restricted perturbation budget, and exhibits strong black-box transferability across diffusion models.
Universal Image Immunization via Semantic Injection
Novel technique introduced
Recent advances in diffusion models have enabled powerful image editing capabilities guided by natural language prompts, unlocking new creative possibilities. However, they introduce significant ethical and legal risks, such as deepfakes and unauthorized use of copyrighted visual content. To address these risks, image immunization has emerged as a promising defense against AI-driven semantic manipulation. Yet, most existing approaches rely on image-specific adversarial perturbations that require individual optimization for each image, thereby limiting scalability and practicality. In this paper, we propose the first universal image immunization framework that generates a single, broadly applicable adversarial perturbation specifically designed for diffusion-based editing pipelines. Inspired by universal adversarial perturbation (UAP) techniques used in targeted attacks, our method generates a UAP that embeds a semantic target into images to be protected. Simultaneously, it suppresses original content to effectively misdirect the model's attention during editing. As a result, our approach effectively blocks malicious editing attempts by overwriting the original semantic content in the image via the UAP. Moreover, our method operates effectively even in data-free settings without requiring access to training data or domain knowledge, further enhancing its practicality and broad applicability in real-world scenarios. Extensive experiments show that our method, as the first universal immunization approach, significantly outperforms several baselines in the UAP setting. In addition, despite the inherent difficulty of universal perturbations, our method also achieves performance on par with image-specific methods under a more restricted perturbation budget, while also exhibiting strong black-box transferability across different diffusion models.
Key Contributions
- First universal image immunization framework producing a single broadly applicable adversarial perturbation (UAP) that blocks diffusion-based editing across diverse images without per-image optimization
- Semantic injection mechanism that overwrites original image content by embedding a semantic target while suppressing the original semantics to misdirect the diffusion model's attention during editing
- Data-free operation mode requiring no access to training data, combined with strong black-box transferability across different diffusion model architectures
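The defining property of the universal setting is that one fixed perturbation, optimized once, is applied unchanged to every image to be protected. The sketch below illustrates only that application step under an L∞ budget; the function name, budget value, and random inputs are illustrative assumptions, not the paper's implementation, which additionally optimizes the perturbation with the semantic-injection objective described above.

```python
import numpy as np

def immunize(image: np.ndarray, uap: np.ndarray, eps: float = 8 / 255) -> np.ndarray:
    """Apply one pre-computed universal perturbation to an image in [0, 1].

    `eps` is a hypothetical L-infinity budget; the paper's budget may differ.
    """
    delta = np.clip(uap, -eps, eps)          # enforce the perturbation budget
    return np.clip(image + delta, 0.0, 1.0)  # keep pixels in the valid range

# The same perturbation protects every image: no per-image optimization.
rng = np.random.default_rng(0)
uap = rng.uniform(-0.1, 0.1, size=(3, 64, 64))   # stand-in for a trained UAP
images = [rng.uniform(0.0, 1.0, size=(3, 64, 64)) for _ in range(4)]
protected = [immunize(img, uap) for img in images]
```

Because the clamp to [0, 1] can only move a perturbed pixel back toward its original value, the distortion of every protected image stays within the stated budget.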
🛡️ Threat Analysis
The paper's primary contribution is protecting image content integrity from AI-driven manipulation (deepfakes, unauthorized diffusion-based editing) — a direct ML09 concern. The ML09 description explicitly includes 'anti-deepfake perturbations' as subject matter in this category. The paper creates image immunization protections analogous to PhotoGuard/Glaze-style defenses, which sit squarely in content integrity/output integrity protection. The universal adversarial perturbation is the implementation mechanism, not the category — the defended threat is unauthorized AI-generated content alteration.