PromptFlare: Prompt-Generalized Defense via Cross-Attention Decoy in Diffusion-Based Inpainting
Hohyun Na, Seunghoo Hong, Simon S. Woo
Published on arXiv (2508.16217)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Achieves state-of-the-art protection against diffusion-based inpainting across multiple metrics on EditBench while significantly reducing GPU memory usage compared to prior image immunization methods.
PromptFlare
Novel technique introduced
The success of diffusion models has enabled effortless, high-quality image modifications that precisely align with users' intentions, raising concerns about potential misuse by malicious actors. Previous studies have attempted to mitigate such misuse through adversarial attacks. However, these approaches rely heavily on image-level inconsistencies, a fundamental limitation when it comes to countering the influence of textual prompts. In this paper, we propose PromptFlare, a novel adversarial protection method designed to protect images from malicious modifications by diffusion-based inpainting models. Our approach leverages the cross-attention mechanism to exploit intrinsic properties of prompt embeddings. Specifically, we identify and target a shared token across prompts that is invariant and semantically uninformative, injecting adversarial noise to suppress the sampling process. The injected noise acts as a cross-attention decoy, diverting the model's focus away from meaningful prompt-image alignments and thereby neutralizing the effect of the prompt. Extensive experiments on the EditBench dataset demonstrate that our method achieves state-of-the-art performance across various metrics while significantly reducing computational overhead and GPU memory usage. These findings highlight PromptFlare as a robust and efficient protection against unauthorized image manipulation. The code is available at https://github.com/NAHOHYUN-SKKU/PromptFlare.
Key Contributions
- Identifies prompt-invariant shared tokens (e.g., padding/uninformative tokens) in diffusion model cross-attention that can be universally targeted regardless of the user's text prompt
- Injects adversarial noise that acts as a cross-attention decoy, suppressing meaningful prompt-image alignment in diffusion-based inpainting models
- Achieves state-of-the-art image protection on EditBench with reduced GPU memory and computational overhead compared to prior adversarial image protection methods
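The decoy mechanism above can be sketched in a deliberately simplified form. The following is a hypothetical numpy toy, not the paper's implementation: PromptFlare actually optimizes pixel-space perturbations through a full diffusion inpainting model, whereas here a single image feature (query) attends over a few prompt token embeddings (keys), and a bounded sign-gradient perturbation on the query is optimized so that attention mass collapses onto a shared, semantically uninformative "padding" token. All names, dimensions, and the step/budget values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                              # embedding dimension (illustrative)
tokens = rng.normal(size=(5, d))    # 4 "content" tokens + 1 shared/pad token
PAD = 4                             # index of the shared (decoy) token
query = rng.normal(size=d)          # stand-in for an image feature

def attention(q, K):
    """Scaled dot-product attention weights of query q over token keys K."""
    logits = K @ q / np.sqrt(d)
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()

# PGD-style ascent on the decoy token's attention weight w.r.t. the query
# (a stand-in for perturbing pixels), with an L_inf-style clip as budget.
delta = np.zeros(d)
for _ in range(100):
    w = attention(query + delta, tokens)
    # softmax gradient: d w[PAD] / d q = w[PAD] * (k_PAD - sum_j w_j k_j) / sqrt(d)
    grad = w[PAD] * (tokens[PAD] - w @ tokens) / np.sqrt(d)
    delta = np.clip(delta + 0.5 * np.sign(grad), -4.0, 4.0)

before = attention(query, tokens)[PAD]
after = attention(query + delta, tokens)[PAD]
print(f"decoy attention before: {before:.3f}, after: {after:.3f}")
```

The point of the sketch is the objective, not the model: once attention concentrates on the uninformative decoy token, the weighted sum over token values carries almost no prompt semantics, which is the intuition behind neutralizing the prompt's influence regardless of its content.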
🛡️ Threat Analysis
PromptFlare is an image protection scheme that adds adversarial perturbations to images to prevent unauthorized AI-generated modifications — directly analogous to anti-deepfake perturbations and style-transfer protections listed under ML09. The primary goal is content integrity: stopping malicious actors from using diffusion inpainting to alter protected images.