Defense · 2025

PromptFlare: Prompt-Generalized Defense via Cross-Attention Decoy in Diffusion-Based Inpainting

Hohyun Na , Seunghoo Hong , Simon S. Woo


Published on arXiv: 2508.16217

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves state-of-the-art protection against diffusion-based inpainting across multiple metrics on EditBench while significantly reducing GPU memory usage compared to prior image immunization methods.

PromptFlare

Novel technique introduced


The success of diffusion models has enabled effortless, high-quality image modifications that precisely align with users' intentions, thereby raising concerns about their potential misuse by malicious actors. Previous studies have attempted to mitigate such misuse through adversarial attacks. However, these approaches rely heavily on image-level inconsistencies, which poses a fundamental limitation in addressing the influence of textual prompts. In this paper, we propose PromptFlare, a novel adversarial protection method designed to protect images from malicious modifications facilitated by diffusion-based inpainting models. Our approach leverages the cross-attention mechanism to exploit the intrinsic properties of prompt embeddings. Specifically, we identify and target a shared token of prompts that is invariant and semantically uninformative, injecting adversarial noise to suppress the sampling process. The injected noise acts as a cross-attention decoy, diverting the model's focus away from meaningful prompt-image alignments and thereby neutralizing the effect of the prompt. Extensive experiments on the EditBench dataset demonstrate that our method achieves state-of-the-art performance across various metrics while significantly reducing computational overhead and GPU memory usage. These findings highlight PromptFlare as a robust and efficient protection against unauthorized image manipulations. The code is available at https://github.com/NAHOHYUN-SKKU/PromptFlare.
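The cross-attention decoy idea can be illustrated with a toy sketch, which is not the authors' implementation: a single-head cross-attention between image-patch features and prompt-token embeddings, plus a PGD-style sign-ascent that crafts a perturbation concentrating attention mass on one designated decoy token (standing in for the shared, semantically uninformative token). All names, dimensions, and the finite-difference gradient here are illustrative assumptions; PromptFlare itself operates inside the attention layers of a diffusion inpainting model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decoy_attention_mass(image_feats, prompt_embeds, decoy_idx):
    """Mean attention weight that image queries place on the decoy token.

    image_feats: (num_patches, dim) queries; prompt_embeds: (num_tokens, dim) keys.
    """
    scores = image_feats @ prompt_embeds.T / np.sqrt(image_feats.shape[-1])
    attn = softmax(scores, axis=-1)          # (num_patches, num_tokens)
    return float(attn[:, decoy_idx].mean())

def pgd_decoy(image_feats, prompt_embeds, decoy_idx,
              eps=0.5, step=0.05, iters=40, h=1e-4):
    """Sign-ascent PGD: maximize attention drawn to the decoy token.

    Uses a finite-difference gradient for self-containment; a real attack
    would backpropagate through the diffusion model instead.
    """
    delta = np.zeros_like(image_feats)
    for _ in range(iters):
        base = decoy_attention_mass(image_feats + delta, prompt_embeds, decoy_idx)
        grad = np.zeros_like(delta)
        for i in np.ndindex(delta.shape):
            bumped = delta.copy()
            bumped[i] += h
            grad[i] = (decoy_attention_mass(image_feats + bumped,
                                            prompt_embeds, decoy_idx) - base) / h
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
    return delta
```

In this toy setting the perturbed features pull attention away from the informative prompt tokens and onto the decoy, mimicking how the injected noise neutralizes prompt-image alignment during sampling.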


Key Contributions

  • Identifies prompt-invariant shared tokens (e.g., padding/uninformative tokens) in diffusion model cross-attention that can be universally targeted regardless of the user's text prompt
  • Injects adversarial noise that acts as a cross-attention decoy, suppressing meaningful prompt-image alignment in diffusion-based inpainting models
  • Achieves state-of-the-art image protection on EditBench with reduced GPU memory and computational overhead compared to prior adversarial image protection methods

🛡️ Threat Analysis

Output Integrity Attack

PromptFlare is an image protection scheme that adds adversarial perturbations to images to prevent unauthorized AI-generated modifications — directly analogous to anti-deepfake perturbations and style-transfer protections listed under ML09. The primary goal is content integrity: stopping malicious actors from using diffusion inpainting to alter protected images.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
white_box, inference_time, digital
Datasets
EditBench
Applications
image inpainting, image editing protection, unauthorized image manipulation prevention