
Immunizing Images from Text to Image Editing via Adversarial Cross-Attention

Matteo Trippodo 1, Federico Becattini 2, Lorenzo Seidenari 1


Published on arXiv (2509.10359)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Attention Attack significantly degrades text-guided image editing performance on TEDBench++ while remaining imperceptible, without requiring knowledge of the editing method or target edit prompt.

Attention Attack

Novel technique introduced


Recent advances in text-based image editing have enabled fine-grained manipulation of visual content guided by natural language. However, such methods are susceptible to adversarial attacks. In this work, we propose a novel attack that targets the visual component of editing methods. We introduce Attention Attack, which disrupts the cross-attention between a textual prompt and the visual representation of the image by using an automatically generated caption of the source image as a proxy for the edit prompt. This breaks the alignment between the contents of the image and their textual description, without requiring knowledge of the editing method or the editing prompt. Reflecting on the reliability of existing metrics for immunization success, we propose two novel evaluation strategies: Caption Similarity, which quantifies semantic consistency between original and adversarial edits, and semantic Intersection over Union (IoU), which measures spatial layout disruption via segmentation masks. Experiments conducted on the TEDBench++ benchmark demonstrate that our attack significantly degrades editing performance while remaining imperceptible.


Key Contributions

  • Attention Attack: a prompt-agnostic adversarial perturbation method that uses an auto-generated image caption (via LLaVA) as a surrogate edit prompt to disrupt cross-attention between text and visual features in diffusion-based editors
  • Two novel evaluation strategies for immunization success: Caption Similarity (semantic consistency of edits) and semantic IoU (spatial layout disruption via segmentation masks)
  • Benchmark evaluation on TEDBench++ showing significant degradation of text-guided editing while maintaining imperceptibility
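The semantic IoU metric from the contributions above can be illustrated with a small sketch: compare segmentation masks of the clean edit and the adversarial edit class by class, so a low mean IoU signals that the perturbation disrupted the spatial layout. The function name, the background handling, and the averaging scheme here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def semantic_iou(mask_a, mask_b, ignore_background=True):
    """Mean per-class IoU between two integer segmentation masks.

    A low score means the adversarial edit changed the spatial layout
    (objects moved, vanished, or appeared). Treats class 0 as background
    by default; both choices are illustrative, not the paper's.
    """
    classes = np.union1d(np.unique(mask_a), np.unique(mask_b))
    if ignore_background:
        classes = classes[classes != 0]
    ious = []
    for c in classes:
        inter = np.logical_and(mask_a == c, mask_b == c).sum()
        union = np.logical_or(mask_a == c, mask_b == c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else 1.0

# Identical layouts score 1.0; a class that vanishes drags the mean down.
m1 = np.array([[0, 1], [1, 2]])
m2 = np.array([[0, 1], [1, 0]])   # class 2 disappeared in the edit
```

In this toy example `semantic_iou(m1, m1)` is 1.0, while `semantic_iou(m1, m2)` drops to 0.5 because class 2 has no overlap at all.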

🛡️ Threat Analysis

Input Manipulation Attack

Proposes gradient-based adversarial noise crafted to maximally disrupt the cross-attention mechanism in text-to-image editing diffusion models at inference time, causing the editing pipeline to fail. This is a canonical input manipulation attack — imperceptible perturbations are added to inputs to corrupt model behavior, with the twist that the goal is protective immunization rather than misclassification.
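The gradient-based crafting described above can be sketched on a toy model: a PGD-style loop that minimizes the alignment between the perturbed image's cross-attention and the clean attention, under an L-infinity budget so the change stays imperceptible. Everything here (dimensions, random projections, the alignment loss, and the PGD hyperparameters) is an illustrative stand-in for a diffusion editor, not the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for cross-attention: K caption tokens attend to a
# flattened "image" vector x through fixed random projections.
D, K = 32, 5                      # image dim, caption-token count (toy)
Wq = rng.normal(size=(K, D))      # caption-token queries (proxy prompt)
Wk = rng.normal(size=(D, D))      # key projection of the image

def attention(x):
    """Softmax cross-attention weights of caption tokens over the image."""
    logits = Wq @ (Wk @ x) / np.sqrt(D)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def grad_alignment(x, a_clean):
    """Analytic gradient of <A(x), a_clean> w.r.t. x via the softmax Jacobian."""
    a = attention(x)
    # d<a, a_clean>/dlogits = (diag(a) - a a^T) @ a_clean
    g_logits = a * a_clean - a * (a @ a_clean)
    return Wk.T @ (Wq.T @ g_logits) / np.sqrt(D)

x_clean = rng.normal(size=D)
a_clean = attention(x_clean)

# PGD: sign-gradient descent on the alignment, projected back onto an
# L_inf ball of radius eps around the clean image at every step.
eps, step, iters = 0.05, 0.01, 100
x_adv = x_clean.copy()
for _ in range(iters):
    x_adv -= step * np.sign(grad_alignment(x_adv, a_clean))
    x_adv = x_clean + np.clip(x_adv - x_clean, -eps, eps)

align_before = float(attention(x_clean) @ a_clean)
align_after = float(attention(x_adv) @ a_clean)
```

After the loop, `align_after` is lower than `align_before` while the perturbation stays within the eps budget, mirroring the attack's goal: break text-image attention alignment while remaining visually negligible.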


Details

Domains
vision, generative
Model Types
diffusion, transformer
Threat Tags
white_box, inference_time, targeted, digital
Datasets
TEDBench++
Applications
text-based image editing, image protection/immunization