Dual Attention Guided Defense Against Malicious Edits

Jie Zhang 1,2, Shuai Dong 3, Shiguang Shan 1,2, Xilin Chen 1,2


Published on arXiv · 2512.14333

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

DANP achieves state-of-the-art immunization performance against diffusion model-based malicious image editing by simultaneously disrupting cross-attention and noise prediction mechanisms across multiple timesteps.

DANP (Dual Attention-Guided Noise Perturbation)

Novel technique introduced


Recent progress in text-to-image diffusion models has transformed image editing via text prompts, but it also introduces significant ethical challenges from potential misuse in creating deceptive or harmful content. While current defenses seek to mitigate this risk by embedding imperceptible perturbations, their effectiveness against malicious tampering is limited. To address this issue, we propose a Dual Attention-Guided Noise Perturbation (DANP) immunization method that adds imperceptible perturbations to disrupt the model's semantic understanding and generation process. DANP operates over multiple timesteps to manipulate both the cross-attention maps and the noise prediction process, using a dynamic threshold to generate masks that separate text-relevant from irrelevant regions. It then reduces attention in relevant areas while increasing it in irrelevant ones, thereby misguiding the edit toward incorrect regions and preserving the intended targets. In addition, the method maximizes the discrepancy between the injected noise and the model's predicted noise to further interfere with generation. By targeting both the attention and noise prediction mechanisms, DANP exhibits strong immunity against malicious edits, and extensive experiments confirm that it achieves state-of-the-art performance.
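The dual-directional attention manipulation can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the min-max normalization, the fixed threshold `tau` (the paper uses a dynamic threshold), and the simple difference-of-sums loss are all assumptions made for clarity.

```python
import numpy as np

def dual_attention_masks(attn_map, tau=0.5):
    """Split a cross-attention map into text-relevant / irrelevant masks.

    Hypothetical stand-in for DANP's dynamic-threshold masking: the map is
    min-max normalized and thresholded at a fixed tau for illustration.
    """
    lo, hi = attn_map.min(), attn_map.max()
    a = (attn_map - lo) / (hi - lo + 1e-8)       # normalize to [0, 1]
    relevant = (a >= tau).astype(float)          # regions the prompt attends to
    irrelevant = 1.0 - relevant                  # everything else
    return relevant, irrelevant

def attention_guidance_loss(attn_map, relevant, irrelevant):
    """Sketch of a dual-directional objective: minimizing this value
    suppresses attention on text-relevant regions and amplifies it on
    irrelevant ones, steering the edit toward the wrong areas."""
    return float((attn_map * relevant).sum() - (attn_map * irrelevant).sum())
```

Optimizing the immunizing perturbation to minimize such a loss (summed over timesteps and attention layers) pushes the model's attention off the regions named by the malicious prompt.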


Key Contributions

  • DANP immunization method that jointly manipulates cross-attention maps and noise prediction in diffusion models using adaptive dynamic threshold masking
  • Dual-directional attention manipulation that suppresses attention on text-relevant regions while amplifying it on irrelevant regions to misdirect malicious edits
  • Noise discrepancy maximization objective that further disrupts the diffusion denoising process beyond attention manipulation
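The noise discrepancy objective and the perturbation update can be sketched in the standard immunization recipe below. The L2 form of the discrepancy, the sign-gradient (PGD-style) update, and the step/budget values are assumptions for illustration; the paper's actual optimizer and loss weighting may differ.

```python
import numpy as np

def noise_discrepancy(eps_pred, eps_injected):
    """Discrepancy between the model's predicted noise and the noise actually
    injected at a timestep; DANP maximizes this (L2 form assumed here)."""
    return float(np.mean((eps_pred - eps_injected) ** 2))

def pgd_step(x_adv, grad, x_orig, alpha=2 / 255, eps_budget=8 / 255):
    """One sign-gradient ascent step on the immunized image, clipped to an
    L-infinity budget around the original so the perturbation stays
    imperceptible (hypothetical step sizes)."""
    x_new = x_adv + alpha * np.sign(grad)
    x_new = np.clip(x_new, x_orig - eps_budget, x_orig + eps_budget)
    return np.clip(x_new, 0.0, 1.0)  # keep a valid image
```

In the full method, `grad` would combine the gradients of the attention-guidance loss and the negated noise discrepancy with respect to the image, so a single perturbation disrupts both mechanisms at once.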

🛡️ Threat Analysis

Output Integrity Attack

The paper's primary contribution is image immunization — embedding imperceptible adversarial perturbations into images to protect their content integrity against unauthorized AI-based (diffusion model) editing. This directly addresses output/content integrity: preventing AI models from successfully generating tampered or deceptive content from protected originals. The threat model centers on malicious actors using text-guided diffusion editing to manipulate images, and the defense preserves the authenticity of the original content.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
white_box, digital, inference_time
Applications
image editing protection, image immunization, content protection against AI manipulation