A Low-Rank Defense Method for Adversarial Attack on Diffusion Models
Published on arXiv
2602.10319
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
LoRD significantly outperforms baseline methods in restoring high-quality image generation from LDMs fine-tuned on ACE/ACE+ adversarially protected images.
LoRD (Low-Rank Defense)
Novel technique introduced
Adversarial attacks on diffusion models and their fine-tuning process have developed rapidly in recent years. To keep the abuse of these attack algorithms from undermining the practical application of diffusion models, it is critical to develop corresponding defensive strategies. In this work, we propose an efficient defense, named Low-Rank Defense (LoRD), against adversarial attacks on Latent Diffusion Models (LDMs). LoRD combines a weight-merging scheme and a balance parameter with low-rank adaptation (LoRA) modules to detect and neutralize adversarial samples. Building on LoRD, we construct a defense pipeline that applies the learned LoRD modules to help diffusion models withstand attack algorithms, ensuring that an LDM fine-tuned on a mix of adversarial and clean samples can still generate high-quality images. Extensive experiments on facial and landscape images show that our method achieves significantly better defense performance than the baseline methods.
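The balance parameter described above could be realized as a small learned network that scores each training sample. A minimal sketch, assuming a two-layer MLP mapping per-sample features to a scalar in (0, 1); all names and shapes here are illustrative assumptions, not the authors' actual architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def balance_alpha(feat, W1, b1, w2, b2):
    """Hypothetical MLP: features -> tanh hidden layer -> scalar alpha
    via sigmoid. Samples scored near 0 would be treated as likely
    adversarial and downweighted during fine-tuning."""
    h = np.tanh(feat @ W1 + b1)
    return sigmoid(h @ w2 + b2)

# Toy example with random weights (assumed dimensions).
rng = np.random.default_rng(0)
feat = rng.normal(size=8)                     # per-sample feature vector
W1, b1 = rng.normal(size=(8, 4)), np.zeros(4)
w2, b2 = rng.normal(size=4), 0.0
alpha = balance_alpha(feat, W1, b1, w2, b2)   # scalar in (0, 1)
```

In a full pipeline this score would gate each sample's contribution to the LoRA update, letting clean samples drive fine-tuning while suspected adversarial ones are suppressed.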
Key Contributions
- Proposes LoRD (Low-Rank Defense), a LoRA-based module using an MLP-derived balance parameter to detect and neutralize adversarially perturbed images during diffusion model fine-tuning
- Introduces a two-stage defense pipeline: Stage-1 learns LoRD modules with an adversarial-training objective, Stage-2 merges the LoRD weights and fine-tunes the LDM on both adversarial and clean samples
- Demonstrates significantly better defense performance than baseline methods on facial and landscape image datasets against ACE/ACE+ attacks
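The Stage-2 merge step can be sketched as folding a low-rank update into the frozen base weight, scaled by the balance parameter. This is a hedged illustration of the standard LoRA merge pattern (W + alpha * B @ A), not the authors' exact procedure; all variable names are assumptions:

```python
import numpy as np

def merge_lord(W, A, B, alpha):
    """Return the merged weight W + alpha * (B @ A).

    W: (out, in) frozen base weight; A: (r, in) and B: (out, r) are the
    low-rank LoRD factors; alpha is the balance parameter controlling
    how strongly the defensive update is applied."""
    return W + alpha * (B @ A)

# Toy example: a rank-2 update on a 4x4 weight.
W = np.zeros((4, 4))
A = np.eye(2, 4)          # (r, in) factor, r = 2
B = np.ones((4, 2))       # (out, r) factor
W_merged = merge_lord(W, A, B, alpha=0.5)
# First row becomes [0.5, 0.5, 0.0, 0.0]: the update touches only the
# subspace spanned by A, scaled by alpha.
```

A balance parameter near 1 would apply the defensive correction fully, while a value near 0 would leave the base model's behavior essentially unchanged.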
🛡️ Threat Analysis
Anti-DreamBooth, Photoguard, and ACE/ACE+ are adversarial content-protection schemes that embed perturbations in images to prevent unauthorized fine-tuning. LoRD circumvents these protection schemes during the fine-tuning process. The taxonomy explicitly maps 'defeating image protections via adversarial perturbations' to ML09, not ML01.