defense 2025

DLADiff: A Dual-Layer Defense Framework against Fine-Tuning and Zero-Shot Customization of Diffusion Models

Jun Jia 1, Hongyi Miao 2, Yingjie Zhou 1, Linhan Cao 1, Yanwei Jiang 1, Wangqiu Zhou 3, Dandan Zhu 4, Hua Yang 1, Wei Sun 4, Xiongkuo Min 1, Guangtao Zhai 1


Published on arXiv

2511.19910

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

In extensive experiments, DLADiff significantly outperforms existing defenses against fine-tuning and achieves unprecedented protection against zero-shot diffusion generation methods

DLADiff

Novel technique introduced


With the rapid advancement of diffusion models, a variety of fine-tuning methods have been developed, enabling high-fidelity image generation closely matching the target content using only 3 to 5 training images. More recently, zero-shot generation methods have emerged, capable of producing highly realistic outputs from a single reference image without altering model weights. However, these technological advancements have also introduced significant risks to facial privacy. Malicious actors can exploit diffusion model customization with just a few images of a person, or even one, to create synthetic identities nearly indistinguishable from the original. Although research has begun to focus on defending against diffusion model customization, most existing defense methods target fine-tuning approaches and neglect zero-shot generation. To address this issue, this paper proposes Dual-Layer Anti-Diffusion (DLADiff) to defend against both fine-tuning and zero-shot methods. DLADiff contains a dual-layer protective mechanism. The first layer provides effective protection against unauthorized fine-tuning by leveraging the proposed Dual-Surrogate Models (DSUR) mechanism and Alternating Dynamic Fine-Tuning (ADFT), which integrates adversarial training with the prior knowledge derived from pre-fine-tuned models. The second layer, though simple in design, is highly effective at preventing image generation through zero-shot methods. Extensive experimental results demonstrate that our method significantly outperforms existing approaches in defending against fine-tuning of diffusion models and achieves unprecedented performance in protecting against zero-shot generation.


Key Contributions

  • Dual-Surrogate Models (DSUR) mechanism leveraging pre-fine-tuned model prior knowledge to generate stronger protective perturbations against fine-tuning-based diffusion customization
  • Alternating Dynamic Fine-Tuning (ADFT) that integrates adversarial training with surrogate model priors for robust protection against DreamBooth-style and LoRA-style fine-tuning attacks
  • Second-layer defense that extends protection to zero-shot generation methods (single-image, no fine-tuning), addressing a gap largely neglected by prior defenses
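
The first-layer idea (a protective perturbation crafted against surrogate models) can be sketched in miniature. The snippet below is a hypothetical illustration, not the paper's implementation: the two surrogate losses are toy quadratics standing in for the diffusion training losses of the two DSUR surrogates, and `protect` runs a PGD-style ascent that degrades customization while keeping the perturbation within an imperceptible L-infinity budget.

```python
import numpy as np

# Toy stand-ins for the two surrogate models' losses (hypothetical;
# the real method uses diffusion-model training losses).
def surrogate_a(x):
    return np.sum((x - 1.0) ** 2)

def grad_a(x):
    return 2.0 * (x - 1.0)

def surrogate_b(x):
    return np.sum((x + 1.0) ** 2)

def grad_b(x):
    return 2.0 * (x + 1.0)

def protect(image, eps=0.05, step=0.01, iters=20):
    """PGD-style ascent on the combined surrogate loss, bounded by eps."""
    delta = np.zeros_like(image)
    for _ in range(iters):
        # Dual-surrogate gradient: sum both models' gradients.
        g = grad_a(image + delta) + grad_b(image + delta)
        # Ascend (maximize loss) and project back into the L-inf ball.
        delta = np.clip(delta + step * np.sign(g), -eps, eps)
    return np.clip(image + delta, 0.0, 1.0)

img = np.full(8, 0.5)        # a flat "image" in [0, 1]
protected = protect(img)
```

The projection step (`np.clip` on `delta`) is what keeps the protection visually invisible; the ascent direction is what makes subsequent fine-tuning on the protected image converge poorly.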

🛡️ Threat Analysis

Output Integrity Attack

Protects content integrity by preventing unauthorized AI-generated synthetic identities from real facial images; adversarial perturbations function as content protection shields against deepfake generation via diffusion model fine-tuning and zero-shot customization — consistent with ML09's coverage of anti-deepfake perturbations and content provenance protection.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
white_box, training_time, inference_time, digital
Applications
facial privacy protection, deepfake prevention, diffusion model customization defense