defense 2025

DLADiff: A Dual-Layer Defense Framework against Fine-Tuning and Zero-Shot Customization of Diffusion Models

Jun Jia 1, Hongyi Miao 2, Yingjie Zhou 1, Linhan Cao 1, Yanwei Jiang 1, Wangqiu Zhou 3, Dandan Zhu 4, Hua Yang 1, Wei Sun 4, Xiongkuo Min 1, Guangtao Zhai 1


Published on arXiv

2511.19910

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

In extensive experiments, DLADiff significantly outperforms existing defenses against fine-tuning and achieves unprecedented protection against zero-shot diffusion generation methods

DLADiff

Novel technique introduced


With the rapid advancement of diffusion models, a variety of fine-tuning methods have been developed, enabling high-fidelity image generation closely matching the target content using only 3 to 5 training images. More recently, zero-shot generation methods have emerged, capable of producing highly realistic outputs from a single reference image without altering model weights. However, these technological advancements have also introduced significant risks to facial privacy. Malicious actors can exploit diffusion model customization with just a few images of a person, or even one, to create synthetic identities nearly indistinguishable from the original. Although research has begun to focus on defending against diffusion model customization, most existing defense methods target fine-tuning approaches and neglect zero-shot generation. To address this issue, this paper proposes Dual-Layer Anti-Diffusion (DLADiff) to defend against both fine-tuning and zero-shot methods. DLADiff contains a dual-layer protective mechanism. The first layer provides effective protection against unauthorized fine-tuning by leveraging the proposed Dual-Surrogate Models (DSUR) mechanism and Alternating Dynamic Fine-Tuning (ADFT), which integrates adversarial training with the prior knowledge derived from pre-fine-tuned models. The second layer, though simple in design, is highly effective at preventing image generation through zero-shot methods. Extensive experimental results demonstrate that our method significantly outperforms existing approaches in defending against fine-tuning of diffusion models and achieves unprecedented performance in protecting against zero-shot generation.


Key Contributions

  • Dual-Surrogate Models (DSUR) mechanism leveraging pre-fine-tuned model prior knowledge to generate stronger protective perturbations against fine-tuning-based diffusion customization
  • Alternating Dynamic Fine-Tuning (ADFT) that integrates adversarial training with surrogate model priors for robust protection against DreamBooth-style and LoRA-style fine-tuning attacks
  • Second-layer defense that extends protection to zero-shot generation methods (single-image, no fine-tuning), addressing a gap largely neglected by prior defenses
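
The first-layer idea (a protective perturbation crafted against surrogate models) can be sketched in miniature. The snippet below is a hypothetical illustration, not the paper's implementation: the two surrogate losses are toy quadratics standing in for the diffusion training losses of the two DSUR surrogates, and `protect` runs a PGD-style ascent that degrades customization while keeping the perturbation within an imperceptible L-infinity budget.

```python
import numpy as np

# Toy stand-ins for the two surrogate models' losses (hypothetical;
# the real method uses diffusion-model training losses).
def surrogate_a(x):
    return np.sum((x - 1.0) ** 2)

def grad_a(x):
    return 2.0 * (x - 1.0)

def surrogate_b(x):
    return np.sum((x + 1.0) ** 2)

def grad_b(x):
    return 2.0 * (x + 1.0)

def protect(image, eps=0.05, step=0.01, iters=20):
    """PGD-style ascent on the combined surrogate loss, bounded by eps."""
    delta = np.zeros_like(image)
    for _ in range(iters):
        # Dual-surrogate gradient: sum both models' gradients.
        g = grad_a(image + delta) + grad_b(image + delta)
        # Ascend (maximize loss) and project back into the L-inf ball.
        delta = np.clip(delta + step * np.sign(g), -eps, eps)
    return np.clip(image + delta, 0.0, 1.0)

img = np.full(8, 0.5)        # a flat "image" in [0, 1]
protected = protect(img)
```

The projection step (`np.clip` on `delta`) is what keeps the protection visually invisible; the ascent direction is what makes subsequent fine-tuning on the protected image converge poorly.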

🛡️ Threat Analysis

Output Integrity Attack

Protects content integrity by preventing unauthorized AI-generated synthetic identities from real facial images; adversarial perturbations function as content protection shields against deepfake generation via diffusion model fine-tuning and zero-shot customization — consistent with ML09's coverage of anti-deepfake perturbations and content provenance protection.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
white_box, training_time, inference_time, digital
Applications
facial privacy protection, deepfake prevention, diffusion model customization defense