defense 2025

Latent Diffusion Unlearning: Protecting Against Unauthorized Personalization Through Trajectory Shifted Perturbations

Naresh Kumar Devulapally 1, Shruti Agarwal 2, Tejas Gokhale 3, Vishnu Suresh Lokhande 1

0 citations · 47 references · ACM MM


Published on arXiv: 2510.03089

Data Poisoning Attack

OWASP ML Top 10 — ML02

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Latent-space trajectory-shifted perturbations achieve ~8–10% better imperceptibility and ~10% better robustness against inversion attacks compared to pixel-space methods across four benchmark datasets.

Trajectory-Shifted Perturbations (Latent Diffusion Unlearning)

Novel technique introduced


Text-to-image diffusion models have demonstrated remarkable effectiveness in rapid and high-fidelity personalization, even when provided with only a few user images. However, the effectiveness of personalization techniques has led to concerns regarding data privacy, intellectual property protection, and unauthorized usage. To mitigate such unauthorized usage and model replication, the idea of generating "unlearnable" training samples via image poisoning has emerged. Existing methods offer limited imperceptibility because they operate in pixel space, which produces images with visible noise and artifacts. In this work, we propose a novel model-based perturbation strategy that operates within the latent space of diffusion models. Our method alternates between denoising and inversion while modifying the starting point of the denoising trajectory. This trajectory-shifted sampling ensures that the perturbed images maintain high visual fidelity to the original inputs while remaining resistant to inversion and personalization by downstream generative models. The approach integrates unlearnability into the framework of Latent Diffusion Models (LDMs), enabling a practical and imperceptible defense against unauthorized model adaptation. We validate our approach on four benchmark datasets, demonstrating robustness against state-of-the-art inversion attacks. Results show that our method achieves significant improvements in imperceptibility (~8–10% on perceptual metrics including PSNR, SSIM, and FID) and robustness (~10% on average across five adversarial settings), highlighting its effectiveness in safeguarding sensitive data.
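The alternating denoise/invert loop with a shifted trajectory start can be sketched with toy linear stand-ins. This is a minimal illustration under stated assumptions, not the paper's actual method: the "encoder"/"decoder" are a fixed orthogonal map, a DDIM-like step is a deterministic scaling, and `trajectory_shifted_perturb`, the alpha schedule, and the budget projection are all hypothetical choices made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the LDM components (assumptions, not the paper's networks):
# the "encoder"/"decoder" are one fixed orthogonal map, and the diffusion steps
# are deterministic per-step rescalings with an exact inverse.
d = 16
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
encode = lambda img: Q @ img        # image -> latent
decode = lambda z: Q.T @ z          # latent -> image (exact inverse here)

alphas = np.linspace(0.99, 0.90, 10)  # per-step signal-retention schedule

def invert(z0):
    """DDIM-style deterministic inversion: image latent z0 -> trajectory start zT."""
    z = z0
    for a in alphas:
        z = z / np.sqrt(a)          # undo the scaling of one denoise step
    return z

def denoise(zT):
    """Deterministic reverse process exactly matching `invert`."""
    z = zT
    for a in reversed(alphas):
        z = z * np.sqrt(a)
    return z

def trajectory_shifted_perturb(x, target_dir, step=0.5, eps=0.05, iters=8):
    """Alternate denoising and inversion while nudging the trajectory's
    starting point along `target_dir`, then project the perturbed latent
    back into an eps-ball around the clean latent (imperceptibility budget)."""
    z0_clean = encode(x)
    z0 = z0_clean.copy()
    for _ in range(iters):
        zT = invert(z0)                  # latent -> trajectory start
        zT = zT + step * target_dir      # shift the starting point
        z0 = denoise(zT)                 # back to an image-like latent
        delta = z0 - z0_clean            # enforce the perturbation budget
        n = np.linalg.norm(delta)
        if n > eps:
            z0 = z0_clean + delta * (eps / n)
    return decode(z0)

x = rng.standard_normal(d)
x_prot = trajectory_shifted_perturb(x, target_dir=rng.standard_normal(d))
print(np.linalg.norm(x_prot - x))  # small: stays near the original image
```

The key property the sketch preserves is that the shift is applied at the trajectory's starting point rather than in pixel space, while the projection step keeps the decoded result within a small distance of the original image.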


Key Contributions

  • Novel latent-space perturbation strategy using trajectory-shifted sampling that alternates between denoising and inversion within a Latent Diffusion Model framework, replacing pixel-space methods
  • Achieves ~8–10% improvement in imperceptibility on perceptual metrics (PSNR, SSIM, FID) over existing pixel-space unlearnable example methods
  • Demonstrates ~10% average robustness improvement across five adversarial settings including state-of-the-art inversion/purification attacks
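Of the perceptual metrics listed, PSNR is the simplest to compute directly; a minimal sketch follows (the 0.01 noise level, image size, and data range are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def psnr(x, y, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher means a less visible perturbation."""
    mse = np.mean((x - y) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(data_range**2 / mse)

# Hypothetical example: a clean image and a lightly perturbed copy.
clean = np.clip(np.random.default_rng(1).random((64, 64)), 0.0, 1.0)
perturbed = np.clip(
    clean + 0.01 * np.random.default_rng(2).standard_normal((64, 64)), 0.0, 1.0
)
print(round(psnr(clean, perturbed), 1))  # roughly 40 dB at this noise level
```

SSIM and FID require windowed statistics and a pretrained feature extractor respectively, so library implementations (e.g. scikit-image, torchmetrics) are the practical choice for those.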

🛡️ Threat Analysis

Data Poisoning Attack

Core mechanism is data poisoning — the paper generates 'unlearnable' training samples by adding latent-space perturbations to user images, making them resistant to personalization training by downstream diffusion models (DreamBooth, LoRA, Textual Inversion). This is a defensive use of data poisoning to degrade attacker-side training performance.

Output Integrity Attack

The overarching goal is protecting personal content/IP from unauthorized AI generation; the paper explicitly evaluates robustness against inversion/purification attacks that attempt to strip the protective perturbations — these attacks try to defeat image protections, which is the adversarial scenario ML09 covers.
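A toy purification attack illustrates why pixel-space protections are fragile: simple smoothing averages out additive pixel noise, whereas a latent-space perturbation is not a high-frequency pixel residual that a blur can strip. The box blur, noise level, and image construction below are assumptions for illustration only, not the attacks evaluated in the paper.

```python
import numpy as np

def box_blur(img, k=3):
    """Naive k-by-k box filter with edge padding (illustrative purifier)."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = p[i:i + k, j:j + k].mean()
    return out

rng = np.random.default_rng(0)
clean = box_blur(rng.random((32, 32)), k=5)            # smooth "natural" image
protected = clean + 0.1 * rng.standard_normal((32, 32))  # pixel-space noise

purified = box_blur(protected)
residual_before = np.abs(protected - clean).mean()
residual_after = np.abs(purified - clean).mean()
print(residual_after < residual_before)  # blur strips the pixel-space noise
```

Robustness claims in this setting therefore hinge on the perturbation surviving such purification, which is what the paper's five adversarial settings probe.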


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
training_time, white_box
Datasets
CelebA-HQ, VGGFace2, LAION, WikiArt
Applications
image personalization protection, identity protection, unauthorized diffusion model adaptation prevention