defense 2025

Latent Diffusion Unlearning: Protecting Against Unauthorized Personalization Through Trajectory Shifted Perturbations

Naresh Kumar Devulapally 1, Shruti Agarwal 2, Tejas Gokhale 3, Vishnu Suresh Lokhande 1

0 citations · 47 references · ACM MM


Published on arXiv: 2510.03089

Data Poisoning Attack

OWASP ML Top 10 — ML02

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Latent-space trajectory-shifted perturbations achieve ~8–10% better imperceptibility and ~10% better robustness against inversion attacks compared to pixel-space methods across four benchmark datasets.

Trajectory-Shifted Perturbations (Latent Diffusion Unlearning)

Novel technique introduced


Text-to-image diffusion models have demonstrated remarkable effectiveness in rapid and high-fidelity personalization, even when provided with only a few user images. However, the effectiveness of personalization techniques has led to concerns regarding data privacy, intellectual property protection, and unauthorized usage. To mitigate such unauthorized usage and model replication, the idea of generating "unlearnable" training samples via image poisoning has emerged. Existing methods offer limited imperceptibility because they operate in pixel space, which produces images with visible noise and artifacts. In this work, we propose a novel model-based perturbation strategy that operates within the latent space of diffusion models. Our method alternates between denoising and inversion while modifying the starting point of the denoising trajectory. This trajectory-shifted sampling ensures that the perturbed images maintain high visual fidelity to the original inputs while remaining resistant to inversion and personalization by downstream generative models. The approach integrates unlearnability into the framework of Latent Diffusion Models (LDMs), enabling a practical and imperceptible defense against unauthorized model adaptation. We validate our approach on four benchmark datasets, demonstrating robustness against state-of-the-art inversion attacks. Results show that our method achieves significant improvements in imperceptibility (~8–10% on perceptual metrics including PSNR, SSIM, and FID) and robustness (~10% on average across five adversarial settings), highlighting its effectiveness in safeguarding sensitive data.
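The alternating denoise/invert loop with a shifted trajectory start can be sketched with toy linear stand-ins. This is a minimal illustration under stated assumptions, not the paper's actual method: the "encoder"/"decoder" are a fixed orthogonal map, a DDIM-like step is a deterministic scaling, and `trajectory_shifted_perturb`, the alpha schedule, and the budget projection are all hypothetical choices made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the LDM components (assumptions, not the paper's networks):
# the "encoder"/"decoder" are one fixed orthogonal map, and the diffusion steps
# are deterministic per-step rescalings with an exact inverse.
d = 16
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
encode = lambda img: Q @ img        # image -> latent
decode = lambda z: Q.T @ z          # latent -> image (exact inverse here)

alphas = np.linspace(0.99, 0.90, 10)  # per-step signal-retention schedule

def invert(z0):
    """DDIM-style deterministic inversion: image latent z0 -> trajectory start zT."""
    z = z0
    for a in alphas:
        z = z / np.sqrt(a)          # undo the scaling of one denoise step
    return z

def denoise(zT):
    """Deterministic reverse process exactly matching `invert`."""
    z = zT
    for a in reversed(alphas):
        z = z * np.sqrt(a)
    return z

def trajectory_shifted_perturb(x, target_dir, step=0.5, eps=0.05, iters=8):
    """Alternate denoising and inversion while nudging the trajectory's
    starting point along `target_dir`, then project the perturbed latent
    back into an eps-ball around the clean latent (imperceptibility budget)."""
    z0_clean = encode(x)
    z0 = z0_clean.copy()
    for _ in range(iters):
        zT = invert(z0)                  # latent -> trajectory start
        zT = zT + step * target_dir      # shift the starting point
        z0 = denoise(zT)                 # back to an image-like latent
        delta = z0 - z0_clean            # enforce the perturbation budget
        n = np.linalg.norm(delta)
        if n > eps:
            z0 = z0_clean + delta * (eps / n)
    return decode(z0)

x = rng.standard_normal(d)
x_prot = trajectory_shifted_perturb(x, target_dir=rng.standard_normal(d))
print(np.linalg.norm(x_prot - x))  # small: stays near the original image
```

The key property the sketch preserves is that the shift is applied at the trajectory's starting point rather than in pixel space, while the projection step keeps the decoded result within a small distance of the original image.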


Key Contributions

  • Novel latent-space perturbation strategy using trajectory-shifted sampling that alternates between denoising and inversion within a Latent Diffusion Model framework, replacing pixel-space methods
  • Achieves ~8–10% improvement in imperceptibility on perceptual metrics (PSNR, SSIM, FID) over existing pixel-space unlearnable example methods
  • Demonstrates ~10% average robustness improvement across five adversarial settings including state-of-the-art inversion/purification attacks
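Of the perceptual metrics listed, PSNR is the simplest to compute directly; a minimal sketch follows (the 0.01 noise level, image size, and data range are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def psnr(x, y, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher means a less visible perturbation."""
    mse = np.mean((x - y) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(data_range**2 / mse)

# Hypothetical example: a clean image and a lightly perturbed copy.
clean = np.clip(np.random.default_rng(1).random((64, 64)), 0.0, 1.0)
perturbed = np.clip(
    clean + 0.01 * np.random.default_rng(2).standard_normal((64, 64)), 0.0, 1.0
)
print(round(psnr(clean, perturbed), 1))  # roughly 40 dB at this noise level
```

SSIM and FID require windowed statistics and a pretrained feature extractor respectively, so library implementations (e.g. scikit-image, torchmetrics) are the practical choice for those.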

🛡️ Threat Analysis

Data Poisoning Attack

Core mechanism is data poisoning — the paper generates 'unlearnable' training samples by adding latent-space perturbations to user images, making them resistant to personalization training by downstream diffusion models (DreamBooth, LoRA, Textual Inversion). This is a defensive use of data poisoning to degrade attacker-side training performance.

Output Integrity Attack

The overarching goal is protecting personal content/IP from unauthorized AI generation; the paper explicitly evaluates robustness against inversion/purification attacks that attempt to strip the protective perturbations — these attacks try to defeat image protections, which is the adversarial scenario ML09 covers.
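A toy purification attack illustrates why pixel-space protections are fragile: simple smoothing averages out additive pixel noise, whereas a latent-space perturbation is not a high-frequency pixel residual that a blur can strip. The box blur, noise level, and image construction below are assumptions for illustration only, not the attacks evaluated in the paper.

```python
import numpy as np

def box_blur(img, k=3):
    """Naive k-by-k box filter with edge padding (illustrative purifier)."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = p[i:i + k, j:j + k].mean()
    return out

rng = np.random.default_rng(0)
clean = box_blur(rng.random((32, 32)), k=5)            # smooth "natural" image
protected = clean + 0.1 * rng.standard_normal((32, 32))  # pixel-space noise

purified = box_blur(protected)
residual_before = np.abs(protected - clean).mean()
residual_after = np.abs(purified - clean).mean()
print(residual_after < residual_before)  # blur strips the pixel-space noise
```

Robustness claims in this setting therefore hinge on the perturbation surviving such purification, which is what the paper's five adversarial settings probe.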


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
training_time, white_box
Datasets
CelebA-HQ, VGGFace2, LAION, WikiArt
Applications
image personalization protection, identity protection, unauthorized diffusion model adaptation prevention