defense 2025

Targeted Data Protection for Diffusion Model by Matching Training Trajectory

Hojun Lee 1,2, Mijin Koo 2, Yeji Song 2, Nojun Kwak 2,3

0 citations · 32 references · arXiv

Published on arXiv: 2512.10433

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

First method to achieve semantically meaningful targeted data protection in diffusion models, successfully redirecting model outputs to user-specified target concepts while maintaining high perceptual image quality

Novel technique introduced

TAFAP (Trajectory Alignment via Fine-tuning with Adversarial Perturbations)


Recent advancements in diffusion models have made fine-tuning text-to-image models for personalization increasingly accessible, but have also raised significant concerns regarding unauthorized data usage and privacy infringement. Current protection methods are limited to passively degrading image quality, failing to achieve stable control. While Targeted Data Protection (TDP) offers a promising paradigm for active redirection toward user-specified target concepts, existing TDP attempts suffer from poor controllability due to snapshot-matching approaches that fail to account for complete learning dynamics. We introduce TAFAP (Trajectory Alignment via Fine-tuning with Adversarial Perturbations), the first method to successfully achieve effective TDP by controlling the entire training trajectory. Unlike snapshot-based methods whose protective influence is easily diluted as training progresses, TAFAP employs trajectory-matching inspired by dataset distillation to enforce persistent, verifiable transformations throughout fine-tuning. We validate our method through extensive experiments, demonstrating the first successful targeted transformation in diffusion models with simultaneous control over both identity and visual patterns. TAFAP significantly outperforms existing TDP attempts, achieving robust redirection toward target concepts while maintaining high image quality. This work enables verifiable safeguards and provides a new framework for controlling and tracing alterations in diffusion model outputs.


Key Contributions

  • First trajectory-matching approach for Targeted Data Protection in diffusion models, controlling the full fine-tuning trajectory rather than isolated model snapshots
  • TAFAP achieves simultaneous control over both identity and visual patterns in redirected outputs, outperforming snapshot-based TDP methods (Mist, Anti-DreamBooth)
  • Establishes verifiability and traceability properties for protected image data, enabling accountability when unauthorized use occurs
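
The trajectory-matching idea borrowed from dataset distillation can be illustrated with a toy sketch. This is not the TAFAP implementation: the linear least-squares model, the function names (`sgd_unroll`, `trajectory_matching_loss`), the learning rate, and the normalized squared-distance objective are all assumptions chosen for illustration. The sketch records checkpoints from training on hypothetical "target" data, then measures how closely training on other data, started from one checkpoint, lands on the next one.

```python
import numpy as np

def sgd_unroll(theta, X, y, lr, steps):
    """Run `steps` full-batch gradient updates of a toy least-squares model."""
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ theta - y) / len(y)
        theta = theta - lr * grad
    return theta

def trajectory_matching_loss(theta_start, theta_target, X, y, lr, steps):
    """Squared distance between the student's end point and the target
    checkpoint, normalized by the length of the reference segment."""
    theta_student = sgd_unroll(theta_start, X, y, lr, steps)
    num = np.sum((theta_student - theta_target) ** 2)
    den = np.sum((theta_start - theta_target) ** 2)
    return num / den

rng = np.random.default_rng(0)
X_target = rng.normal(size=(32, 4))   # hypothetical "target concept" data
y = rng.normal(size=32)

# Reference trajectory: one checkpoint every 5 steps of training on target data.
checkpoints = [np.zeros(4)]
for _ in range(3):
    checkpoints.append(sgd_unroll(checkpoints[-1], X_target, y, lr=0.05, steps=5))

# Hypothetical other data: here simply a noisy copy of the target data.
X_other = X_target + 0.5 * rng.normal(size=X_target.shape)

loss_same = trajectory_matching_loss(checkpoints[0], checkpoints[1], X_target, y, 0.05, 5)
loss_other = trajectory_matching_loss(checkpoints[0], checkpoints[1], X_other, y, 0.05, 5)
print(loss_same, loss_other)
```

In a protection setting one would presumably optimize a bounded perturbation on the protected images to drive this loss down across many segments of the reference trajectory, rather than matching a single model snapshot. That optimization needs gradients through the unrolled updates and is omitted here.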

🛡️ Threat Analysis

Output Integrity Attack

The method crafts adversarial perturbations that control what diffusion models learn from protected training data, enabling verifiable identity concealment and traceable output signatures. This is content integrity protection, analogous to the anti-deepfake and style-protection perturbations covered under ML09's content provenance and output integrity scope.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
training_time, targeted, digital, white_box
Applications
text-to-image personalization, identity protection, deepfake prevention, copyright protection