Targeted Data Protection for Diffusion Model by Matching Training Trajectory

Recent advancements in diffusion models have made fine-tuning text-to-image models for personalization increasingly accessible, but have also raised significant concerns regarding unauthorized data usage and privacy infringement. Current protection methods are limited to passively degrading image quality, failing to achieve stable control. While Targeted Data Protection (TDP) offers a promising paradigm for active redirection toward user-specified target concepts, existing TDP attempts suffer from poor controllability due to snapshot-matching approaches that fail to account for complete learning dynamics. We introduce TAFAP (Trajectory Alignment via Fine-tuning with Adversarial Perturbations), the first method to successfully achieve effective TDP by controlling the entire training trajectory. Unlike snapshot-based methods whose protective influence is easily diluted as training progresses, TAFAP employs trajectory-matching inspired by dataset distillation to enforce persistent, verifiable transformations throughout fine-tuning. We validate our method through extensive experiments, demonstrating the first successful targeted transformation in diffusion models with simultaneous control over both identity and visual patterns. TAFAP significantly outperforms existing TDP attempts, achieving robust redirection toward target concepts while maintaining high image quality. This work enables verifiable safeguards and provides a new framework for controlling and tracing alterations in diffusion model outputs.
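The abstract does not state the objective itself, so the following is only a minimal sketch of the kind of trajectory-matching loss used in dataset distillation, which the abstract names as the inspiration. It assumes a normalized squared distance between the parameters a model reaches when fine-tuned on protected data and the parameters an "expert" reaches when fine-tuned on the target concept. The toy linear model, the helper names (`sgd_steps`, `trajectory_matching_loss`), and all hyperparameters are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

def trajectory_matching_loss(student_end, expert_start, expert_end):
    """Normalized squared distance between the student's end point and the
    expert's end point (dataset-distillation style; assumed, not from the paper)."""
    num = np.sum((student_end - expert_end) ** 2)
    den = np.sum((expert_start - expert_end) ** 2)
    return num / den

def sgd_steps(theta, x, y, lr, n_steps):
    """Run n_steps of gradient descent on a toy linear least-squares model."""
    for _ in range(n_steps):
        grad = 2.0 * x.T @ (x @ theta - y) / len(x)
        theta = theta - lr * grad
    return theta

rng = np.random.default_rng(0)
x_target = rng.normal(size=(32, 4))     # stand-in for target-concept data
x_protected = rng.normal(size=(32, 4))  # stand-in for the user's images
y = rng.normal(size=32)
theta0 = np.zeros(4)                    # shared starting checkpoint

# Expert segment: where fine-tuning on the target concept ends up.
expert_end = sgd_steps(theta0, x_target, y, lr=0.05, n_steps=10)
# Student segment: where fine-tuning on the (unperturbed) user data ends up.
student_end = sgd_steps(theta0, x_protected, y, lr=0.05, n_steps=10)

loss_unprotected = trajectory_matching_loss(student_end, theta0, expert_end)
loss_matched = trajectory_matching_loss(expert_end, theta0, expert_end)
print(loss_unprotected, loss_matched)
```

In a trajectory-matching scheme, the perturbation added to the protected images would be optimized to drive this loss toward zero over many segments of the expert trajectory, rather than matching a single model snapshot.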

Key Contributions

First trajectory-matching approach for Targeted Data Protection in diffusion models, controlling the full fine-tuning trajectory rather than isolated model snapshots
TAFAP achieves simultaneous control over both identity and visual patterns in redirected outputs, outperforming snapshot-based TDP methods (Mist, Anti-DreamBooth)
Establishes verifiability and traceability properties for protected image data, enabling accountability when unauthorized use occurs

🛡️ Threat Analysis

Output Integrity Attack

The method creates adversarial perturbation-based image protection schemes that control what diffusion models learn from protected training data, enabling verifiable identity concealment and traceable output signatures. This is content integrity protection, analogous to the anti-deepfake and style-protection perturbations covered under ML09's content provenance and output integrity scope.
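The summary does not say how the perturbation is constrained or optimized. As an illustration only, a common choice for image-protection perturbations is a signed-gradient step followed by projection onto an L-infinity ball around the clean image. The function name `pgd_step`, the budget `epsilon`, and the step size below are assumptions, and the gradient is a random placeholder for whatever loss gradient the protection objective would supply.

```python
import numpy as np

def pgd_step(x_adv, x_clean, grad, step_size, epsilon):
    """One signed-gradient descent step, projected so the perturbation stays
    within an L-infinity budget and pixels stay in [0, 1]."""
    x_adv = x_adv - step_size * np.sign(grad)
    x_adv = np.clip(x_adv, x_clean - epsilon, x_clean + epsilon)
    return np.clip(x_adv, 0.0, 1.0)

rng = np.random.default_rng(0)
x_clean = rng.uniform(size=(3, 8, 8))   # toy image with values in [0, 1]
grad = rng.normal(size=x_clean.shape)   # placeholder for a loss gradient
epsilon = 8.0 / 255.0                   # assumed perturbation budget

x_adv = x_clean.copy()
for _ in range(5):
    x_adv = pgd_step(x_adv, x_clean, grad, step_size=2.0 / 255.0, epsilon=epsilon)
```

The projection keeps the protected image visually close to the original while the repeated steps move it in the direction that lowers the protection objective.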

Details

Domains

vision, generative

Model Types

diffusion

Threat Tags

training_time, targeted, digital, white_box

Applications

2025 · 0 citations
