Perturb a Model, Not an Image: Towards Robust Privacy Protection via Anti-Personalized Diffusion Models
Tae-Young Lee 1, Juwon Seo 2, Jong Hwan Ko 3, Gyeong-Moon Park 1
Published on arXiv
2511.01307
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
APDM achieves state-of-the-art prevention of unauthorized diffusion model personalization, outperforming existing adversarial image perturbation methods and remaining effective under image transformations or when a few clean images are available.
APDM (Anti-Personalized Diffusion Models) with DPO and L2P
Novel technique introduced
Recent advances in diffusion models have enabled high-quality synthesis of specific subjects, such as identities or objects. This capability, while unlocking new possibilities in content creation, also introduces significant privacy risks, as personalization techniques can be misused by malicious users to generate unauthorized content. Although several studies have attempted to counter this by generating adversarially perturbed samples designed to disrupt personalization, they rely on unrealistic assumptions and become ineffective in the presence of even a few clean images or under simple image transformations. To address these challenges, we shift the protection target from the images to the diffusion model itself to hinder the personalization of specific subjects, through our novel framework called Anti-Personalized Diffusion Models (APDM). We first provide a theoretical analysis demonstrating that a naive application of existing loss functions to diffusion models is inherently incapable of ensuring convergence for robust anti-personalization. Motivated by this finding, we introduce Direct Protective Optimization (DPO), a novel loss function that effectively disrupts subject personalization in the target model without compromising generative quality. Moreover, we propose a new dual-path optimization strategy, coined Learning to Protect (L2P). By alternating between personalization and protection paths, L2P simulates future personalization trajectories and adaptively reinforces protection at each step. Experimental results demonstrate that our framework outperforms existing methods, achieving state-of-the-art performance in preventing unauthorized personalization. The code is available at https://github.com/KU-VGI/APDM.
Key Contributions
- Theoretical analysis showing existing adversarial image perturbation loss functions are inherently unable to ensure convergence for robust anti-personalization
- Direct Protective Optimization (DPO), a novel loss function that disrupts subject personalization at the model level without degrading generative quality
- Learning to Protect (L2P), a dual-path optimization strategy that simulates future personalization trajectories and adaptively reinforces protection
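The L2P strategy above can be pictured as an alternating two-path loop: simulate where personalization fine-tuning would drive the parameters, then apply a protection update that counteracts that trajectory. The sketch below is a toy illustration on a quadratic objective, not the paper's actual DPO loss or training code; all function names, losses, and step sizes are hypothetical stand-ins.

```python
import numpy as np

# Toy sketch of L2P's dual-path alternation (illustrative only; the paper's
# actual personalization and DPO protection losses operate on a diffusion
# model's denoising objective, not on these hypothetical quadratics).

def personalize_grad(theta, subject):
    # Gradient of 0.5 * ||theta - subject||^2: a stand-in for a
    # DreamBooth-style fine-tuning objective pulling theta toward the subject.
    return theta - subject

def protect_grad(theta, subject):
    # Stand-in protection gradient pushing parameters away from the subject.
    return -(theta - subject)

def learning_to_protect(theta, subject, inner_steps=3, outer_steps=50,
                        lr_personalize=0.1, lr_protect=0.05):
    """Alternate a simulated personalization path with a protection update."""
    theta = theta.copy()
    for _ in range(outer_steps):
        # Path 1: simulate a future personalization trajectory on a copy.
        phi = theta.copy()
        for _ in range(inner_steps):
            phi -= lr_personalize * personalize_grad(phi, subject)
        # Path 2: update the real parameters to counteract where the
        # simulated personalization trajectory would have driven them.
        theta -= lr_protect * protect_grad(phi, subject)
    return theta

theta0 = np.zeros(4)
subject = np.ones(4)
protected = learning_to_protect(theta0, subject)
# The protected parameters move away from the subject direction,
# so the dot product with the subject vector is negative.
print(np.dot(protected - theta0, subject))
```

The inner loop plays the role of "simulating future personalization trajectories"; because the protection step is computed against the simulated endpoint rather than the current parameters, each outer iteration adaptively reinforces protection against where an attacker's fine-tuning would actually land.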
🛡️ Threat Analysis
The paper defends against unauthorized AI-generated content (deepfakes of specific identities) produced via diffusion model personalization. Rather than perturbing input images, as prior adversarial approaches do, APDM modifies the model itself so that personalization fine-tuning fails to capture the protected subject. This prevents harmful synthetic outputs of specific subjects, directly addressing output integrity and unauthorized AI-generated content.