Perturb a Model, Not an Image: Towards Robust Privacy Protection via Anti-Personalized Diffusion Models
Tae-Young Lee 1, Juwon Seo 2, Jong Hwan Ko 3, Gyeong-Moon Park 1
Published on arXiv
2511.01307
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
APDM achieves state-of-the-art prevention of unauthorized diffusion model personalization, outperforming existing adversarial image perturbation methods and remaining effective under image transformations or when a few clean images are available.
APDM (Anti-Personalized Diffusion Models) with DPO and L2P
Novel technique introduced
Recent advances in diffusion models have enabled high-quality synthesis of specific subjects, such as identities or objects. This capability, while unlocking new possibilities in content creation, also introduces significant privacy risks, as personalization techniques can be misused by malicious users to generate unauthorized content. Although several studies have attempted to counter this by generating adversarially perturbed samples designed to disrupt personalization, they rely on unrealistic assumptions and become ineffective in the presence of even a few clean images or under simple image transformations. To address these challenges, we shift the protection target from the images to the diffusion model itself to hinder the personalization of specific subjects, through our novel framework called Anti-Personalized Diffusion Models (APDM). We first provide a theoretical analysis demonstrating that a naive application of existing loss functions to diffusion models is inherently incapable of ensuring convergence for robust anti-personalization. Motivated by this finding, we introduce Direct Protective Optimization (DPO), a novel loss function that effectively disrupts subject personalization in the target model without compromising generative quality. Moreover, we propose a new dual-path optimization strategy, coined Learning to Protect (L2P). By alternating between personalization and protection paths, L2P simulates future personalization trajectories and adaptively reinforces protection at each step. Experimental results demonstrate that our framework outperforms existing methods, achieving state-of-the-art performance in preventing unauthorized personalization. The code is available at https://github.com/KU-VGI/APDM.
Key Contributions
- Theoretical analysis showing existing adversarial image perturbation loss functions are inherently unable to ensure convergence for robust anti-personalization
- Direct Protective Optimization (DPO), a novel loss function that disrupts subject personalization at the model level without degrading generative quality
- Learning to Protect (L2P), a dual-path optimization strategy that simulates future personalization trajectories and adaptively reinforces protection
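The L2P strategy above can be pictured as an alternating two-path loop: simulate where personalization fine-tuning would drive the parameters, then apply a protection update that counteracts that trajectory. The sketch below is a toy illustration on a quadratic objective, not the paper's actual DPO loss or training code; all function names, losses, and step sizes are hypothetical stand-ins.

```python
import numpy as np

# Toy sketch of L2P's dual-path alternation (illustrative only; the paper's
# actual personalization and DPO protection losses operate on a diffusion
# model's denoising objective, not on these hypothetical quadratics).

def personalize_grad(theta, subject):
    # Gradient of 0.5 * ||theta - subject||^2: a stand-in for a
    # DreamBooth-style fine-tuning objective pulling theta toward the subject.
    return theta - subject

def protect_grad(theta, subject):
    # Stand-in protection gradient pushing parameters away from the subject.
    return -(theta - subject)

def learning_to_protect(theta, subject, inner_steps=3, outer_steps=50,
                        lr_personalize=0.1, lr_protect=0.05):
    """Alternate a simulated personalization path with a protection update."""
    theta = theta.copy()
    for _ in range(outer_steps):
        # Path 1: simulate a future personalization trajectory on a copy.
        phi = theta.copy()
        for _ in range(inner_steps):
            phi -= lr_personalize * personalize_grad(phi, subject)
        # Path 2: update the real parameters to counteract where the
        # simulated personalization trajectory would have driven them.
        theta -= lr_protect * protect_grad(phi, subject)
    return theta

theta0 = np.zeros(4)
subject = np.ones(4)
protected = learning_to_protect(theta0, subject)
# The protected parameters move away from the subject direction,
# so the dot product with the subject vector is negative.
print(np.dot(protected - theta0, subject))
```

The inner loop plays the role of "simulating future personalization trajectories"; because the protection step is computed against the simulated endpoint rather than the current parameters, each outer iteration adaptively reinforces protection against where an attacker's fine-tuning would actually land.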
🛡️ Threat Analysis
The paper defends against unauthorized AI-generated content (deepfakes of specific identities) produced via diffusion model personalization. Rather than perturbing input images, as prior adversarial approaches do, APDM modifies the model itself so that personalization fine-tuning fails to capture the protected subject. This prevents harmful synthetic outputs of specific subjects, directly addressing output integrity and unauthorized AI-generated content.