
Fragile by Design: On the Limits of Adversarial Defenses in Personalized Generation

Zhen Chen 1, Yi Zhang 2, Xiangyu Yin 1, Chengxuan Qin 1, Xingyu Zhao 2, Xiaowei Huang 1, Wenjie Ruan 1


Published on arXiv · 2511.10382

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

All evaluated anti-personalization defenses (including Anti-DreamBooth and HF-ADB) fail to prevent DreamBooth from reproducing user facial identity once simple purification techniques, such as Gaussian blur or diffusion-based denoising, are applied to the protected images.

AntiDB_Purify

Novel technique introduced


Personalized AI applications such as DreamBooth enable the generation of customized content from user images, but also raise significant privacy concerns, particularly the risk of facial identity leakage. Recent defense mechanisms like Anti-DreamBooth attempt to mitigate this risk by injecting adversarial perturbations into user photos to prevent successful personalization. However, we identify two critical yet overlooked limitations of these methods. First, the adversarial examples often exhibit perceptible artifacts such as conspicuous patterns or stripes, making them easily detectable as manipulated content. Second, the perturbations are highly fragile, as even a simple, non-learned filter can effectively remove them, thereby restoring the model's ability to memorize and reproduce user identity. To investigate this vulnerability, we propose a novel evaluation framework, AntiDB_Purify, to systematically evaluate existing defenses under realistic purification threats, including both traditional image filters and adversarial purification. Results reveal that none of the current methods maintains their protective effectiveness under such threats. These findings highlight that current defenses offer a false sense of security and underscore the urgent need for more imperceptible and robust protections to safeguard user identity in personalized generation.


Key Contributions

  • Identifies two critical overlooked limitations of anti-personalization defenses: perceptible artifacts and filtering fragility
  • Proposes AntiDB_Purify, an evaluation framework for systematically testing anti-personalization defenses under realistic purification threats including traditional filters and adversarial purification
  • Empirically demonstrates that none of the current state-of-the-art anti-personalization defenses retain protective effectiveness after purification, revealing a false sense of security

🛡️ Threat Analysis

Output Integrity Attack

The paper attacks adversarial perturbation-based content protection schemes (Anti-DreamBooth and similar methods) by demonstrating that simple image purification techniques (Gaussian blur, bilateral filtering, diffusion-based denoising) remove the protective perturbations, restoring a model's ability to memorize and reproduce user identity. Per ML09 scope: 'Attacks that REMOVE or DEFEAT image protections via denoising, purification, or other techniques' are classified as ML09 attacks on content integrity, not ML01 adversarial example attacks.
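The fragility described above can be illustrated with a toy numpy sketch. This is not the paper's AntiDB_Purify pipeline: the smooth synthetic "image", the PGD-style sign-pattern perturbation, and the kernel sizes are all illustrative assumptions. The point it demonstrates is only that a plain, non-learned Gaussian blur strips most of a high-frequency protective perturbation while leaving the underlying image largely intact.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 2D Gaussian kernel."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def blur(img, kernel):
    """Naive 2D convolution with edge padding (fine for a toy 32x32 image)."""
    pad = kernel.shape[0] // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(
                padded[i:i + kernel.shape[0], j:j + kernel.shape[1]] * kernel
            )
    return out

rng = np.random.default_rng(0)

# A smooth stand-in for a clean face photo: heavily blurred noise.
clean = blur(rng.uniform(0.0, 1.0, (32, 32)), gaussian_kernel(7, 2.0))

# A stand-in for an anti-personalization perturbation: a small-budget,
# high-frequency sign pattern, as PGD-style defenses tend to produce.
perturbation = 0.05 * rng.choice([-1.0, 1.0], size=clean.shape)
protected = np.clip(clean + perturbation, 0.0, 1.0)

# "Purification": one pass of plain Gaussian blur, no learning involved.
purified = blur(protected, gaussian_kernel(5, 1.0))

def residual(x):
    """Mean absolute distance to the clean image."""
    return float(np.abs(x - clean).mean())

print(f"protected residual: {residual(protected):.4f}")
print(f"purified  residual: {residual(purified):.4f}")
```

Because the clean image is smooth and the perturbation is high-frequency, the blur suppresses the perturbation far more than it distorts the image, so the purified image lands much closer to the clean one. That is the paper's core observation: the "protected" signal is exactly the component that cheap filtering removes.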


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
digital, black_box, inference_time
Applications
personalized image generation, facial identity protection