Defense · 2025

Towards Robust Defense against Customization via Protective Perturbation Resistant to Diffusion-based Purification

Wenkui Yang 1,2, Jie Cao 1, Junxian Duan 1, Ran He 1,2

0 citations

Published on arXiv

2509.13922

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

AntiPure achieves minimal perceptual discrepancy and maximal post-customization distortion, outperforming existing protective perturbation methods under representative diffusion-based purification settings.

AntiPure

Novel technique introduced


Diffusion models like Stable Diffusion have become prominent in visual synthesis tasks due to their powerful customization capabilities, which also introduce significant security risks, including deepfakes and copyright infringement. In response, a class of methods known as protective perturbations has emerged; these mitigate image misuse by injecting imperceptible adversarial noise. However, purification can remove protective perturbations, re-exposing images to the risk of malicious forgery. In this work, we formalize the anti-purification task, highlight the challenges that hinder existing approaches, and propose a simple diagnostic protective perturbation named AntiPure. AntiPure exposes vulnerabilities of purification within the "purification-customization" workflow through two guidance mechanisms: 1) Patch-wise Frequency Guidance, which reduces the model's influence over high-frequency components in the purified image, and 2) Erroneous Timestep Guidance, which disrupts the model's denoising strategy across different timesteps. With this additional guidance, AntiPure embeds imperceptible perturbations that persist under representative purification settings, achieving effective post-customization distortion. Experiments show that, as a stress test for purification, AntiPure achieves minimal perceptual discrepancy and maximal distortion, outperforming other protective perturbation methods within the purification-customization workflow.


Key Contributions

  • Formalization of the anti-purification task and analysis of why existing protective perturbations fail against diffusion-based purification
  • Patch-wise Frequency Guidance that reduces the diffusion model's influence over high-frequency components in purified images
  • Erroneous Timestep Guidance that disrupts the denoising strategy across timesteps, making AntiPure perturbations persistent through purification-customization workflows
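To make the Patch-wise Frequency Guidance idea concrete, here is a minimal illustrative sketch, not the paper's implementation: it measures the high-frequency spectral energy of an image over non-overlapping patches, the kind of quantity a frequency-guidance loss could penalize so that purification cannot easily rewrite high-frequency content. The function name, patch size, and cutoff are assumptions for illustration only.

```python
import numpy as np

def patchwise_highfreq_energy(img, patch=8, cutoff=0.5):
    """Hypothetical sketch: mean high-frequency spectral energy over
    non-overlapping patches of a grayscale image (2D float array).

    `cutoff` is a fraction of the Nyquist frequency; spectral bins whose
    radial frequency exceeds cutoff * 0.5 count as "high frequency".
    """
    h, w = img.shape
    h, w = h - h % patch, w - w % patch  # crop to a multiple of the patch size

    # Radial frequency of each 2D FFT bin, centered with fftshift.
    freqs = np.fft.fftshift(np.fft.fftfreq(patch))
    fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
    highfreq_mask = np.sqrt(fx**2 + fy**2) > cutoff * 0.5

    energies = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            spec = np.fft.fftshift(np.fft.fft2(img[i:i + patch, j:j + patch]))
            energies.append(np.sum(np.abs(spec[highfreq_mask]) ** 2))
    return float(np.mean(energies))
```

A flat patch concentrates all energy at DC and scores zero, while a pixel-level checkerboard peaks at Nyquist and scores high, so the statistic separates exactly the components the guidance targets.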

🛡️ Threat Analysis

Output Integrity Attack

The paper is directly about content integrity and anti-deepfake protection: it defends against purification attacks that remove anti-deepfake protective perturbations (explicitly listed in ML09 as "Attacks that REMOVE or DEFEAT image protections … via denoising, purification"). AntiPure is a defense in this ML09 space: it makes protective perturbations robust against purification-based removal, preventing unauthorized diffusion model customization.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
training_time, digital, white_box
Applications
image copyright protection, deepfake prevention, diffusion model customization defense