Purify Once, Edit Freely: Breaking Image Protections under Model Mismatch
Qichen Zhao 1, Shengfang Zhai 2, Xinjian Bai 1, Qingni Shen 1, Qiqi Lin 1, Yansong Gao 3, Zhonghai Wu 1
Published on arXiv
2603.13028
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
EditorClean improves downstream edit quality by 3-6 dB PSNR and reduces FID by 50-70% relative to protected inputs, and it outperforms prior purification methods by ~2 dB PSNR with ~30% lower FID across 2,100 editing tasks and six protection methods.
EditorClean
Novel technique introduced
Diffusion models enable high-fidelity image editing but can also be misused for unauthorized style imitation and harmful content generation. To mitigate these risks, proactive image protection methods embed small, often imperceptible adversarial perturbations into images before sharing to disrupt downstream editing or fine-tuning. However, in realistic post-release scenarios, content owners cannot control downstream processing pipelines, and protections optimized for a surrogate model may fail when attackers use mismatched diffusion pipelines. Existing purification methods can weaken protections but often sacrifice image quality and rarely examine architectural mismatch. We introduce a unified post-release purification framework to evaluate protection survivability under model mismatch. We propose two practical purifiers: VAE-Trans, which corrects protected images via latent-space projection, and EditorClean, which performs instruction-guided reconstruction with a Diffusion Transformer to exploit architectural heterogeneity. Both operate without access to protected images or defense internals. Across 2,100 editing tasks and six representative protection methods, EditorClean consistently restores editability. Compared to protected inputs, it improves PSNR by 3-6 dB and reduces FID by 50-70 percent on downstream edits, while outperforming prior purification baselines by about 2 dB PSNR and 30 percent lower FID. Our results reveal a purify-once, edit-freely failure mode: once purification succeeds, the protective signal is largely removed, enabling unrestricted editing. This highlights the need to evaluate protections under model mismatch and design defenses robust to heterogeneous attackers.
Key Contributions
- Unified post-release purification framework evaluating protection survivability under model mismatch
- VAE-Trans: latent-space projection purifier correcting protected images via fine-tuned VAE encoder
- EditorClean: instruction-guided reconstruction using Diffusion Transformer to exploit architectural heterogeneity and remove protections
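The paper does not publish an implementation, but the manifold-projection intuition behind a VAE-Trans-style purifier can be sketched with a toy experiment (a PCA "linear autoencoder" stands in for the paper's fine-tuned VAE encoder; all names and synthetic data below are illustrative, not from the paper): an encode-decode round trip through a low-dimensional latent space discards most of an adversarial protection perturbation that lies off the clean-image manifold.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear_autoencoder(images, k):
    """Toy linear 'VAE': top-k principal components of flattened clean images."""
    mean = images.mean(axis=0)
    _, _, vt = np.linalg.svd(images - mean, full_matrices=False)
    return mean, vt[:k]  # mean: (d,), basis: (k, d), rows orthonormal

def purify(image, mean, basis):
    """Encode-decode round trip: keep only the on-manifold component."""
    latent = basis @ (image - mean)   # "encode" into the latent space
    return mean + basis.T @ latent    # "decode" back to pixel space

# Synthetic "clean image manifold": data confined to a 5-dim subspace of R^64.
d, k = 64, 5
subspace = rng.standard_normal((k, d))
clean = rng.standard_normal((200, k)) @ subspace

mean, basis = fit_linear_autoencoder(clean, k)

# An adversarial 'protection' perturbation, mostly off the manifold.
image = clean[0]
protected = image + 0.2 * rng.standard_normal(d)

purified = purify(protected, mean, basis)
residual_before = np.linalg.norm(protected - image)
residual_after = np.linalg.norm(purified - image)
print(residual_before, residual_after)  # the latent projection shrinks the perturbation
```

In this toy setting the projection removes the perturbation's off-manifold component (roughly a sqrt(k/d) shrinkage for a random perturbation), which is the same reason a learned VAE bottleneck weakens imperceptible protection signals while preserving image content.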
🛡️ Threat Analysis
The paper attacks image protection methods that embed adversarial perturbations to prevent unauthorized editing and style transfer. Its contribution is removing or defeating these content-protection schemes (watermarks, anti-deepfake perturbations, style-transfer protections). Although the protections themselves use adversarial perturbations, removing them is an ML09 attack on content integrity/protection, not an ML01 adversarial-example attack. The paper explicitly targets methods such as Glaze, MIST, and PhotoGuard that protect images against misuse.