Defense · 2025

BlurGuard: A Simple Approach for Robustifying Image Protection Against AI-Powered Editing

Jinsu Kim 1, Yunhun Nam 1, Minseon Kim 2, Sangpil Kim 1, Jongheon Jeong 1

0 citations · 101 references · arXiv


Published on arXiv · 2511.00143

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Adaptive per-region Gaussian blur on adversarial protective noise consistently improves worst-case protection against a wide range of reversal techniques (e.g., JPEG compression) while reducing noise-induced quality degradation.

BlurGuard

Novel technique introduced


Recent advances in text-to-image models have made powerful image editing techniques widely accessible, raising concerns about their potential for malicious use. An emerging line of research to address such threats focuses on implanting "protective" adversarial noise into images before their public release, so that future attempts to edit them using text-to-image models are impeded. However, subsequent works have shown that these adversarial noises are often easily "reversed," e.g., with techniques as simple as JPEG compression, casting doubt on the practicality of the approach. In this paper, we argue that adversarial noise for image protection should not only be imperceptible, as has been the primary focus of prior work, but also irreversible, viz., it should be difficult to detect as noise provided that the original image is hidden. We propose a surprisingly simple method to enhance the robustness of image protection methods against noise reversal techniques: it applies an adaptive per-region Gaussian blur to the noise to adjust its overall frequency spectrum. Through extensive experiments, we show that our method consistently improves the per-sample worst-case protection performance of existing methods against a wide range of reversal techniques across diverse image editing scenarios, while also reducing noise-induced quality degradation in terms of perceptual metrics. Code is available at https://github.com/jsu-kim/BlurGuard.


Key Contributions

  • Identifies that adversarial image protections must be not only imperceptible but also irreversible (hard to detect as noise without the original image)
  • Proposes BlurGuard: adaptive per-region Gaussian blur applied to protective adversarial noise to reshape its frequency spectrum, improving robustness against reversal techniques
  • Demonstrates consistent improvement in worst-case per-sample protection performance across diverse image editing scenarios and reversal methods, while also reducing perceptual quality degradation
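The core idea can be sketched in a few lines: extract the protective noise (protected minus clean image), blur it tile by tile with a region-dependent Gaussian kernel, and add the smoothed noise back. The tile size, the variance-based sigma schedule, and the function name `blur_protective_noise` are illustrative assumptions, not the paper's actual implementation; see the linked repository for the real method.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_protective_noise(clean, protected, tile=32, sigma_max=2.0):
    """Sketch of adaptive per-region Gaussian blur on protective noise.

    Assumes float RGB arrays of shape (H, W, 3). The per-tile sigma
    schedule (scaled by local noise variance) is a hypothetical choice
    for illustration; BlurGuard's actual adaptation rule may differ.
    """
    noise = protected - clean
    out = np.empty_like(noise)
    h, w = noise.shape[:2]
    vmax = max(float(noise.var()), 1e-8)  # global variance as normalizer
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            region = noise[y:y + tile, x:x + tile]
            # Stronger blur where the noise is locally stronger,
            # damping the high-frequency content that reversal
            # techniques such as JPEG compression exploit.
            sigma = min(sigma_max, sigma_max * float(region.var()) / vmax)
            out[y:y + tile, x:x + tile] = gaussian_filter(
                region, sigma=(sigma, sigma, 0)  # blur spatially, not across channels
            )
    return clean + out
```

Because Gaussian smoothing is a weighted average, the blurred noise has lower high-frequency energy (and lower variance) than the original perturbation, which is the mechanism the key finding attributes to the improved irreversibility and reduced perceptual degradation.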

🛡️ Threat Analysis

Output Integrity Attack

The paper directly addresses output integrity: it strengthens content protection schemes (adversarial noise) against reversal/removal attacks (JPEG compression, denoising) that defeat image protections. Per the taxonomy, attacking or defending protective image perturbations (anti-editing, anti-deepfake noise) falls under ML09, not ML01, because the threat is removal of content protection, not crafting adversarial examples that cause misclassification.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
digital, inference-time
Applications
image protection against AI-powered editing, text-to-image model manipulation prevention