defense 2025

Semantic Mismatch and Perceptual Degradation: A New Perspective on Image Editing Immunity

0 citations · 53 references · arXiv

Published on arXiv

2512.14320

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

SIFM achieves state-of-the-art immunization success by disrupting both semantic alignment with edit prompts and perceptual quality of unauthorized diffusion-edited outputs, as measured by the newly proposed ISR metric.

SIFM (Synergistic Intermediate Feature Manipulation)

Novel technique introduced

Text-guided image editing via diffusion models, while powerful, raises significant concerns about misuse, motivating efforts to immunize images against unauthorized edits using imperceptible perturbations. Prevailing metrics for evaluating immunization success typically measure the visual dissimilarity between the output generated from a protected image and a reference output generated from the unprotected original. This approach overlooks the core requirement of image immunization, which is to disrupt semantic alignment with attacker intent, regardless of deviation from any specific output. We argue that immunization success should instead be defined by the edited output either semantically mismatching the prompt or suffering substantial perceptual degradation, either of which thwarts malicious intent. To operationalize this principle, we propose Synergistic Intermediate Feature Manipulation (SIFM), a method that strategically perturbs intermediate diffusion features through dual synergistic objectives: (1) maximizing feature divergence from the original edit trajectory to disrupt semantic alignment with the expected edit, and (2) minimizing feature norms to induce perceptual degradation. Furthermore, we introduce the Immunization Success Rate (ISR), a novel metric designed to quantify true immunization efficacy. ISR measures the proportion of edits where immunization induces either semantic failure relative to the prompt or significant perceptual degradation, as assessed by Multimodal Large Language Models (MLLMs). Extensive experiments show that SIFM achieves state-of-the-art performance in safeguarding visual content against malicious diffusion-based manipulation.
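The two objectives named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the L2 distance, the single flattened feature vector, the `norm_weight` balance term, and the function names are all assumptions made for illustration, since the summary does not specify which layers, distances, or weights the paper uses.

```python
from math import sqrt
from typing import Sequence

Vector = Sequence[float]


def l2_norm(v: Vector) -> float:
    """Euclidean norm of a flat feature vector."""
    return sqrt(sum(x * x for x in v))


def dual_objective_loss(
    feat_protected: Vector,
    feat_original: Vector,
    norm_weight: float = 1.0,  # hypothetical balance term, not from the paper
) -> float:
    """Loss to MINIMIZE with respect to the image perturbation.

    Objective (1): push protected features away from the original edit
    trajectory, so the distance enters with a negative sign.
    Objective (2): shrink the protected feature norm, so the norm enters
    with a positive sign.
    """
    if len(feat_protected) != len(feat_original):
        raise ValueError("feature vectors must have the same length")
    divergence = l2_norm([p - o for p, o in zip(feat_protected, feat_original)])
    shrink = l2_norm(feat_protected)
    return -divergence + norm_weight * shrink
```

Under these assumptions, a perturbation that drives the protected features far from the original ones while keeping their norm small yields a lower loss than one that leaves the features unchanged.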

Key Contributions

Proposes SIFM (Synergistic Intermediate Feature Manipulation), a dual-objective method that perturbs intermediate diffusion features to simultaneously disrupt semantic alignment with edit prompts and induce perceptual degradation
Introduces ISR (Immunization Success Rate), a novel metric that redefines immunization success as the proportion of edits where semantic failure or perceptual degradation occurs — assessed via MLLMs — rather than simple visual dissimilarity from a reference output
Demonstrates state-of-the-art immunization performance against malicious diffusion-based image manipulation, outperforming prior methods like PhotoGE, PhotoGD, and SA
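The ISR definition given in the contributions above reduces to a simple proportion once each edit has been judged. The sketch below assumes the MLLM verdicts are already available as two booleans per edit; the `EditJudgment` type and its field names are hypothetical, and how the paper prompts the MLLM or thresholds its answers is not covered here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EditJudgment:
    """Hypothetical container for one MLLM verdict on an edited output."""
    semantic_mismatch: bool        # edit does not match the prompt
    perceptual_degradation: bool   # edit is significantly degraded


def immunization_success_rate(judgments: Iterable[EditJudgment]) -> float:
    """Fraction of edits that fail semantically OR are perceptually degraded."""
    items = list(judgments)
    if not items:
        raise ValueError("at least one judgment is required")
    successes = sum(
        1 for j in items if j.semantic_mismatch or j.perceptual_degradation
    )
    return successes / len(items)
```

An edit counts as a success if either condition holds, so an output that matches the prompt but is badly degraded still contributes to ISR.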

🛡️ Threat Analysis

Output Integrity Attack

Image immunization is fundamentally an output integrity concern — protecting visual content from being maliciously manipulated by AI diffusion models. SIFM adds imperceptible adversarial perturbations to images so that unauthorized AI edits fail semantically or degrade perceptually, preserving content authenticity and integrity. The paper also introduces ISR, a new metric for measuring content protection efficacy, consistent with ML09's focus on protecting AI-generated/manipulated content authenticity.

Details

Domains

vision, generative

Model Types

diffusion

Threat Tags

white_box, inference_time, digital

Applications

image editing protection, image immunization, content integrity

Similar Papers

PromptFlare: Prompt-Generalized Defense via Cross-Attention Decoy in Diffusion-Based Inpainting

Dual Attention Guided Defense Against Malicious Edits

DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models

Training-Free Color-Aware Adversarial Diffusion Sanitization for Diffusion Stegomalware Defense at Security Gateways

Universal Image Immunization against Diffusion-based Image Editing via Semantic Injection

Beauty and the Beast: Imperceptible Perturbations Against Diffusion-Based Face Swapping via Directional Attribute Editing

Towards Transferable Defense Against Malicious Image Edits

Now You See It, Now You Don't - Instant Concept Erasure for Safe Text-to-Image and Video Generation