Semantic Mismatch and Perceptual Degradation: A New Perspective on Image Editing Immunity
Shuai Dong 1, Jie Zhang 2,3, Guoying Zhao 4, Shiguang Shan 2,3, Xilin Chen 2,3
Published on arXiv: 2512.14320
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
SIFM achieves state-of-the-art immunization success by disrupting both semantic alignment with edit prompts and perceptual quality of unauthorized diffusion-edited outputs, as measured by the newly proposed ISR metric.
SIFM (Synergistic Intermediate Feature Manipulation)
Novel technique introduced
Text-guided image editing via diffusion models, while powerful, raises significant concerns about misuse, motivating efforts to immunize images against unauthorized edits using imperceptible perturbations. Prevailing metrics for evaluating immunization success typically measure the visual dissimilarity between the output generated from a protected image and a reference output generated from the unprotected original. This approach overlooks the core requirement of image immunization, which is to disrupt semantic alignment with attacker intent, regardless of deviation from any specific output. We argue that immunization success should instead be defined by the edited output either semantically mismatching the prompt or suffering substantial perceptual degradation, either of which thwarts malicious intent. To operationalize this principle, we propose Synergistic Intermediate Feature Manipulation (SIFM), a method that strategically perturbs intermediate diffusion features through dual synergistic objectives: (1) maximizing feature divergence from the original edit trajectory to disrupt semantic alignment with the expected edit, and (2) minimizing feature norms to induce perceptual degradation. Furthermore, we introduce the Immunization Success Rate (ISR), a novel metric designed to rigorously quantify true immunization efficacy. ISR measures the proportion of edits where immunization induces either semantic failure relative to the prompt or significant perceptual degradation, as assessed by Multimodal Large Language Models (MLLMs). Extensive experiments show that SIFM achieves state-of-the-art performance in safeguarding visual content against malicious diffusion-based manipulation.
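The dual objective described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not give the exact loss, the layers used, or the optimizer. The sketch assumes a fixed linear map `W` as a stand-in for the intermediate diffusion features, a weighting factor `lam`, an L-infinity budget `eps`, and a projected sign-gradient (PGD-style) update. All of these names and values are hypothetical.

```python
import numpy as np

def sifm_style_loss(W, x, delta, lam):
    """Toy loss to MINIMIZE: reward feature divergence, penalize feature norm."""
    f_clean = W @ x                # stand-in for features of the unprotected image
    f_adv = W @ (x + delta)        # stand-in for features of the protected image
    divergence = np.sum((f_adv - f_clean) ** 2)   # objective (1): push features apart
    norm = np.sum(f_adv ** 2)                     # objective (2): shrink feature norm
    return -divergence + lam * norm

def sifm_style_grad(W, x, delta, lam):
    """Analytic gradient of the toy loss with respect to delta (W is linear)."""
    return -2.0 * W.T @ (W @ delta) + 2.0 * lam * W.T @ (W @ (x + delta))

def pgd_perturb(W, x, eps=0.05, step=0.01, iters=50, lam=0.5):
    """Projected sign-gradient descent under an L-infinity budget (assumed setup)."""
    delta = np.zeros_like(x)
    best_delta = delta.copy()
    best_loss = sifm_style_loss(W, x, delta, lam)
    for _ in range(iters):
        g = sifm_style_grad(W, x, delta, lam)
        delta = np.clip(delta - step * np.sign(g), -eps, eps)  # stay imperceptible
        delta = np.clip(x + delta, 0.0, 1.0) - x               # stay in pixel range
        loss = sifm_style_loss(W, x, delta, lam)
        if loss < best_loss:       # keep the best perturbation seen so far
            best_loss, best_delta = loss, delta.copy()
    return best_delta, best_loss

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64))      # hypothetical feature extractor
x = rng.uniform(size=64)           # hypothetical flattened image in [0, 1]
delta, loss = pgd_perturb(W, x)
print("max |delta|:", np.max(np.abs(delta)), "toy loss:", loss)
```

In a real setting the linear map would be replaced by features taken from the diffusion model during the edit, and the gradient would come from automatic differentiation rather than a closed form. The weighting between the two terms is a design choice that this sketch fixes arbitrarily.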
Key Contributions
- Proposes SIFM (Synergistic Intermediate Feature Manipulation), a dual-objective method that perturbs intermediate diffusion features to simultaneously disrupt semantic alignment with edit prompts and induce perceptual degradation
- Introduces ISR (Immunization Success Rate), a novel metric that redefines immunization success as the proportion of edits where semantic failure or perceptual degradation occurs, as assessed via MLLMs, rather than simple visual dissimilarity from a reference output
- Demonstrates state-of-the-art immunization performance against malicious diffusion-based image manipulation, outperforming prior methods such as PhotoGE, PhotoGD, and SA
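As a rough illustration of how ISR aggregates per-edit judgments, the sketch below counts an edit as a success when either failure mode is flagged. The `EditVerdict` record and its field names are hypothetical, and the MLLM judging step is assumed to have already produced the two boolean flags. The paper's actual prompting and thresholding protocol is not described here.

```python
from dataclasses import dataclass

@dataclass
class EditVerdict:
    """Hypothetical record of an MLLM judge's verdict for one edited output."""
    semantic_mismatch: bool        # output does not match the edit prompt
    perceptual_degradation: bool   # output is substantially degraded

def immunization_success_rate(verdicts):
    """Fraction of edits where at least one failure mode is flagged."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    successes = sum(
        v.semantic_mismatch or v.perceptual_degradation for v in verdicts
    )
    return successes / len(verdicts)

verdicts = [
    EditVerdict(semantic_mismatch=True, perceptual_degradation=False),
    EditVerdict(semantic_mismatch=False, perceptual_degradation=False),
    EditVerdict(semantic_mismatch=False, perceptual_degradation=True),
    EditVerdict(semantic_mismatch=True, perceptual_degradation=True),
]
print(immunization_success_rate(verdicts))  # 3 of 4 edits count as immunized
```

Because the two conditions are combined with a logical OR, an edit that fails on both counts is still counted once.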
🛡️ Threat Analysis
Image immunization is fundamentally an output integrity concern: protecting visual content from malicious manipulation by diffusion models. SIFM adds imperceptible adversarial perturbations to images so that unauthorized AI edits fail semantically or degrade perceptually, preserving content authenticity and integrity. The paper also introduces ISR, a new metric for measuring content protection efficacy, consistent with ML09's focus on protecting the authenticity of AI-generated or AI-manipulated content.