SuMa: A Subspace Mapping Approach for Robust and Effective Concept Erasure in Text-to-Image Diffusion Models

The rapid growth of text-to-image diffusion models has raised concerns about their potential misuse in generating harmful or unauthorized content. To address these issues, several concept erasure methods have been proposed. However, most of them fail to achieve both robustness, i.e., the ability to reliably remove the target concept, and effectiveness, i.e., maintaining image quality. While a few recent techniques achieve both goals for NSFW concepts, none can handle narrow concepts such as copyrighted characters or celebrities. Erasing these narrow concepts is critical for addressing copyright and legal concerns, but it is challenging because they lie close to non-target neighboring concepts and therefore require finer-grained manipulation. In this paper, we introduce Subspace Mapping (SuMa), a novel method specifically designed to achieve both robustness and effectiveness in erasing these narrow concepts. SuMa first derives a target subspace representing the concept to be erased and then neutralizes it by mapping it to a reference subspace that minimizes the distance between the two. This mapping ensures the target concept is robustly erased while preserving image quality. We conduct extensive experiments with SuMa across four tasks (subclass erasure, celebrity erasure, artistic style erasure, and instance erasure) and compare the results with current state-of-the-art methods. Our method achieves image quality comparable to approaches focused on effectiveness, while also yielding results that are on par with methods targeting completeness.

Key Contributions

SuMa derives a target subspace representing the concept to be erased and neutralizes it by mapping it to a reference subspace that minimizes inter-subspace distance, enabling robust erasure without degrading neighboring concepts.
Addresses narrow-concept erasure (celebrities, copyrighted characters), where existing methods fail because target and non-target concepts lie close together in embedding space.
Demonstrates a balance between robustness (completeness of erasure under adversarial prompts) and effectiveness (preserving image quality) across four erasure tasks: subclass, celebrity, artistic style, and instance erasure.
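
The subspace idea in the contributions above can be illustrated with a small linear-algebra sketch. This is not the SuMa algorithm, which this summary does not specify. It is a toy illustration under explicit assumptions: subspaces are taken from the top singular vectors of a matrix of embeddings, the distance between subspaces is measured as the Frobenius distance between their projectors, and the mapping is an orthogonal Procrustes alignment. The function names `subspace_basis`, `projection_distance`, and `map_to_reference` are hypothetical and chosen only for this example.

```python
import numpy as np


def subspace_basis(embeddings, rank):
    """Return a (d, rank) orthonormal basis for the top-`rank` directions.

    Assumption: rows of `embeddings` are samples of one concept; the
    leading right singular vectors are used as that concept's subspace.
    """
    _, _, vt = np.linalg.svd(embeddings, full_matrices=False)
    return vt[:rank].T


def projection_distance(basis_a, basis_b):
    """Frobenius distance between the orthogonal projectors of two subspaces."""
    proj_a = basis_a @ basis_a.T
    proj_b = basis_b @ basis_b.T
    return np.linalg.norm(proj_a - proj_b)


def map_to_reference(x, target_basis, reference_basis):
    """Move the part of `x` inside the target subspace into the reference subspace.

    Assumption: the two bases have the same rank. The rotation is the
    orthogonal Procrustes solution R minimizing ||target_basis @ R - reference_basis||_F.
    """
    u, _, vt = np.linalg.svd(target_basis.T @ reference_basis)
    rotation = u @ vt
    coords = x @ target_basis                 # coordinates of x in the target subspace
    residual = x - coords @ target_basis.T    # part of x outside the target subspace
    return residual + coords @ rotation @ reference_basis.T
```

In this sketch the component of a vector outside the target subspace is left untouched, which mirrors the stated goal of not degrading neighboring concepts, while the component inside it is re-expressed in the reference subspace with its norm preserved. Where such a mapping would be applied in a diffusion model (for example, to text embeddings or to layer weights) is not stated in this summary and is left open here.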

🛡️ Threat Analysis

Output Integrity Attack

Concept erasure directly addresses output integrity of generative models — the defense prevents diffusion models from producing harmful, unauthorized, or copyright-infringing outputs. The robustness criterion specifically evaluates resistance to adversarial prompts trying to regenerate erased concepts, making this a content-integrity defense for AI-generated outputs.

Details

Domains

vision, generative

Model Types

diffusion

Threat Tags

white_box, training_time

Applications

2025 0 cit.

Output Integrity Attack

85%

Similar Papers

Anti-Tamper Protection for Unauthorized Individual Image Generation

A Low-Rank Defense Method for Adversarial Attack on Diffusion Models

Universal Adversarial Purification with DDIM Metric Loss for Stable Diffusion

Perturb a Model, Not an Image: Towards Robust Privacy Protection via Anti-Personalized Diffusion Models

Towards Robust Defense against Customization via Protective Perturbation Resistant to Diffusion-based Purification

Bi-Erasing: A Bidirectional Framework for Concept Removal in Diffusion Models

StyleProtect: Safeguarding Artistic Identity in Fine-tuned Diffusion Models

DLADiff: A Dual-Layer Defense Framework against Fine-Tuning and Zero-Shot Customization of Diffusion Models