SuMa: A Subspace Mapping Approach for Robust and Effective Concept Erasure in Text-to-Image Diffusion Models
Kien Nguyen 1,2, Anh Tran 1,2, Cuong Pham 2,1
Published on arXiv
2509.05625
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
SuMa achieves image quality comparable to effectiveness-focused methods while matching robustness-focused methods in erasure completeness, uniquely handling narrow concepts like celebrities and copyrighted characters where prior methods failed to balance both properties.
SuMa (Subspace Mapping)
Novel technique introduced
The rapid growth of text-to-image diffusion models has raised concerns about their potential misuse in generating harmful or unauthorized content. To address these issues, several Concept Erasure methods have been proposed. However, most of them fail to achieve both robustness, i.e., the ability to reliably remove the target concept, and effectiveness, i.e., maintaining image quality. While a few recent techniques successfully achieve these goals for NSFW concepts, none can handle narrow concepts such as copyrighted characters or celebrities. Erasing these narrow concepts is critical in addressing copyright and legal concerns, but it is challenging because they lie close to non-target neighboring concepts and therefore require finer-grained manipulation. In this paper, we introduce Subspace Mapping (SuMa), a novel method specifically designed to achieve both robustness and effectiveness in erasing these narrow concepts. SuMa first derives a target subspace representing the concept to be erased and then neutralizes it by mapping it to a reference subspace that minimizes the distance between the two. This mapping ensures the target concept is robustly erased while preserving image quality. We conduct extensive experiments with SuMa across four tasks: subclass erasure, celebrity erasure, artistic style erasure, and instance erasure, and compare the results with current state-of-the-art methods. Our method achieves image quality comparable to approaches focused on effectiveness, while also yielding results that are on par with methods targeting completeness.
Key Contributions
- SuMa derives a target subspace representing the concept to be erased and neutralizes it by mapping it to a reference subspace that minimizes inter-subspace distance, enabling robust erasure without degrading neighboring concepts.
- Addresses narrow concept erasure (celebrities, copyrighted characters), on which existing methods fail because target and non-target concepts lie close together in embedding space.
- Demonstrates balance between robustness (completeness of erasure under adversarial prompts) and effectiveness (preserving image quality) across four erasure tasks: subclass, celebrity, artistic style, and instance erasure.
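The subspace-mapping idea in the first contribution can be illustrated with a small NumPy sketch. This is not the paper's implementation: the summary does not say how SuMa constructs its subspaces or its mapping, so the use of SVD to obtain a basis, the orthogonal Procrustes alignment, and the names `subspace_basis` and `map_to_reference` are all assumptions made for illustration, applied to generic embedding vectors rather than to a diffusion model.

```python
import numpy as np


def subspace_basis(embeddings, k):
    """Return a (d, k) orthonormal basis for the top-k directions of the rows.

    Assumption: a concept's subspace is approximated by the leading right
    singular vectors of its embedding matrix (n samples x d dimensions).
    """
    _, _, vt = np.linalg.svd(embeddings, full_matrices=False)
    return vt[:k].T


def map_to_reference(x, target_basis, reference_basis):
    """Replace the part of x inside the target subspace with its image in the
    reference subspace, leaving everything orthogonal to the target untouched.

    x may be a single vector of shape (d,) or a batch of shape (n, d).
    """
    # Orthogonal Procrustes: pick the k x k orthogonal matrix q that minimizes
    # ||reference_basis @ q - target_basis||_F, so mapped vectors move as
    # little as possible (hypothetical choice of "distance").
    u, _, vt = np.linalg.svd(reference_basis.T @ target_basis)
    q = u @ vt

    coeffs = x @ target_basis                      # coordinates in the target subspace
    inside_target = coeffs @ target_basis.T        # component to be erased
    inside_reference = coeffs @ q.T @ reference_basis.T  # its aligned replacement
    return x - inside_target + inside_reference


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for embeddings of a target concept and a reference concept.
    target = subspace_basis(rng.normal(size=(20, 16)), k=3)
    reference = subspace_basis(rng.normal(size=(20, 16)), k=3)

    x = rng.normal(size=16)
    y = map_to_reference(x, target, reference)
    # How much of the vector still lies in the target subspace, before and after.
    print(np.linalg.norm(x @ target), np.linalg.norm(y @ target))
```

In this sketch a vector lying entirely in the target subspace ends up entirely in the reference subspace with its norm preserved, while a vector orthogonal to the target subspace is returned unchanged, which mirrors the stated goal of erasing the target concept without disturbing neighboring concepts.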
🛡️ Threat Analysis
Concept erasure directly addresses output integrity of generative models — the defense prevents diffusion models from producing harmful, unauthorized, or copyright-infringing outputs. The robustness criterion specifically evaluates resistance to adversarial prompts trying to regenerate erased concepts, making this a content-integrity defense for AI-generated outputs.