defense arXiv Apr 17, 2026
Jun Li, Lizhi Xiong, Ziqiang Li et al. · Nanjing University of Information Science and Technology · Southeast University +1 more
Defends text-to-image models by erasing unsafe concepts using text-image collaboration while preserving unrelated content fidelity
Data Poisoning Attack vision generative
Text-to-image generative models have achieved impressive fidelity and diversity, but can inadvertently produce unsafe or undesirable content due to implicit biases embedded in large-scale training datasets. Existing concept erasure methods, whether text-only or image-assisted, face trade-offs: textual approaches often fail to fully suppress concepts, while naive image-guided methods risk over-erasing unrelated content. We propose TICoE, a Text-Image Collaborative Erasing framework that removes target concepts precisely and faithfully through a continuous convex concept manifold and hierarchical visual representation learning, while preserving unrelated semantic and visual content. To objectively assess the quality of erasure, we further introduce a fidelity-oriented evaluation strategy that measures post-erasure usability. Experiments on multiple benchmarks show that TICoE surpasses prior methods in concept removal precision and content fidelity, enabling safer, more controllable text-to-image generation. Our code is available at https://github.com/OpenAscent-L/TICoE.git
diffusion Nanjing University of Information Science and Technology · Southeast University · Nanjing University of Science and Technology