defense 2026

Beyond Text Prompts: Precise Concept Erasure through Text-Image Collaboration

Jun Li 1, Lizhi Xiong 1, Ziqiang Li 1, Weiwei Jiang 1, Zhangjie Fu 1, Yong Li 2, Guo-Sen Xie 3

0 citations

Published on arXiv

2604.15829

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

Achieves 0% ASR and 2% UDA on the gun erasure task while maintaining 92.06% MCP on related concepts, outperforming prior methods.

TICoE

Novel technique introduced


Text-to-image generative models have achieved impressive fidelity and diversity, but can inadvertently produce unsafe or undesirable content due to implicit biases embedded in large-scale training datasets. Existing concept erasure methods, whether text-only or image-assisted, face trade-offs: textual approaches often fail to fully suppress concepts, while naive image-guided methods risk over-erasing unrelated content. We propose TICoE, a text-image Collaborative Erasing framework that achieves precise and faithful concept removal through a continuous convex concept manifold and hierarchical visual representation learning. TICoE precisely removes target concepts while preserving unrelated semantic and visual content. To objectively assess the quality of erasure, we further introduce a fidelity-oriented evaluation strategy that measures post-erasure usability. Experiments on multiple benchmarks show that TICoE surpasses prior methods in concept removal precision and content fidelity, enabling safer, more controllable text-to-image generation. Our code is available at https://github.com/OpenAscent-L/TICoE.git


Key Contributions

  • TICoE framework that uses text-image collaboration with a continuous convex concept manifold for precise concept removal
  • Hierarchical visual representation learning that preserves unrelated semantic and visual content during erasure
  • Fidelity-oriented evaluation strategy measuring post-erasure model usability and content preservation
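
The summary does not say how the continuous convex concept manifold is built. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below treats a handful of concept embeddings as anchors and samples points from their convex hull using non-negative weights that sum to one. The anchor vectors, the function names `sample_convex_weights` and `convex_combination`, and the Dirichlet-style weighting are all assumptions made for this example.

```python
import random


def sample_convex_weights(n, rng, alpha=1.0):
    """Draw n non-negative weights that sum to 1 (a Dirichlet sample).

    Normalizing independent Gamma(alpha, 1) draws is a standard way to
    sample from a symmetric Dirichlet distribution.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def convex_combination(anchors, weights):
    """Return the weighted sum of the anchor vectors.

    With non-negative weights summing to 1, the result lies inside the
    convex hull of the anchors.
    """
    dim = len(anchors[0])
    return [sum(w * a[i] for w, a in zip(weights, anchors)) for i in range(dim)]


# Hypothetical toy "concept embeddings"; real text or image embeddings
# would have hundreds of dimensions and come from an encoder.
anchors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights = sample_convex_weights(len(anchors), rng)
point = convex_combination(anchors, weights)

print("weights:", weights)
print("point in convex hull:", point)
```

Sampling many such points gives a continuous set of interpolated embeddings rather than a single text prompt, which is one plausible reading of why a convex region could cover a concept more completely than one prompt embedding.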

🛡️ Threat Analysis

Data Poisoning Attack

The paper addresses concept erasure as a way to mitigate implicit biases and unsafe content embedded in the training data of generative models. It is framed as a defense against data-level contamination that causes models to generate undesirable outputs: the threat model is corrupted or biased training data leading to unsafe generation.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
training_time
Datasets
COCO-10k, inappropriate image prompt dataset
Applications
text-to-image generation, content safety