Defense · 2025

Bi-Erasing: A Bidirectional Framework for Concept Removal in Diffusion Models

Hao Chen 1, Yiwei Wang 2, Songze Li 1

0 citations · 33 references · arXiv


Published on arXiv · 2512.13039

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Bi-Erasing outperforms baseline concept erasure methods in balancing removal effectiveness and visual fidelity by jointly optimizing suppression and safety-enhancement directions.

Bi-Erasing (Bidirectional Image-Guided Concept Erasure)

Novel technique introduced


Concept erasure, which fine-tunes diffusion models to remove undesired or harmful visual concepts, has become a mainstream approach to mitigating unsafe or illegal image generation in text-to-image models. However, existing removal methods typically adopt a unidirectional erasure strategy, either suppressing the target concept or reinforcing safe alternatives, making it difficult to achieve a balanced trade-off between concept removal and generation quality. To address this limitation, we propose a novel Bidirectional Image-Guided Concept Erasure (Bi-Erasing) framework that performs concept suppression and safety enhancement simultaneously. Specifically, based on the joint representation of text prompts and corresponding images, Bi-Erasing introduces two decoupled image branches: a negative branch responsible for suppressing harmful semantics and a positive branch providing visual guidance for safe alternatives. By jointly optimizing these complementary directions, our approach achieves a balance between erasure efficacy and generation usability. In addition, we apply mask-based filtering to the image branches to prevent interference from irrelevant content during the erasure process. In extensive experimental evaluations, the proposed Bi-Erasing outperforms baseline methods in balancing concept removal effectiveness and visual fidelity.


Key Contributions

  • Bidirectional Image-Guided Concept Erasure (Bi-Erasing) framework that simultaneously suppresses harmful semantics (negative branch) and reinforces safe visual alternatives (positive branch)
  • Joint representation of text prompts and corresponding images to guide concept erasure using decoupled image branches
  • Mask-based filtering to prevent interference from irrelevant content during the erasure process
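The contributions above can be illustrated with a minimal numerical sketch. The paper itself does not publish this formula; the code below is a hypothetical reading of the bidirectional objective, in which a predicted noise residual is pushed away from a negative (harmful) guidance direction and pulled toward a positive (safe) one, with a binary mask standing in for the paper's mask-based filtering. The function name `bi_erasing_loss`, the weights `lam_neg`/`lam_pos`, and the combined-target form are all assumptions for illustration.

```python
import numpy as np

def bi_erasing_loss(eps_pred, eps_neg, eps_pos, mask,
                    lam_neg=1.0, lam_pos=1.0):
    # Hypothetical combined target: move away from the harmful direction
    # (negative branch) while moving toward the safe alternative
    # (positive branch).
    target = -lam_neg * eps_neg + lam_pos * eps_pos
    # Mask-based filtering: only concept-relevant regions contribute,
    # so irrelevant content does not interfere with the erasure signal.
    diff = mask * (eps_pred - target)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
shape = (4, 64, 64)
eps_neg = rng.standard_normal(shape)   # guidance from the negative image branch
eps_pos = rng.standard_normal(shape)   # guidance from the positive image branch
mask = (rng.random(shape) > 0.5).astype(float)

# A prediction that already matches the bidirectional target incurs zero loss.
ideal = -eps_neg + eps_pos
print(bi_erasing_loss(ideal, eps_neg, eps_pos, mask))  # → 0.0
```

Jointly minimizing both directions in one objective, rather than only suppressing the target concept, is what the abstract credits for the improved trade-off between erasure efficacy and generation usability.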

🛡️ Threat Analysis

Output Integrity Attack

Concept erasure is a defense ensuring the integrity of generative model outputs — specifically preventing diffusion models from generating harmful, unsafe, or illegal visual content. The paper's primary contribution is a method to modify the model's behavior so its outputs do not contain undesired concepts, directly targeting output integrity of AI-generated content.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
training_time
Applications
text-to-image generation, content moderation, safe image generation