Attack · 2026

LURE: Latent Space Unblocking for Multi-Concept Reawakening in Diffusion Models

Mengyu Sun 1,2, Ziyuan Yang 1, Andrew Beng Jin Teoh 3, Junxu Liu 2, Haibo Hu 2, Yi Zhang 1

0 citations · 20 references · arXiv


Published on arXiv · 2601.14330

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

LURE enables simultaneous, high-fidelity reawakening of multiple erased concepts across diverse erasure tasks and methods, outperforming prompt-level optimization baselines.

LURE (Latent space Unblocking for concept REawakening)

Novel technique introduced


Concept erasure aims to suppress sensitive content in diffusion models, but recent studies show that erased concepts can still be reawakened, revealing vulnerabilities in erasure methods. Existing reawakening methods mainly rely on prompt-level optimization to manipulate sampling trajectories, neglecting other generative factors, which limits a comprehensive understanding of the underlying dynamics. In this paper, we model the generation process as an implicit function to enable a comprehensive theoretical analysis of multiple factors, including text conditions, model parameters, and latent states. We theoretically show that perturbing each factor can reawaken erased concepts. Building on this insight, we propose a novel concept reawakening method: Latent space Unblocking for concept REawakening (LURE), which reawakens erased concepts by reconstructing the latent space and guiding the sampling trajectory. Specifically, our semantic re-binding mechanism reconstructs the latent space by aligning denoising predictions with target distributions to reestablish severed text-visual associations. However, in multi-concept scenarios, naive reconstruction can cause gradient conflicts and feature entanglement. To address this, we introduce Gradient Field Orthogonalization, which enforces feature orthogonality to prevent mutual interference. Additionally, our Latent Semantic Identification-Guided Sampling (LSIS) ensures stability of the reawakening process via posterior density verification. Extensive experiments demonstrate that LURE enables simultaneous, high-fidelity reawakening of multiple erased concepts across diverse erasure tasks and methods.
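The abstract describes semantic re-binding as optimizing the latent so the erased model's denoising predictions align with a target concept distribution. A minimal sketch of that idea, assuming heavy simplifications: linear maps stand in for the erased and reference denoisers (the real method operates on a diffusion U-Net's noise predictions), and plain gradient descent on the latent stands in for the paper's reconstruction procedure.

```python
import numpy as np

# Hypothetical stand-ins: linear "denoisers" for the erased model and a
# reference (pre-erasure) model. In LURE these would be U-Net epsilon-predictors.
rng = np.random.default_rng(0)
d = 16
W_erased = rng.normal(size=(d, d)) / np.sqrt(d)
W_target = rng.normal(size=(d, d)) / np.sqrt(d)

def rebind_latent(z, steps=500, lr=0.2):
    """Optimize the latent so the erased model's prediction matches the
    target model's prediction for the original latent (semantic re-binding,
    sketched as latent-space gradient descent on a squared-error alignment loss)."""
    eps_target = W_target @ z          # fixed target prediction
    z_opt = z.copy()
    for _ in range(steps):
        residual = W_erased @ z_opt - eps_target
        grad = W_erased.T @ residual   # gradient of 0.5 * ||residual||^2 w.r.t. z_opt
        z_opt -= lr * grad
    return z_opt

z0 = rng.normal(size=d)
loss_before = 0.5 * np.sum((W_erased @ z0 - W_target @ z0) ** 2)
z_star = rebind_latent(z0)
loss_after = 0.5 * np.sum((W_erased @ z_star - W_target @ z0) ** 2)
```

After optimization, the perturbed latent `z_star` drives the erased model toward the target prediction, illustrating the paper's claim that perturbing the latent state alone can reawaken a suppressed concept.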


Key Contributions

  • Theoretical framework modeling diffusion text-to-image generation as an implicit function, showing that perturbing text conditions, model parameters, or latent states can each reawaken erased concepts
  • Semantic re-binding mechanism that reconstructs the latent space by aligning denoising predictions with target concept distributions to reestablish severed text-visual associations
  • Gradient Field Orthogonalization for multi-concept reawakening that enforces feature orthogonality to prevent cross-concept interference, combined with LSIS posterior density verification for stable sampling
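Gradient Field Orthogonalization is described only at the level of "enforcing feature orthogonality to prevent cross-concept interference." One plausible way to sketch that idea is gradient surgery in the style of PCGrad (an assumption, not the paper's stated algorithm): when two concepts' gradients conflict, project each onto the orthogonal complement of the other before combining them.

```python
import numpy as np

def orthogonalize(g1, g2):
    """If g1 conflicts with g2 (negative dot product), remove g1's component
    along g2 so the two updates no longer interfere (PCGrad-style projection,
    used here as a hypothetical stand-in for Gradient Field Orthogonalization)."""
    if np.dot(g1, g2) < 0:
        g1 = g1 - (np.dot(g1, g2) / np.dot(g2, g2)) * g2
    return g1

# Two per-concept gradients that pull in conflicting directions.
g_a = np.array([1.0, 1.0])
g_b = np.array([-1.0, 0.5])
g_a_proj = orthogonalize(g_a, g_b)   # conflict removed: orthogonal to g_b
g_b_proj = orthogonalize(g_b, g_a)   # symmetric projection for the other concept
update = g_a_proj + g_b_proj         # combined multi-concept update
```

The projected gradient for concept A no longer has a component that undoes concept B's progress, which is the interference the contribution above aims to prevent in multi-concept reawakening.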

🛡️ Threat Analysis

Input Manipulation Attack

LURE is an inference-time evasion attack that bypasses safety mechanisms (concept erasure) in diffusion models by optimizing latent representations and guiding sampling trajectories — analogous to adversarial manipulation of intermediate inputs to circumvent content safety controls.


Details

Domains
vision · generative
Model Types
diffusion
Threat Tags
white_box · inference_time · targeted
Applications
text-to-image generation · concept erasure safety mechanisms