LURE: Latent Space Unblocking for Multi-Concept Reawakening in Diffusion Models
Mengyu Sun 1,2, Ziyuan Yang 1, Andrew Beng Jin Teoh 3, Junxu Liu 2, Haibo Hu 2, Yi Zhang 1
Published on arXiv
2601.14330
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
LURE enables simultaneous, high-fidelity reawakening of multiple erased concepts across diverse erasure tasks and methods, outperforming prompt-level optimization baselines.
LURE (Latent space Unblocking for concept REawakening)
Novel technique introduced
Concept erasure aims to suppress sensitive content in diffusion models, but recent studies show that erased concepts can still be reawakened, revealing vulnerabilities in erasure methods. Existing reawakening methods rely mainly on prompt-level optimization to manipulate sampling trajectories, neglecting other generative factors and thus offering only a partial view of the underlying dynamics. In this paper, we model the generation process as an implicit function, enabling a comprehensive theoretical analysis of multiple factors: text conditions, model parameters, and latent states. We show theoretically that perturbing any one of these factors can reawaken erased concepts. Building on this insight, we propose a novel concept reawakening method, Latent space Unblocking for concept REawakening (LURE), which reawakens erased concepts by reconstructing the latent space and guiding the sampling trajectory. Specifically, our semantic re-binding mechanism reconstructs the latent space by aligning denoising predictions with target distributions, reestablishing severed text-visual associations. In multi-concept scenarios, however, naive reconstruction can cause gradient conflicts and feature entanglement. To address this, we introduce Gradient Field Orthogonalization, which enforces feature orthogonality to prevent mutual interference across concepts. Additionally, our Latent Semantic Identification-Guided Sampling (LSIS) stabilizes the reawakening process via posterior density verification. Extensive experiments demonstrate that LURE enables simultaneous, high-fidelity reawakening of multiple erased concepts across diverse erasure tasks and methods.
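The core move in the abstract — reconstructing the latent space by optimizing the latent until the erased model's denoising prediction aligns with a target prediction — can be sketched in a deliberately toy form. Everything below is hypothetical: the linear stand-in "denoisers", the step count, and the learning rate are illustrative stand-ins, not the paper's method (which operates on real diffusion U-Net predictions across timesteps).

```python
import numpy as np

# Toy stand-ins for the erased model's denoiser and a target
# denoiser that still encodes the concept. Real denoisers are
# U-Nets; here they are fixed linear maps so the gradient is
# analytic and the sketch stays self-contained.
rng = np.random.default_rng(0)
D = 8
W_erased = rng.standard_normal((D, D)) * 0.1
W_target = rng.standard_normal((D, D)) * 0.1

def eps_erased(z):
    return W_erased @ z

def eps_target(z):
    return W_target @ z

def rebind_latent(z0, steps=200, lr=0.1):
    """Gradient-descend the latent so the erased model's prediction
    matches the target prediction -- semantic re-binding reduced to
    a plain least-squares latent optimization."""
    z = z0.copy()
    for _ in range(steps):
        r = eps_erased(z) - eps_target(z)       # prediction residual
        grad = (W_erased - W_target).T @ r      # d/dz of 0.5 * ||r||^2
        z -= lr * grad
    return z

z0 = rng.standard_normal(D)
loss0 = 0.5 * np.sum((eps_erased(z0) - eps_target(z0)) ** 2)
z1 = rebind_latent(z0)
loss1 = 0.5 * np.sum((eps_erased(z1) - eps_target(z1)) ** 2)
```

The objective is convex in `z` for these linear stand-ins, so the loss decreases monotonically; in the actual setting the same alignment objective would be backpropagated through a frozen denoiser.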
Key Contributions
- Theoretical framework modeling diffusion text-to-image generation as an implicit function, showing that perturbing text conditions, model parameters, or latent states can each reawaken erased concepts
- Semantic re-binding mechanism that reconstructs the latent space by aligning denoising predictions with target concept distributions to reestablish severed text-visual associations
- Gradient Field Orthogonalization for multi-concept reawakening, which enforces feature orthogonality to prevent cross-concept interference, combined with Latent Semantic Identification-Guided Sampling (LSIS), which stabilizes sampling via posterior density verification
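The orthogonalization idea in the contributions above — removing the component of one concept's gradient that conflicts with another's — can be illustrated with a PCGrad-style projection. This is a common conflict-resolution scheme, not necessarily the paper's exact Gradient Field Orthogonalization; the function name and 2-D gradients are illustrative only.

```python
import numpy as np

def orthogonalize_gradients(grads):
    """For each concept gradient, project out components that point
    against any other concept's gradient (negative dot product),
    so per-concept updates no longer oppose one another."""
    originals = [g.astype(float) for g in grads]
    projected = []
    for i, g in enumerate(originals):
        g = g.copy()
        for j, h in enumerate(originals):
            if i == j:
                continue
            dot = g @ h
            if dot < 0:                      # conflicting direction
                g -= (dot / (h @ h)) * h     # remove the conflict
        projected.append(g)
    return projected

g1 = np.array([1.0, 0.0])
g2 = np.array([-1.0, 1.0])   # conflicts with g1 (dot product = -1)
p1, p2 = orthogonalize_gradients([g1, g2])
# After projection, neither update opposes the other concept's gradient.
```

Each projection is computed against the original (unmodified) gradients, so the result does not depend on concept ordering in this two-concept case.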
🛡️ Threat Analysis
LURE is an inference-time evasion attack that bypasses safety mechanisms (concept erasure) in diffusion models by optimizing latent representations and guiding sampling trajectories — analogous to adversarial manipulation of intermediate inputs to circumvent content safety controls.