Attack · 2026

LURE: Latent Space Unblocking for Multi-Concept Reawakening in Diffusion Models

Mengyu Sun 1,2, Ziyuan Yang 1, Andrew Beng Jin Teoh 3, Junxu Liu 2, Haibo Hu 2, Yi Zhang 1

0 citations · 20 references · arXiv


Published on arXiv · 2601.14330

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

LURE enables simultaneous, high-fidelity reawakening of multiple erased concepts across diverse erasure tasks and methods, outperforming prompt-level optimization baselines.

LURE (Latent space Unblocking for concept REawakening)

Novel technique introduced


Concept erasure aims to suppress sensitive content in diffusion models, but recent studies show that erased concepts can still be reawakened, revealing vulnerabilities in erasure methods. Existing reawakening methods mainly rely on prompt-level optimization to manipulate sampling trajectories, neglecting other generative factors, which limits a comprehensive understanding of the underlying dynamics. In this paper, we model the generation process as an implicit function to enable a comprehensive theoretical analysis of multiple factors, including text conditions, model parameters, and latent states. We theoretically show that perturbing each factor can reawaken erased concepts. Building on this insight, we propose a novel concept reawakening method: Latent space Unblocking for concept REawakening (LURE), which reawakens erased concepts by reconstructing the latent space and guiding the sampling trajectory. Specifically, our semantic re-binding mechanism reconstructs the latent space by aligning denoising predictions with target distributions to reestablish severed text-visual associations. However, in multi-concept scenarios, naive reconstruction can cause gradient conflicts and feature entanglement. To address this, we introduce Gradient Field Orthogonalization, which enforces feature orthogonality to prevent mutual interference. Additionally, our Latent Semantic Identification-Guided Sampling (LSIS) ensures stability of the reawakening process via posterior density verification. Extensive experiments demonstrate that LURE enables simultaneous, high-fidelity reawakening of multiple erased concepts across diverse erasure tasks and methods.
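The abstract describes semantic re-binding as optimizing the latent so the erased model's denoising predictions align with a target concept distribution. A minimal sketch of that idea, assuming heavy simplifications: linear maps stand in for the erased and reference denoisers (the real method operates on a diffusion U-Net's noise predictions), and plain gradient descent on the latent stands in for the paper's reconstruction procedure.

```python
import numpy as np

# Hypothetical stand-ins: linear "denoisers" for the erased model and a
# reference (pre-erasure) model. In LURE these would be U-Net epsilon-predictors.
rng = np.random.default_rng(0)
d = 16
W_erased = rng.normal(size=(d, d)) / np.sqrt(d)
W_target = rng.normal(size=(d, d)) / np.sqrt(d)

def rebind_latent(z, steps=500, lr=0.2):
    """Optimize the latent so the erased model's prediction matches the
    target model's prediction for the original latent (semantic re-binding,
    sketched as latent-space gradient descent on a squared-error alignment loss)."""
    eps_target = W_target @ z          # fixed target prediction
    z_opt = z.copy()
    for _ in range(steps):
        residual = W_erased @ z_opt - eps_target
        grad = W_erased.T @ residual   # gradient of 0.5 * ||residual||^2 w.r.t. z_opt
        z_opt -= lr * grad
    return z_opt

z0 = rng.normal(size=d)
loss_before = 0.5 * np.sum((W_erased @ z0 - W_target @ z0) ** 2)
z_star = rebind_latent(z0)
loss_after = 0.5 * np.sum((W_erased @ z_star - W_target @ z0) ** 2)
```

After optimization, the perturbed latent `z_star` drives the erased model toward the target prediction, illustrating the paper's claim that perturbing the latent state alone can reawaken a suppressed concept.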


Key Contributions

  • Theoretical framework modeling diffusion text-to-image generation as an implicit function, showing that perturbing text conditions, model parameters, or latent states can each reawaken erased concepts
  • Semantic re-binding mechanism that reconstructs the latent space by aligning denoising predictions with target concept distributions to reestablish severed text-visual associations
  • Gradient Field Orthogonalization for multi-concept reawakening that enforces feature orthogonality to prevent cross-concept interference, combined with LSIS posterior density verification for stable sampling
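Gradient Field Orthogonalization is described only at the level of "enforcing feature orthogonality to prevent cross-concept interference." One plausible way to sketch that idea is gradient surgery in the style of PCGrad (an assumption, not the paper's stated algorithm): when two concepts' gradients conflict, project each onto the orthogonal complement of the other before combining them.

```python
import numpy as np

def orthogonalize(g1, g2):
    """If g1 conflicts with g2 (negative dot product), remove g1's component
    along g2 so the two updates no longer interfere (PCGrad-style projection,
    used here as a hypothetical stand-in for Gradient Field Orthogonalization)."""
    if np.dot(g1, g2) < 0:
        g1 = g1 - (np.dot(g1, g2) / np.dot(g2, g2)) * g2
    return g1

# Two per-concept gradients that pull in conflicting directions.
g_a = np.array([1.0, 1.0])
g_b = np.array([-1.0, 0.5])
g_a_proj = orthogonalize(g_a, g_b)   # conflict removed: orthogonal to g_b
g_b_proj = orthogonalize(g_b, g_a)   # symmetric projection for the other concept
update = g_a_proj + g_b_proj         # combined multi-concept update
```

The projected gradient for concept A no longer has a component that undoes concept B's progress, which is the interference the contribution above aims to prevent in multi-concept reawakening.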

🛡️ Threat Analysis

Input Manipulation Attack

LURE is an inference-time evasion attack that bypasses safety mechanisms (concept erasure) in diffusion models by optimizing latent representations and guiding sampling trajectories — analogous to adversarial manipulation of intermediate inputs to circumvent content safety controls.


Details

Domains
vision · generative
Model Types
diffusion
Threat Tags
white_box · inference_time · targeted
Applications
text-to-image generation · concept erasure safety mechanisms