TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models
Qianlong Xiang 1,2,3, Miao Zhang 1, Haoyu Zhang 1,4, Kun Wang 5, Junhui Hou 2, Liqiang Nie 1,3
Published on arXiv
2603.17828
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
Successfully regenerates erased concepts from diffusion models treated with state-of-the-art unlearning methods, showing that current erasure techniques fail to remove the underlying visual knowledge
TINA
Novel technique introduced
Text-to-image diffusion models exhibit remarkable generative power, but their safe deployment requires concept erasure techniques to prevent the creation of harmful content. This has fostered a dynamic interplay between the development of erasure defenses and the adversarial probes designed to bypass them, and this co-evolution has progressively enhanced the efficacy of erasure methods. However, this adversarial co-evolution has converged on a narrow, text-centric paradigm that equates erasure with severing the text-to-image mapping, ignoring that the underlying visual knowledge related to undesired concepts still persists. To substantiate this claim, we investigate from a visual perspective, leveraging DDIM inversion to probe whether a generative pathway for the erased concept can still be found. Identifying such a visual generative pathway is challenging, however, because standard text-guided DDIM inversion is actively resisted by text-centric defenses within the erased model. To address this, we introduce TINA, a novel Text-free INversion Attack, which enforces this visual-only probe by operating under a null-text condition, thereby sidestepping existing text-centric defenses. Moreover, TINA integrates an optimization procedure to overcome the accumulating approximation errors that arise when standard inversion operates without its usual textual guidance. Our experiments demonstrate that TINA regenerates erased concepts from models treated with state-of-the-art unlearning. The success of TINA proves that current methods merely obscure concepts, highlighting an urgent need for paradigms that operate directly on internal visual knowledge.
Key Contributions
- Text-free DDIM inversion attack that bypasses text-centric concept erasure defenses by operating under null-text conditions
- Optimization procedure to overcome approximation errors in text-free inversion
- Demonstrates that state-of-the-art unlearning methods only obscure concepts rather than truly removing visual knowledge
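The mechanics can be illustrated with a minimal toy sketch: deterministic DDIM inversion run with no text conditioning, plus a fixed-point refinement of the noise estimate as one plausible form of the error-reduction optimization the paper describes. The noise schedule, the linear `eps_model`, and the `n_refine` loop are all illustrative assumptions, not the paper's actual components; in a real attack `eps_model` would be the erased model's U-Net called with a null (empty) text embedding.

```python
import numpy as np

# --- Toy setup (illustrative stand-ins, not the paper's actual model) ---
T = 50
# Simple decreasing alpha-bar schedule with abar[0] = 1.
abar = np.cos(0.5 * np.pi * 0.95 * np.linspace(0.0, 1.0, T + 1)) ** 2

def eps_model(x, t, cond=None):
    """Stand-in noise predictor. A real attack would call the diffusion
    U-Net here with cond = the null (empty) text embedding, so the erased
    model's text-centric defenses are never triggered."""
    return 0.1 * x  # hypothetical toy predictor, linear in x

def invert_step(x, t, eps):
    """One deterministic DDIM inversion step: x_t -> x_{t+1}."""
    x0_pred = (x - np.sqrt(1.0 - abar[t]) * eps) / np.sqrt(abar[t])
    return np.sqrt(abar[t + 1]) * x0_pred + np.sqrt(1.0 - abar[t + 1]) * eps

def ddim_invert(x0, n_refine=0):
    """Map a clean sample x0 to a latent x_T with no text prompt.
    n_refine > 0 adds a fixed-point correction: re-evaluate eps at the
    current estimate of x_{t+1} instead of at x_t, shrinking the
    approximation error that otherwise accumulates across steps."""
    x = x0.copy()
    for t in range(T):
        eps = eps_model(x, t)            # naive: eps evaluated at x_t
        for _ in range(n_refine):        # refined: eps at x_{t+1} estimate
            eps = eps_model(invert_step(x, t, eps), t + 1)
        x = invert_step(x, t, eps)
    return x

def ddim_sample(xT):
    """Reverse pass: regenerate the image from the inverted latent,
    again under the null-text condition."""
    x = xT.copy()
    for t in range(T, 0, -1):
        eps = eps_model(x, t)
        x0_pred = (x - np.sqrt(1.0 - abar[t]) * eps) / np.sqrt(abar[t])
        x = np.sqrt(abar[t - 1]) * x0_pred + np.sqrt(1.0 - abar[t - 1]) * eps
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))
err_naive = np.abs(x0 - ddim_sample(ddim_invert(x0))).max()
err_refined = np.abs(x0 - ddim_sample(ddim_invert(x0, n_refine=5))).max()
print(f"round-trip error  naive: {err_naive:.2e}  refined: {err_refined:.2e}")
```

Because no text embedding ever enters the pipeline, defenses that monitor or suppress the text-to-image mapping have nothing to intercept; the refinement loop shows why an extra optimization is needed once the usual textual guidance is absent.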
🛡️ Threat Analysis
TINA reconstructs visual knowledge that was supposedly removed by unlearning methods, demonstrating that the model still retains training-data concepts that erasure defenses claimed to eliminate. This is a model inversion attack that recovers private or supposedly removed visual knowledge from the model.