Latent Diffusion Inversion Requires Understanding the Latent Space
Mingxing Rao, Bowen Qu, Daniel Moyer
Published on arXiv (arXiv:2511.20592)
Membership Inference Attack
OWASP ML Top 10 — ML04
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
Removing low-memorization latent dimensions when computing MIA statistics yields average AUROC gains of 2.7% and TPR@1%FPR gains of 6.42% on LDMs across six datasets.
The recovery of training data from generative models ("model inversion") has been extensively studied for diffusion models operating in the data domain. However, the encoder/decoder pair and the corresponding latent codes have largely been ignored by inversion techniques applied to latent-space generative models such as Latent Diffusion Models (LDMs). In this work we describe two key findings: (1) the diffusion model exhibits non-uniform memorization across latent codes, tending to overfit samples located in high-distortion regions of the decoder pullback metric; (2) even within a single latent code, different dimensions contribute unequally to memorization. We introduce a principled method to rank latent dimensions by their per-dimensional contribution to the decoder pullback metric, identifying those most responsible for memorization. Empirically, removing less-memorizing dimensions when computing attack statistics for a score-based membership inference attack significantly improves performance, with average AUROC gains of 2.7% and substantial increases in TPR@1%FPR (6.42%) across diverse datasets including CIFAR-10, CelebA, ImageNet-1K, Pokémon, MS-COCO, and Flickr. This indicates stronger confidence in identifying members under extremely low false-positive tolerance. Our results highlight the overlooked influence of the autoencoder geometry on LDM memorization and provide a new perspective for analyzing privacy risks in diffusion-based generative models.
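The decoder pullback metric underlying both findings is G(z) = J(z)ᵀJ(z), where J is the decoder Jacobian at latent code z; its diagonal entries G_ii = ‖∂D/∂z_i‖² measure how strongly each latent dimension distorts the output. A minimal sketch of ranking dimensions this way, using a toy two-layer decoder and finite-difference Jacobian columns (the toy decoder and the use of the raw diagonal as the ranking statistic are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def decoder(z, W1, W2):
    # Toy stand-in for an LDM decoder: two-layer MLP, latent z -> "image" x
    return W2 @ np.tanh(W1 @ z)

def pullback_diag(z, W1, W2, eps=1e-5):
    """Diagonal of the pullback metric G = J^T J at z, via finite differences.
    G_ii = ||dD/dz_i||^2 is dimension i's distortion contribution."""
    d = len(z)
    base = decoder(z, W1, W2)
    diag = np.empty(d)
    for i in range(d):
        zp = z.copy()
        zp[i] += eps
        Ji = (decoder(zp, W1, W2) - base) / eps  # i-th Jacobian column
        diag[i] = Ji @ Ji
    return diag

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 4))
W2 = rng.normal(size=(64, 16))
z = rng.normal(size=4)

contrib = pullback_diag(z, W1, W2)
ranking = np.argsort(-contrib)  # dimensions sorted by pullback contribution
```

In practice the Jacobian of a real decoder would be obtained with autodiff (e.g. Jacobian-vector products) rather than finite differences, but the ranking logic is the same.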
Key Contributions
- Discovers that LDMs exhibit non-uniform memorization across latent codes, concentrated in high-distortion regions of the decoder pullback metric
- Introduces a principled per-dimensional ranking of latent codes by their contribution to the decoder pullback metric to identify memorization-responsible dimensions
- Demonstrates that filtering low-memorization dimensions when computing MIA statistics improves AUROC by 2.7% and TPR@1%FPR by 6.42% across six diverse datasets
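The third contribution amounts to masking the attack statistic: aggregate per-dimension score residuals only over the latent dimensions ranked most memorization-prone, discarding the rest. A hedged sketch of that filtering step (the function name, the squared-residual aggregation, and the `keep_frac` parameter are illustrative assumptions, not the paper's exact statistic):

```python
import numpy as np

def filtered_mia_statistic(residuals, dim_scores, keep_frac=0.5):
    """Hypothetical filtered attack statistic.

    residuals:  (n_samples, d) per-dimension score-model errors per sample
    dim_scores: (d,) memorization ranking, e.g. pullback-metric contributions
    Aggregates squared residuals only over the top-ranked dimensions;
    lower output = more member-like under a score-based MIA.
    """
    d = residuals.shape[1]
    k = max(1, int(keep_frac * d))
    keep = np.argsort(-dim_scores)[:k]             # top-k memorizing dims
    return (residuals[:, keep] ** 2).mean(axis=1)  # filtered statistic
```

The intuition: members are fitted well on the memorizing dimensions, so their residuals there are small, while the discarded low-memorization dimensions only add noise that dilutes the member/non-member gap.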
🛡️ Threat Analysis
The paper explicitly frames its goal as 'recovery of training data from generative models (model inversion)' and analyzes training data memorization patterns in LDMs, directly addressing training data leakage from generative models.
The paper's primary practical contribution is improving score-based membership inference attacks on LDMs, measured via AUROC (+2.7%) and TPR@1%FPR (+6.42%) — the canonical MIA evaluation protocol.
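The two reported metrics can be computed directly from attack scores: AUROC via the rank (Mann-Whitney) formulation, and TPR@1%FPR by thresholding at the 99th percentile of non-member scores. A minimal numpy sketch (function and variable names are illustrative):

```python
import numpy as np

def auroc_and_tpr_at_fpr(member_scores, nonmember_scores, fpr_budget=0.01):
    """Evaluate an MIA. Convention: higher score = predicted member.
    Returns (AUROC, TPR at the given false-positive-rate budget)."""
    # AUROC = P(member score > non-member score), ties counted half
    wins = member_scores[:, None] > nonmember_scores[None, :]
    ties = member_scores[:, None] == nonmember_scores[None, :]
    auroc = (wins + 0.5 * ties).mean()
    # Threshold that (approximately) allows fpr_budget false positives
    thresh = np.quantile(nonmember_scores, 1 - fpr_budget)
    tpr = (member_scores > thresh).mean()
    return auroc, tpr
```

TPR at a 1% FPR budget is the stricter of the two metrics, which is why the paper's 6.42% gain there signals stronger confidence in identifying members under extremely low false-positive tolerance.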