Latent Diffusion Inversion Requires Understanding the Latent Space
Mingxing Rao, Bowen Qu, Daniel Moyer
Published on arXiv (arXiv:2511.20592)
Membership Inference Attack
OWASP ML Top 10 — ML04
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
Removing low-memorization latent dimensions when computing MIA statistics yields average AUROC gains of 2.7% and TPR@1%FPR gains of 6.42% on LDMs across six datasets.
The recovery of training data from generative models ("model inversion") has been extensively studied for diffusion models operating in the data domain. However, the encoder/decoder pair and the corresponding latent codes have largely been ignored by inversion techniques applied to latent-space generative models such as Latent Diffusion Models (LDMs). In this work we describe two key findings: (1) the diffusion model exhibits non-uniform memorization across latent codes, tending to overfit samples located in high-distortion regions of the decoder pullback metric; (2) even within a single latent code, different dimensions contribute unequally to memorization. We introduce a principled method to rank latent dimensions by their per-dimensional contribution to the decoder pullback metric, identifying those most responsible for memorization. Empirically, removing less-memorizing dimensions when computing attack statistics for a score-based membership inference attack significantly improves performance, with average AUROC gains of 2.7% and substantial increases in TPR@1%FPR (6.42%) across diverse datasets including CIFAR-10, CelebA, ImageNet-1K, Pokémon, MS-COCO, and Flickr. This indicates stronger confidence in identifying members under extremely low false-positive tolerance. Our results highlight the overlooked influence of the autoencoder geometry on LDM memorization and provide a new perspective for analyzing privacy risks in diffusion-based generative models.
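The decoder pullback metric underlying both findings is G(z) = J(z)ᵀJ(z), where J is the decoder Jacobian at latent code z; its diagonal entries G_ii = ‖∂D/∂z_i‖² measure how strongly each latent dimension distorts the output. A minimal sketch of ranking dimensions this way, using a toy two-layer decoder and finite-difference Jacobian columns (the toy decoder and the use of the raw diagonal as the ranking statistic are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def decoder(z, W1, W2):
    # Toy stand-in for an LDM decoder: two-layer MLP, latent z -> "image" x
    return W2 @ np.tanh(W1 @ z)

def pullback_diag(z, W1, W2, eps=1e-5):
    """Diagonal of the pullback metric G = J^T J at z, via finite differences.
    G_ii = ||dD/dz_i||^2 is dimension i's distortion contribution."""
    d = len(z)
    base = decoder(z, W1, W2)
    diag = np.empty(d)
    for i in range(d):
        zp = z.copy()
        zp[i] += eps
        Ji = (decoder(zp, W1, W2) - base) / eps  # i-th Jacobian column
        diag[i] = Ji @ Ji
    return diag

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 4))
W2 = rng.normal(size=(64, 16))
z = rng.normal(size=4)

contrib = pullback_diag(z, W1, W2)
ranking = np.argsort(-contrib)  # dimensions sorted by pullback contribution
```

In practice the Jacobian of a real decoder would be obtained with autodiff (e.g. Jacobian-vector products) rather than finite differences, but the ranking logic is the same.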
Key Contributions
- Discovers that LDMs exhibit non-uniform memorization across latent codes, concentrated in high-distortion regions of the decoder pullback metric
- Introduces a principled per-dimensional ranking of latent codes by their contribution to the decoder pullback metric to identify memorization-responsible dimensions
- Demonstrates that filtering low-memorization dimensions when computing MIA statistics improves AUROC by 2.7% and TPR@1%FPR by 6.42% across six diverse datasets
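The third contribution amounts to masking the attack statistic: aggregate per-dimension score residuals only over the latent dimensions ranked most memorization-prone, discarding the rest. A hedged sketch of that filtering step (the function name, the squared-residual aggregation, and the `keep_frac` parameter are illustrative assumptions, not the paper's exact statistic):

```python
import numpy as np

def filtered_mia_statistic(residuals, dim_scores, keep_frac=0.5):
    """Hypothetical filtered attack statistic.

    residuals:  (n_samples, d) per-dimension score-model errors per sample
    dim_scores: (d,) memorization ranking, e.g. pullback-metric contributions
    Aggregates squared residuals only over the top-ranked dimensions;
    lower output = more member-like under a score-based MIA.
    """
    d = residuals.shape[1]
    k = max(1, int(keep_frac * d))
    keep = np.argsort(-dim_scores)[:k]             # top-k memorizing dims
    return (residuals[:, keep] ** 2).mean(axis=1)  # filtered statistic
```

The intuition: members are fitted well on the memorizing dimensions, so their residuals there are small, while the discarded low-memorization dimensions only add noise that dilutes the member/non-member gap.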
🛡️ Threat Analysis
The paper explicitly frames its goal as 'recovery of training data from generative models (model inversion)' and analyzes training data memorization patterns in LDMs, directly addressing training data leakage from generative models.
The paper's primary practical contribution is improving score-based membership inference attacks on LDMs, measured via AUROC (+2.7%) and TPR@1%FPR (+6.42%) — the canonical MIA evaluation protocol.
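The two reported metrics can be computed directly from attack scores: AUROC via the rank (Mann-Whitney) formulation, and TPR@1%FPR by thresholding at the 99th percentile of non-member scores. A minimal numpy sketch (function and variable names are illustrative):

```python
import numpy as np

def auroc_and_tpr_at_fpr(member_scores, nonmember_scores, fpr_budget=0.01):
    """Evaluate an MIA. Convention: higher score = predicted member.
    Returns (AUROC, TPR at the given false-positive-rate budget)."""
    # AUROC = P(member score > non-member score), ties counted half
    wins = member_scores[:, None] > nonmember_scores[None, :]
    ties = member_scores[:, None] == nonmember_scores[None, :]
    auroc = (wins + 0.5 * ties).mean()
    # Threshold that (approximately) allows fpr_budget false positives
    thresh = np.quantile(nonmember_scores, 1 - fpr_budget)
    tpr = (member_scores > thresh).mean()
    return auroc, tpr
```

TPR at a 1% FPR budget is the stricter of the two metrics, which is why the paper's 6.42% gain there signals stronger confidence in identifying members under extremely low false-positive tolerance.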