Defense · 2026

Detecting and Mitigating Memorization in Diffusion Models through Anisotropy of the Log-Probability

Rohan Asthana, Vasileios Belagiannis

0 citations · 37 references · arXiv


Published on arXiv · 2601.20642

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

Outperforms existing denoising-free memorization detection methods on Stable Diffusion v1.4 and v2 while being at least ~5x faster than the previous best approach.

Anisotropy-based Memorization Detection

Novel technique introduced


Diffusion-based image generative models produce high-fidelity images through iterative denoising but remain vulnerable to memorization, where they unintentionally reproduce exact copies or parts of training images. Recent memorization detection methods primarily use the norm of the score difference as an indicator of memorization. We prove that such norm-based metrics are mainly effective under the assumption of isotropic log-probability distributions, which generally holds at high or medium noise levels. In contrast, analyzing the anisotropic regime reveals that memorized samples exhibit strong angular alignment between the guidance vector and the unconditional scores in the low-noise setting. Building on these insights, we develop a memorization detection metric that integrates the isotropic norm and the anisotropic alignment. Our detection metric can be computed directly on pure-noise inputs via one conditional and one unconditional forward pass, eliminating the need for costly denoising steps. Detection experiments on Stable Diffusion v1.4 and v2 show that our metric outperforms existing denoising-free detection methods while being at least approximately 5x faster than the previous best approach. Finally, we demonstrate the effectiveness of our approach by utilizing a mitigation strategy that adapts memorized prompts based on our developed metric. The code is available at https://github.com/rohanasthana/memorization-anisotropy.
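The abstract describes a metric built from two quantities that both come from a single pair of forward passes on pure noise: the norm of the score difference (the isotropic signal) and the angular alignment between the guidance vector and the unconditional score (the anisotropic signal). A minimal NumPy sketch of that combination is below; the weighting `w_norm`/`w_align` and the simple additive form are assumptions for illustration, not the paper's exact formula, and `eps_cond`/`eps_uncond` stand in for the model's conditional and unconditional noise predictions.

```python
import numpy as np

def memorization_score(eps_cond, eps_uncond, w_norm=1.0, w_align=1.0):
    """Hedged sketch of a norm-plus-alignment detection metric.

    eps_cond, eps_uncond: per-sample flattened score (noise) predictions
    from one conditional and one unconditional forward pass on pure noise.
    """
    # Guidance vector: difference between conditional and unconditional scores.
    g = eps_cond - eps_uncond
    # Isotropic component: norm of the score difference.
    norm_term = np.linalg.norm(g, axis=-1)
    # Anisotropic component: cosine alignment of guidance with the
    # unconditional score (memorized prompts show strong alignment).
    cos_term = np.sum(g * eps_uncond, axis=-1) / (
        np.linalg.norm(g, axis=-1) * np.linalg.norm(eps_uncond, axis=-1) + 1e-12
    )
    return w_norm * norm_term + w_align * cos_term
```

Because only two forward passes are needed and no denoising loop is run, the cost per prompt is constant, which is the source of the claimed speedup over denoising-based detectors.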


Key Contributions

  • Theoretical proof that norm-based memorization metrics assume isotropic log-probability distributions, valid only at high/medium noise levels
  • Novel detection metric combining isotropic norm and anisotropic angular alignment of guidance and unconditional score vectors, computed without denoising steps
  • Mitigation strategy that adapts memorized prompts using the detection metric, demonstrated on Stable Diffusion v1.4 and v2
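The third contribution adapts prompts that the metric flags as memorized. The paper does not spell out the adaptation procedure here, so the sketch below shows one plausible, deliberately simple scheme: a gradient-free search over lightly perturbed prompt candidates that keeps the rewrite with the lowest detection score. `score_fn` (the detection metric applied to a prompt) and `candidates_fn` (a prompt-perturbation generator) are hypothetical stand-ins.

```python
def mitigate_prompt(prompt, score_fn, candidates_fn, threshold, n_trials=8):
    """Hedged sketch: if a prompt's detection score exceeds `threshold`,
    try perturbed rewrites and return the lowest-scoring one.

    score_fn(prompt) -> float: hypothetical detection-metric wrapper.
    candidates_fn(prompt, n) -> iterable of rewritten prompts (hypothetical).
    """
    best_score = score_fn(prompt)
    if best_score < threshold:
        return prompt  # not flagged as memorized; leave unchanged
    best = prompt
    for cand in candidates_fn(prompt, n_trials):
        s = score_fn(cand)
        if s < best_score:
            best, best_score = cand, s
    return best
```

Any adaptation strategy of this shape inherits the metric's low cost: each candidate is scored with just two forward passes, so the search stays cheap relative to full denoising.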

🛡️ Threat Analysis

Model Inversion Attack

Memorization in diffusion models is a training data reconstruction vulnerability: the model can reproduce exact copies of private training images when queried with the right prompts. The paper detects this threat via a novel metric and mitigates it, directly defending against training data extraction from model outputs — the core ML03 threat.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
training_time, inference_time, black_box
Datasets
Stable Diffusion v1.4 training data, Stable Diffusion v2 training data
Applications
text-to-image generation, image synthesis