Defense · 2026

Detecting and Mitigating Memorization in Diffusion Models through Anisotropy of the Log-Probability

Rohan Asthana, Vasileios Belagiannis

0 citations · 37 references · arXiv


Published on arXiv · 2601.20642

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

Outperforms existing denoising-free memorization detection methods on Stable Diffusion v1.4 and v2 while being at least ~5x faster than the previous best approach.

Anisotropy-based Memorization Detection

Novel technique introduced


Diffusion-based image generative models produce high-fidelity images through iterative denoising but remain vulnerable to memorization, where they unintentionally reproduce exact copies or parts of training images. Recent memorization detection methods primarily use the norm of the score difference as an indicator of memorization. We prove that such norm-based metrics are mainly effective under the assumption of isotropic log-probability distributions, which generally holds at high or medium noise levels. In contrast, analyzing the anisotropic regime reveals that memorized samples exhibit strong angular alignment between the guidance vector and the unconditional scores in the low-noise setting. Building on these insights, we develop a memorization detection metric that integrates the isotropic norm and the anisotropic alignment. Our detection metric can be computed directly on pure-noise inputs via one conditional and one unconditional forward pass, eliminating the need for costly denoising steps. Detection experiments on Stable Diffusion v1.4 and v2 show that our metric outperforms existing denoising-free detection methods while being at least approximately 5x faster than the previous best approach. Finally, we demonstrate the effectiveness of our approach by utilizing a mitigation strategy that adapts memorized prompts based on our developed metric. The code is available at https://github.com/rohanasthana/memorization-anisotropy.
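The abstract describes a metric built from two quantities that both come from a single pair of forward passes on pure noise: the norm of the score difference (the isotropic signal) and the angular alignment between the guidance vector and the unconditional score (the anisotropic signal). A minimal NumPy sketch of that combination is below; the weighting `w_norm`/`w_align` and the simple additive form are assumptions for illustration, not the paper's exact formula, and `eps_cond`/`eps_uncond` stand in for the model's conditional and unconditional noise predictions.

```python
import numpy as np

def memorization_score(eps_cond, eps_uncond, w_norm=1.0, w_align=1.0):
    """Hedged sketch of a norm-plus-alignment detection metric.

    eps_cond, eps_uncond: per-sample flattened score (noise) predictions
    from one conditional and one unconditional forward pass on pure noise.
    """
    # Guidance vector: difference between conditional and unconditional scores.
    g = eps_cond - eps_uncond
    # Isotropic component: norm of the score difference.
    norm_term = np.linalg.norm(g, axis=-1)
    # Anisotropic component: cosine alignment of guidance with the
    # unconditional score (memorized prompts show strong alignment).
    cos_term = np.sum(g * eps_uncond, axis=-1) / (
        np.linalg.norm(g, axis=-1) * np.linalg.norm(eps_uncond, axis=-1) + 1e-12
    )
    return w_norm * norm_term + w_align * cos_term
```

Because only two forward passes are needed and no denoising loop is run, the cost per prompt is constant, which is the source of the claimed speedup over denoising-based detectors.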


Key Contributions

  • Theoretical proof that norm-based memorization metrics assume isotropic log-probability distributions, valid only at high/medium noise levels
  • Novel detection metric combining isotropic norm and anisotropic angular alignment of guidance and unconditional score vectors, computed without denoising steps
  • Mitigation strategy that adapts memorized prompts using the detection metric, demonstrated on Stable Diffusion v1.4 and v2
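The third contribution adapts prompts that the metric flags as memorized. The paper does not spell out the adaptation procedure here, so the sketch below shows one plausible, deliberately simple scheme: a gradient-free search over lightly perturbed prompt candidates that keeps the rewrite with the lowest detection score. `score_fn` (the detection metric applied to a prompt) and `candidates_fn` (a prompt-perturbation generator) are hypothetical stand-ins.

```python
def mitigate_prompt(prompt, score_fn, candidates_fn, threshold, n_trials=8):
    """Hedged sketch: if a prompt's detection score exceeds `threshold`,
    try perturbed rewrites and return the lowest-scoring one.

    score_fn(prompt) -> float: hypothetical detection-metric wrapper.
    candidates_fn(prompt, n) -> iterable of rewritten prompts (hypothetical).
    """
    best_score = score_fn(prompt)
    if best_score < threshold:
        return prompt  # not flagged as memorized; leave unchanged
    best = prompt
    for cand in candidates_fn(prompt, n_trials):
        s = score_fn(cand)
        if s < best_score:
            best, best_score = cand, s
    return best
```

Any adaptation strategy of this shape inherits the metric's low cost: each candidate is scored with just two forward passes, so the search stays cheap relative to full denoising.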

🛡️ Threat Analysis

Model Inversion Attack

Memorization in diffusion models is a training data reconstruction vulnerability: the model can reproduce exact copies of private training images when queried with the right prompts. The paper detects this threat via a novel metric and mitigates it, directly defending against training data extraction from model outputs — the core ML03 threat.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
training_time, inference_time, black_box
Datasets
Stable Diffusion v1.4 training data, Stable Diffusion v2 training data
Applications
text-to-image generation, image synthesis