
How Diffusion Models Memorize

Juyeop Kim, Songkuk Kim, Jong-Seok Lee

4 citations · 31 references · arXiv


Published on arXiv · 2509.25705

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

Deviations of intermediate latents from the theoretical denoising schedule correlate almost perfectly with memorization severity, identifying early overestimation as the central causal mechanism.

Latent trajectory decomposition analysis

Novel technique introduced


Despite their success in image generation, diffusion models can memorize training data, raising serious privacy and copyright concerns. Although prior work has sought to characterize, detect, and mitigate memorization, the fundamental question of why and how it occurs remains unresolved. In this paper, we revisit the diffusion and denoising process and analyze latent space dynamics to address the question: "How do diffusion models memorize?" We show that memorization is driven by the overestimation of training samples during early denoising, which reduces diversity, collapses denoising trajectories, and accelerates convergence toward the memorized image. Specifically: (i) memorization cannot be explained by overfitting alone, as training loss is larger under memorization due to classifier-free guidance amplifying predictions and inducing overestimation; (ii) memorized prompts inject training images into noise predictions, forcing latent trajectories to converge and steering denoising toward their paired samples; and (iii) a decomposition of intermediate latents reveals how initial randomness is quickly suppressed and replaced by memorized content, with deviations from the theoretical denoising schedule correlating almost perfectly with memorization severity. Together, these results identify early overestimation as the central underlying mechanism of memorization in diffusion models.
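The decomposition result in (iii) can be illustrated with a small numerical sketch. This is not the paper's code; it assumes a standard DDPM forward process with a linear beta schedule, under which an intermediate latent `x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps` has expected squared norm `abar_t * ||x_0||^2 + (1 - abar_t) * d` over a d-dimensional latent. The deviation of an observed latent from this theoretical expectation is one simple proxy for the kind of schedule deviation the paper correlates with memorization severity; the function names and constants below are hypothetical.

```python
import numpy as np

def alpha_bar(t, T=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)[t]

def schedule_deviation(x_t, x0_norm_sq, t, T=1000):
    """Relative deviation of ||x_t||^2 from its theoretical expectation."""
    d = x_t.size
    abar = alpha_bar(t, T)
    expected = abar * x0_norm_sq + (1.0 - abar) * d
    return abs(np.sum(x_t ** 2) - expected) / expected

rng = np.random.default_rng(0)
d, T, t = 4096, 1000, 500
x0 = rng.standard_normal(d)
abar = alpha_bar(t, T)

# On-schedule latent: sampled exactly from the forward process.
x_t = np.sqrt(abar) * x0 + np.sqrt(1 - abar) * rng.standard_normal(d)
print(schedule_deviation(x_t, np.sum(x0 ** 2), t))  # small

# "Overestimated" latent: the x_0 contribution is amplified, mimicking
# the early overestimation of a training sample under memorization.
x_t_mem = 2.0 * np.sqrt(abar) * x0 + np.sqrt(1 - abar) * rng.standard_normal(d)
print(schedule_deviation(x_t_mem, np.sum(x0 ** 2), t))  # noticeably larger
```

Under this toy setup, amplifying the clean-image component pushes the latent measurably off the theoretical schedule, which is the quantity the paper reports as correlating almost perfectly with memorization severity.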


Key Contributions

  • Identifies early denoising overestimation (amplified by classifier-free guidance) as the central mechanism of memorization, not mere overfitting
  • Shows memorized prompts inject training images into noise predictions, collapsing latent trajectories toward memorized samples
  • Demonstrates that latent deviations from the theoretical denoising schedule correlate almost perfectly with memorization severity
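The role of classifier-free guidance in the first contribution can be sketched numerically. This is an illustrative toy, not the paper's implementation: it assumes the standard CFG combination `eps_cfg = eps_uncond + w * (eps_cond - eps_uncond)` and a hypothetical conditional prediction biased toward one training image, to show how the guidance scale `w` linearly amplifies that bias.

```python
import numpy as np

def cfg(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance combination of noise predictions."""
    return eps_uncond + w * (eps_cond - eps_uncond)

rng = np.random.default_rng(1)
d = 16
eps_uncond = rng.standard_normal(d)

# Hypothetical "memorized" conditional prediction: biased toward the
# direction of a single training image.
memorized_direction = rng.standard_normal(d)
eps_cond = eps_uncond + 0.5 * memorized_direction

# The pull toward the memorized image scales linearly with the guidance
# weight w, so a typical w of 7.5 amplifies the bias 7.5x over w = 1.
for w in (1.0, 7.5):
    pull = cfg(eps_uncond, eps_cond, w) - eps_uncond
    print(w, np.linalg.norm(pull))
```

In this sketch the memorized-image pull at `w = 7.5` is exactly 7.5 times the pull at `w = 1.0`, matching the paper's observation that guidance amplifies predictions and induces overestimation rather than memorization arising from overfitting alone.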

🛡️ Threat Analysis

Model Inversion Attack

The paper studies training data memorization in diffusion models — the phenomenon by which private training images can be reconstructed from model outputs. Understanding the latent-space mechanism of memorization (overestimation during early denoising, trajectory collapse toward memorized images) directly informs both attacks (training data extraction) and defenses against model inversion and data reconstruction attacks.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
training_time, white_box
Applications
image generation, text-to-image synthesis