benchmark 2025

Demystifying Foreground-Background Memorization in Diffusion Models

Jimmy Z. Di 1,2, Yiwei Lu 3, Yaoliang Yu 1,2, Gautam Kamath 4, Adam Dziedzic 1,2, Franziska Boenisch 4

Published on arXiv: 2508.12148

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

Existing mitigation methods such as neuron deactivation and pruning fail to eliminate foreground memorization in diffusion models, and memorization extends beyond one-to-one prompt-image pairs to clusters of similar training images.

FB-Mem

Novel technique introduced


Diffusion models (DMs) memorize training images and can reproduce near-duplicates during generation. Current detection methods identify verbatim memorization but fail to capture two critical aspects: partial memorization confined to small image regions, and memorization patterns that extend beyond specific prompt-image pairs. To address these limitations, we propose Foreground-Background Memorization (FB-Mem), a novel segmentation-based metric that classifies and quantifies memorized regions within generated images. Our method reveals that memorization is more pervasive than previously understood: (1) individual generations from single prompts may be linked to clusters of similar training images, revealing complex memorization patterns that extend beyond one-to-one correspondences; and (2) existing model-level mitigation methods, such as neuron deactivation and pruning, fail to eliminate local memorization, which persists particularly in foreground regions. Our work establishes an effective framework for measuring memorization in diffusion models, demonstrates the inadequacy of current mitigation approaches, and proposes a stronger mitigation method using a clustering approach.
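
The paper does not reproduce its implementation here, but the core idea of a segmentation-based, region-level memorization score can be sketched as follows. This is a minimal illustration only: the patch-level cosine similarity, the 0.95 threshold, and the externally supplied foreground mask are assumptions for the sketch, not the paper's actual FB-Mem definition.

```python
import numpy as np

def patch_cosine_similarity(gen, ref, patch=16):
    """Cosine similarity between corresponding patches of two (H, W, C) images."""
    H, W, _ = gen.shape
    rows, cols = H // patch, W // patch
    sims = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            a = gen[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch].ravel().astype(np.float64)
            b = ref[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch].ravel().astype(np.float64)
            sims[i, j] = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return sims

def fg_bg_memorization(gen, ref, fg_mask, patch=16, threshold=0.95):
    """Fraction of foreground vs. background patches of a generated image
    that are near-duplicates of its closest training image 'ref'."""
    sims = patch_cosine_similarity(gen, ref, patch)
    rows, cols = sims.shape
    # Pool the pixel-level foreground mask down to patch level (majority vote).
    fg_patch = (
        fg_mask[:rows * patch, :cols * patch]
        .reshape(rows, patch, cols, patch)
        .mean(axis=(1, 3)) > 0.5
    )
    memorized = sims > threshold
    fg_score = memorized[fg_patch].mean() if fg_patch.any() else 0.0
    bg_score = memorized[~fg_patch].mean() if (~fg_patch).any() else 0.0
    return fg_score, bg_score
```

In practice, each generation would be compared against its retrieved nearest training images (not a single known reference), and the per-region scores aggregated across generations.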


Key Contributions

  • FB-Mem: a segmentation-based metric that classifies and quantifies memorized foreground vs. background regions in diffusion model outputs, enabling fine-grained partial memorization measurement
  • Empirical finding that memorization is linked to clusters of similar training images (not one-to-one prompt-image correspondences), revealing more complex and pervasive memorization than previously understood (see the cluster-retrieval sketch after this list)
  • Demonstration that state-of-the-art mitigations (neuron deactivation, pruning) fail to eliminate local memorization in foreground regions, plus a stronger clustering-based mitigation approach
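
The cluster-level view of memorization can be illustrated roughly as below, assuming image embeddings (e.g., from a copy-detection or CLIP-style encoder) have already been computed; the similarity threshold and the function name are illustrative choices, not taken from the paper.

```python
import numpy as np

def nearest_training_cluster(gen_emb, train_embs, sim_threshold=0.6):
    """Indices of all training images whose embedding lies above the similarity
    threshold for a single generated image -- a memorization *cluster*,
    rather than just the single nearest neighbor."""
    gen = gen_emb / (np.linalg.norm(gen_emb) + 1e-8)
    train = train_embs / (np.linalg.norm(train_embs, axis=1, keepdims=True) + 1e-8)
    sims = train @ gen                                   # cosine similarity to every training image
    cluster = np.flatnonzero(sims > sim_threshold)
    return cluster[np.argsort(-sims[cluster])], sims     # most similar members first
```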

🛡️ Threat Analysis

Model Inversion Attack

The core concern is training data leakage: diffusion models reproduce near-duplicate training images during generation, effectively reconstructing private training data. The paper measures the extent of this memorization via a novel metric, demonstrates that existing mitigations (neuron deactivation, pruning) fail to prevent data reproduction, and proposes a stronger mitigation. This maps to the training data reconstruction threat in ML03, where the 'adversary' is any user who can prompt the model and observe reproduced training images.
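
From the attacker's perspective, probing for memorization requires only query access to the model. A minimal sketch using the Hugging Face diffusers API is shown below; the model ID, prompt, and sample count are placeholders for illustration, not the paper's experimental setup.

```python
import torch
from diffusers import StableDiffusionPipeline

# Illustrative probe: repeatedly sample one prompt and keep the generations
# for later comparison against candidate training images (e.g., with the
# region-level scoring sketched above).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo suspected to trigger memorization"  # placeholder prompt
generations = []
for seed in range(8):
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt, num_inference_steps=30, generator=generator).images[0]
    generations.append(image)
```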


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
training_time, inference_time
Applications
image generation, text-to-image synthesis