Characterizing Memorization in Diffusion Language Models: Generalized Extraction and Sampling Effects
Xiaoyu Luo, Wenrui Yu, Qiongxiu Li, Johannes Bjerva
Published on arXiv
2603.02333
Model Inversion Attack
OWASP ML Top 10 — ML03
Sensitive Information Disclosure
OWASP LLM Top 10 — LLM06
Key Finding
DLMs exhibit substantially lower PII memorization leakage than autoregressive LMs under aligned evaluation, and sampling resolution monotonically controls extraction probability — finer-grained denoising steps strictly increase the likelihood of verbatim training data recovery
Autoregressive language models (ARMs) have been shown to memorize and occasionally reproduce training data verbatim, raising concerns about privacy and copyright liability. Diffusion language models (DLMs) have recently emerged as a competitive alternative, yet their memorization behavior remains largely unexplored due to fundamental differences in generation dynamics. To address this gap, we present a systematic theoretical and empirical characterization of memorization in DLMs. We propose a generalized probabilistic extraction framework that unifies prefix-conditioned decoding and diffusion-based generation under arbitrary masking patterns and stochastic sampling trajectories. Theorem 4.3 establishes a monotonic relationship between sampling resolution and memorization: increasing resolution strictly increases the probability of exact training data extraction, implying that autoregressive decoding is the limiting case of diffusion-based generation obtained when the sampling resolution is set to its maximum. Extensive experiments across model scales and sampling strategies validate our theoretical predictions. Under aligned prefix-conditioned evaluations, we further demonstrate that DLMs exhibit substantially lower memorization-based leakage of personally identifiable information (PII) compared to ARMs.
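The intuition behind the resolution result can be seen in a toy two-token example. This is not the paper's code: it assumes (as is common for masked diffusion samplers) that a coarse step samples all masked tokens independently from their per-position marginals, while a fine schedule reveals one token per step and conditions on what has been revealed, so the chain rule recovers the full joint.

```python
# Toy illustration of Theorem 4.3's intuition: finer sampling resolution
# yields a higher probability of exactly reproducing a memorized sequence.
# Joint distribution over a 2-token sequence; (0, 0) is the "memorized" one.
joint = {
    (0, 0): 0.4,
    (0, 1): 0.1,
    (1, 0): 0.1,
    (1, 1): 0.4,
}
target = (0, 0)

# Coarse resolution (1 step): both tokens unmasked at once, each sampled
# independently from its marginal distribution.
p_a0 = sum(p for (a, _), p in joint.items() if a == target[0])  # P(first token = 0)
p_b0 = sum(p for (_, b), p in joint.items() if b == target[1])  # P(second token = 0)
p_coarse = p_a0 * p_b0

# Fine resolution (2 steps): one token per step, the second sampled
# conditioned on the first, so the product of conditionals equals the joint.
p_fine = joint[target]

print(f"coarse (1 step):  {p_coarse:.2f}")   # 0.25
print(f"fine   (2 steps): {p_fine:.2f}")     # 0.40
```

Because the memorized pair is positively correlated, the factorized one-step sampler underweights it relative to the sequential sampler; maximal resolution (one token per step) coincides with autoregressive chain-rule decoding.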
Key Contributions
- Generalized probabilistic extraction framework that unifies prefix-conditioned decoding and diffusion-based generation for measuring discoverable memorization under arbitrary masking patterns and stochastic sampling trajectories
- Theoretical proof (Theorem 4.3) that increasing sampling resolution monotonically and strictly increases the probability of exact training data extraction in DLMs, with autoregressive decoding as the maximum-resolution limiting case
- Aligned empirical comparison demonstrating DLMs exhibit substantially lower PII memorization leakage than autoregressive LMs at matched model scales and prefix-conditioned evaluation settings
🛡️ Threat Analysis
Proposes a generalized probabilistic extraction framework specifically to recover verbatim training data (including PII) from diffusion language models: a training-data reconstruction attack that adapts ARM extraction methodology to the DLM's bidirectional denoising paradigm, with an adversary querying the model to reconstruct memorized content.
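The adversary's measurement loop can be sketched as a Monte Carlo estimate of discoverable memorization: condition the sampler on a known training prefix and count how often a stochastic sampling trajectory reproduces the true suffix verbatim. Everything below is hypothetical scaffolding, not the paper's implementation; `sample_suffix` stands in for querying a real DLM and is stubbed to return the memorized suffix with a fixed probability so the estimator is runnable.

```python
import random

def make_stub_sampler(memorized_suffix, p_memorized, rng):
    """Stand-in for a DLM query: returns the memorized suffix with
    probability p_memorized, otherwise an unrelated continuation."""
    def sample_suffix(prefix):
        if rng.random() < p_memorized:
            return memorized_suffix
        return "<unrelated continuation>"
    return sample_suffix

def extraction_probability(sample_suffix, prefix, true_suffix, n_trials=10_000):
    """Monte Carlo estimate of P(exact verbatim extraction | prefix),
    averaging exact-match indicators over stochastic sampling runs."""
    hits = sum(sample_suffix(prefix) == true_suffix for _ in range(n_trials))
    return hits / n_trials

rng = random.Random(0)
sampler = make_stub_sampler("<memorized PII suffix>", 0.3, rng)
p_hat = extraction_probability(sampler, "<training prefix>", "<memorized PII suffix>")
print(f"estimated extraction probability: {p_hat:.3f}")  # close to 0.3
```

Repeating this estimate while varying the number of denoising steps is how one would empirically probe the monotonic resolution effect the paper proves.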