Characterizing Memorization in Diffusion Language Models: Generalized Extraction and Sampling Effects
Xiaoyu Luo, Wenrui Yu, Qiongxiu Li, Johannes Bjerva
Published on arXiv
2603.02333
Model Inversion Attack
OWASP ML Top 10 — ML03
Sensitive Information Disclosure
OWASP LLM Top 10 — LLM06
Key Finding
DLMs exhibit substantially lower PII memorization leakage than autoregressive LMs under aligned evaluation, and sampling resolution monotonically controls extraction probability — finer-grained denoising steps strictly increase the likelihood of verbatim training data recovery
Autoregressive language models (ARMs) have been shown to memorize and occasionally reproduce training data verbatim, raising concerns about privacy and copyright liability. Diffusion language models (DLMs) have recently emerged as a competitive alternative, yet their memorization behavior remains largely unexplored due to fundamental differences in generation dynamics. To address this gap, we present a systematic theoretical and empirical characterization of memorization in DLMs. We propose a generalized probabilistic extraction framework that unifies prefix-conditioned decoding and diffusion-based generation under arbitrary masking patterns and stochastic sampling trajectories. Theorem 4.3 establishes a monotonic relationship between sampling resolution and memorization: increasing resolution strictly increases the probability of exact training data extraction, implying that autoregressive decoding is the limiting case of diffusion-based generation obtained when the sampling resolution is set to its maximum. Extensive experiments across model scales and sampling strategies validate our theoretical predictions. Under aligned prefix-conditioned evaluations, we further demonstrate that DLMs exhibit substantially lower memorization-based leakage of personally identifiable information (PII) compared to ARMs.
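The intuition behind the resolution result can be seen in a toy two-token example. This is not the paper's code: it assumes (as is common for masked diffusion samplers) that a coarse step samples all masked tokens independently from their per-position marginals, while a fine schedule reveals one token per step and conditions on what has been revealed, so the chain rule recovers the full joint.

```python
# Toy illustration of Theorem 4.3's intuition: finer sampling resolution
# yields a higher probability of exactly reproducing a memorized sequence.
# Joint distribution over a 2-token sequence; (0, 0) is the "memorized" one.
joint = {
    (0, 0): 0.4,
    (0, 1): 0.1,
    (1, 0): 0.1,
    (1, 1): 0.4,
}
target = (0, 0)

# Coarse resolution (1 step): both tokens unmasked at once, each sampled
# independently from its marginal distribution.
p_a0 = sum(p for (a, _), p in joint.items() if a == target[0])  # P(first token = 0)
p_b0 = sum(p for (_, b), p in joint.items() if b == target[1])  # P(second token = 0)
p_coarse = p_a0 * p_b0

# Fine resolution (2 steps): one token per step, the second sampled
# conditioned on the first, so the product of conditionals equals the joint.
p_fine = joint[target]

print(f"coarse (1 step):  {p_coarse:.2f}")   # 0.25
print(f"fine   (2 steps): {p_fine:.2f}")     # 0.40
```

Because the memorized pair is positively correlated, the factorized one-step sampler underweights it relative to the sequential sampler; maximal resolution (one token per step) coincides with autoregressive chain-rule decoding.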
Key Contributions
- Generalized probabilistic extraction framework that unifies prefix-conditioned decoding and diffusion-based generation for measuring discoverable memorization under arbitrary masking patterns and stochastic sampling trajectories
- Theoretical proof (Theorem 4.3) that increasing sampling resolution monotonically and strictly increases the probability of exact training data extraction in DLMs, with autoregressive decoding as the maximum-resolution limiting case
- Aligned empirical comparison demonstrating DLMs exhibit substantially lower PII memorization leakage than autoregressive LMs at matched model scales and prefix-conditioned evaluation settings
🛡️ Threat Analysis
Proposes a generalized probabilistic extraction framework specifically to recover verbatim training data (including PII) from diffusion language models: a training-data reconstruction attack that adapts ARM extraction methodology to the DLM's bidirectional denoising paradigm, with an adversary querying the model to reconstruct memorized content.
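The adversary's measurement loop can be sketched as a Monte Carlo estimate of discoverable memorization: condition the sampler on a known training prefix and count how often a stochastic sampling trajectory reproduces the true suffix verbatim. Everything below is hypothetical scaffolding, not the paper's implementation; `sample_suffix` stands in for querying a real DLM and is stubbed to return the memorized suffix with a fixed probability so the estimator is runnable.

```python
import random

def make_stub_sampler(memorized_suffix, p_memorized, rng):
    """Stand-in for a DLM query: returns the memorized suffix with
    probability p_memorized, otherwise an unrelated continuation."""
    def sample_suffix(prefix):
        if rng.random() < p_memorized:
            return memorized_suffix
        return "<unrelated continuation>"
    return sample_suffix

def extraction_probability(sample_suffix, prefix, true_suffix, n_trials=10_000):
    """Monte Carlo estimate of P(exact verbatim extraction | prefix),
    averaging exact-match indicators over stochastic sampling runs."""
    hits = sum(sample_suffix(prefix) == true_suffix for _ in range(n_trials))
    return hits / n_trials

rng = random.Random(0)
sampler = make_stub_sampler("<memorized PII suffix>", 0.3, rng)
p_hat = extraction_probability(sampler, "<training prefix>", "<memorized PII suffix>")
print(f"estimated extraction probability: {p_hat:.3f}")  # close to 0.3
```

Repeating this estimate while varying the number of denoising steps is how one would empirically probe the monotonic resolution effect the paper proves.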