
Characterizing Memorization in Diffusion Language Models: Generalized Extraction and Sampling Effects

Xiaoyu Luo, Wenrui Yu, Qiongxiu Li, Johannes Bjerva


Published on arXiv

2603.02333

Model Inversion Attack

OWASP ML Top 10 — ML03

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

DLMs exhibit substantially lower PII memorization leakage than autoregressive LMs under aligned evaluation, and sampling resolution monotonically controls extraction probability: finer-grained denoising steps strictly increase the likelihood of verbatim training-data recovery.


Autoregressive language models (ARMs) have been shown to memorize and occasionally reproduce training data verbatim, raising concerns about privacy and copyright liability. Diffusion language models (DLMs) have recently emerged as a competitive alternative, yet their memorization behavior remains largely unexplored due to fundamental differences in generation dynamics. To address this gap, we present a systematic theoretical and empirical characterization of memorization in DLMs. We propose a generalized probabilistic extraction framework that unifies prefix-conditioned decoding and diffusion-based generation under arbitrary masking patterns and stochastic sampling trajectories. Theorem 4.3 establishes a monotonic relationship between sampling resolution and memorization: increasing resolution strictly increases the probability of exact training-data extraction, implying that autoregressive decoding is the limiting case of diffusion-based generation obtained by taking the sampling resolution to its maximum. Extensive experiments across model scales and sampling strategies validate our theoretical predictions. Under aligned prefix-conditioned evaluations, we further demonstrate that DLMs exhibit substantially lower memorization-based leakage of personally identifiable information (PII) compared to ARMs.
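The intuition behind the resolution result can be sketched with a toy probability calculation (this is an illustrative example, not the paper's actual framework or Theorem 4.3 proof): with coarse sampling, several masked tokens are unmasked in one step and drawn independently from their marginals, while with fine sampling each token conditions on the ones already revealed. For a "memorized" pair whose tokens are positively correlated, the conditioned (finer-resolution, AR-limit) trajectory assigns strictly more probability to exact recovery. The joint distribution below is invented for illustration.

```python
# Toy illustration of the sampling-resolution effect (assumed numbers,
# not from the paper): a 2-token sequence over vocabulary {0, 1} whose
# model "memorizes" the training pair (1, 1) by giving it most mass.
joint = {
    (0, 0): 0.05, (0, 1): 0.15,
    (1, 0): 0.10, (1, 1): 0.70,
}
target = (1, 1)  # the memorized sequence an extractor hopes to recover

# Coarse sampling (1 denoising step): both masked tokens are unmasked
# at once, each drawn independently from its marginal distribution.
p1 = sum(p for (a, _), p in joint.items() if a == 1)  # P(x1 = 1)
p2 = sum(p for (_, b), p in joint.items() if b == 1)  # P(x2 = 1)
p_coarse = p1 * p2

# Fine sampling (2 steps, the autoregressive limit): unmask x1 first,
# then draw x2 conditioned on the realized value of x1.
p2_given_1 = joint[target] / p1  # P(x2 = 1 | x1 = 1)
p_fine = p1 * p2_given_1         # equals the true joint mass of (1, 1)

print(f"coarse (1 step):  P(extract target) = {p_coarse:.4f}")
print(f"fine   (2 steps): P(extract target) = {p_fine:.4f}")
assert p_fine > p_coarse  # finer resolution raises extraction probability
```

With these numbers, coarse sampling recovers the target with probability 0.80 × 0.85 = 0.68, while the two-step trajectory recovers it with the full joint mass 0.70; positive token correlation is exactly what memorized verbatim strings exhibit, which is why the effect matters for extraction.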


Key Contributions

  • Generalized probabilistic extraction framework that unifies prefix-conditioned decoding and diffusion-based generation for measuring discoverable memorization under arbitrary masking patterns and stochastic sampling trajectories
  • Theoretical proof (Theorem 4.3) that increasing sampling resolution monotonically and strictly increases the probability of exact training data extraction in DLMs, with autoregressive decoding as the maximum-resolution limiting case
  • Aligned empirical comparison demonstrating DLMs exhibit substantially lower PII memorization leakage than autoregressive LMs at matched model scales and prefix-conditioned evaluation settings

🛡️ Threat Analysis

Model Inversion Attack

Proposes a generalized probabilistic extraction framework to recover verbatim training data (including PII) from diffusion language models: a training-data reconstruction attack that adapts ARM extraction methodology to the DLM bidirectional denoising paradigm, with an adversary querying the model to reconstruct memorized content.


Details

Domains
nlp, generative
Model Types
llm, diffusion, transformer
Threat Tags
training_time, inference_time, black_box
Applications
language modeling, text generation, pii extraction