Membership Inference Attacks Against Fine-tuned Diffusion Language Models
Yuetian Chen 1, Kaiyuan Zhang 1, Yuntao Du 1, Edoardo Stoppa 1, Charles Fleming 2, Ashish Kundu 2, Bruno Ribeiro 1, Ninghui Li 1
Published on arXiv: 2601.20125
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
SAMA achieves a 30% relative AUC improvement over the best baseline MIA, with up to an 8x improvement at low false positive rates, across nine datasets on fine-tuned DLMs.
SAMA (Subset-Aggregated Membership Attack)
Novel technique introduced
Diffusion Language Models (DLMs) represent a promising alternative to autoregressive language models, using bidirectional masked token prediction. Yet their susceptibility to privacy leakage via Membership Inference Attacks (MIAs) remains critically underexplored. This paper presents the first systematic investigation of MIA vulnerabilities in DLMs. Unlike autoregressive models, which expose a single fixed prediction pattern, DLMs admit exponentially many maskable configurations, each of which can be probed independently, dramatically increasing attack opportunities. To exploit this, we introduce SAMA (Subset-Aggregated Membership Attack), which addresses the sparse-signal challenge through robust aggregation. SAMA samples masked subsets across progressive densities and applies sign-based statistics that remain effective despite heavy-tailed noise. Through inverse-weighted aggregation that prioritizes the cleaner signals of sparse masks, SAMA transforms sparse memorization detection into a robust voting mechanism. Experiments on nine datasets show that SAMA achieves a 30% relative AUC improvement over the best baseline, with up to an 8x improvement at low false positive rates. These findings reveal significant, previously unknown vulnerabilities in DLMs and necessitate the development of tailored privacy defenses.
Key Contributions
- First systematic investigation of membership inference attack vulnerabilities in Diffusion Language Models (DLMs)
- SAMA (Subset-Aggregated Membership Attack): exploits DLMs' multiple maskable configurations via sign-based statistics and inverse-weighted aggregation to robustly detect membership from sparse signals
- Empirical evaluation across nine datasets showing 30% relative AUC improvement and up to 8x improvement at low false positive rates over best baselines
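The aggregation scheme described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the `token_losses_fn` interface, the `calibration` threshold, and all density and subset-count values are hypothetical placeholders.

```python
import random

def sama_score(token_losses_fn, tokens, densities=(0.1, 0.3, 0.5),
               subsets_per_density=8, calibration=0.0):
    """Sketch of subset-aggregated membership scoring.

    token_losses_fn(tokens, mask) is a hypothetical callable returning the
    model's per-token losses at the masked positions.
    """
    votes, weights = [], []
    for density in densities:
        for _ in range(subsets_per_density):
            # Sample a random maskable subset at this density.
            mask = [random.random() < density for _ in tokens]
            if not any(mask):
                continue
            losses = token_losses_fn(tokens, mask)
            # Sign-based statistic: vote +1 if the mean masked loss falls
            # below a calibration threshold (low loss suggests memorization),
            # -1 otherwise. Using only the sign keeps the statistic robust
            # to heavy-tailed noise in individual losses.
            vote = 1.0 if sum(losses) / len(losses) < calibration else -1.0
            votes.append(vote)
            # Inverse weighting: sparser masks carry cleaner signals, so
            # they receive proportionally larger weight.
            weights.append(1.0 / density)
    if not votes:
        return 0.0
    # Weighted vote average in [-1, 1]; higher means "more likely a member".
    return sum(v * w for v, w in zip(votes, weights)) / sum(weights)
```

A sequence whose masked-token losses are consistently low (a memorized member) drives the weighted vote toward +1, while a non-member drives it toward -1.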
🛡️ Threat Analysis
SAMA is a novel membership inference attack that determines whether specific data points were part of the fine-tuning set of a Diffusion Language Model, achieving a 30% relative AUC improvement over the best baseline; it is the paper's central contribution.
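The low-false-positive-rate comparison is conventionally reported as the true positive rate at a fixed FPR budget. A minimal sketch of that metric follows; the function name and threshold-selection rule are my own illustration, not the paper's evaluation code.

```python
def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    """True positive rate at a fixed false positive rate budget."""
    # Strictest threshold that flags at most target_fpr of non-members.
    thresholds = sorted(nonmember_scores, reverse=True)
    allowed_fp = int(len(thresholds) * target_fpr)
    threshold = thresholds[allowed_fp]
    # Scores strictly above the threshold are classified as members.
    return sum(s > threshold for s in member_scores) / len(member_scores)
```

An "8x improvement at low false positive rates" means the attack's TPR at, say, 1% FPR is up to eight times that of the best baseline at the same budget.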