LR-DWM: Efficient Watermarking for Diffusion Language Models
Ofek Raban 1, Ethan Fetaya 1, Gal Chechik 1,2
Published on arXiv (2601.12376)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
LR-DWM watermarks Diffusion Language Model outputs with high detectability and negligible computational and memory overhead relative to the non-watermarked baseline on an H100 GPU.
LR-DWM (Left-Right Diffusion Watermarking)
Novel technique introduced
Watermarking (WM) is a critical mechanism for detecting and attributing AI-generated content. Current WM methods for Large Language Models (LLMs) are predominantly tailored to autoregressive (AR) models: they rely on tokens being generated sequentially and embed stable signals in the generated sequence based on previously sampled text. Diffusion Language Models (DLMs) instead generate text via non-sequential iterative denoising, which requires significant modification before WM methods designed for AR models can be applied. Recent work proposed watermarking DLMs by inverting the diffusion process when needed, but suffers from significant computational or memory overhead. We introduce Left-Right Diffusion Watermarking (LR-DWM), a scheme that biases each generated token based on both its left and right neighbors, when they are available. LR-DWM incurs minimal runtime and memory overhead, remaining close to the non-watermarked baseline DLM, while enabling reliable statistical detection under standard evaluation settings. Our results demonstrate that DLMs can be watermarked efficiently, achieving high detectability with negligible computational and memory overhead.
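The abstract describes biasing a token's logits using whichever neighbors are already decoded. The paper's exact keying scheme is not given here; as a rough illustration in the spirit of green-list watermarks for AR LLMs, one might seed a pseudo-random "green" vocabulary subset from the visible left/right neighbors and boost those logits. All names, constants (`GREEN_FRACTION`, `DELTA`), and the SHA-256 keying below are assumptions, not the authors' implementation:

```python
import hashlib
import random

GREEN_FRACTION = 0.25  # assumed fraction of vocabulary to bias
DELTA = 2.0            # assumed logit bias added to green tokens

def green_list(left_tok, right_tok, vocab_size, key="secret"):
    """Derive a keyed pseudo-random 'green' subset of the vocabulary from
    the visible neighbors. Either neighbor may be None if still masked,
    which is what makes the scheme order-agnostic."""
    seed_material = f"{key}|{left_tok}|{right_tok}".encode()
    seed = int.from_bytes(hashlib.sha256(seed_material).digest()[:8], "big")
    rng = random.Random(seed)
    return set(rng.sample(range(vocab_size), int(GREEN_FRACTION * vocab_size)))

def bias_logits(logits, left_tok, right_tok, key="secret"):
    """Add DELTA to the logits of green tokens before sampling."""
    green = green_list(left_tok, right_tok, len(logits), key)
    return [l + DELTA if i in green else l for i, l in enumerate(logits)]
```

Because the green list depends only on the (key, left, right) triple, the same bias is reproducible at detection time regardless of the order in which the denoiser filled in positions.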
Key Contributions
- LR-DWM: a novel order-agnostic watermarking method for Diffusion Language Models that biases token logits using both left and right neighboring tokens when available
- Achieves negligible runtime and memory overhead compared to the non-watermarked DLM baseline, unlike prior WM-DLM and DMARK approaches
- Demonstrates that reliable statistical watermark detection is achievable for non-sequential diffusion decoding without large computational cost
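The claimed statistical detection would typically be a one-sided z-test: count how many tokens fall in the green list keyed by their realized neighbors and compare against the null rate. The following self-contained sketch assumes the hypothetical SHA-256 keying above (the fraction `GAMMA` and all function names are assumptions):

```python
import hashlib
import math
import random

GAMMA = 0.25  # assumed green-list fraction; must match the generation side

def green_list(left_tok, right_tok, vocab_size, key="secret"):
    # Re-derive the keyed green list used at generation time
    # (hypothetical keying: SHA-256 over the key and both neighbors).
    seed_material = f"{key}|{left_tok}|{right_tok}".encode()
    seed = int.from_bytes(hashlib.sha256(seed_material).digest()[:8], "big")
    rng = random.Random(seed)
    return set(rng.sample(range(vocab_size), int(GAMMA * vocab_size)))

def detect_z(tokens, vocab_size, key="secret"):
    """z-score of green-token hits against the GAMMA null rate; a large
    positive z suggests the text carries the watermark."""
    hits = 0
    for i, tok in enumerate(tokens):
        left = tokens[i - 1] if i > 0 else None
        right = tokens[i + 1] if i < len(tokens) - 1 else None
        if tok in green_list(left, right, vocab_size, key):
            hits += 1
    n = len(tokens)
    return (hits - GAMMA * n) / math.sqrt(GAMMA * (1 - GAMMA) * n)
```

Under the null (unwatermarked text), hits are roughly Binomial(n, GAMMA), so thresholding z (e.g. z > 4) gives a controllable false-positive rate; the paper's actual test statistic may differ.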
🛡️ Threat Analysis
Embeds statistical watermark signals in DLM-generated text outputs to enable detection and attribution of AI-generated content — this is content provenance watermarking, not model IP watermarking (ML05). The watermark is in the output text, not model weights.