LR-DWM: Efficient Watermarking for Diffusion Language Models
Ofek Raban 1, Ethan Fetaya 1, Gal Chechik 1,2
Published on arXiv (2601.12376)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
LR-DWM watermarks Diffusion Language Model outputs with high detectability and negligible computational and memory overhead relative to the non-watermarked baseline on an H100 GPU.
LR-DWM (Left-Right Diffusion Watermarking)
Novel technique introduced
Watermarking (WM) is a critical mechanism for detecting and attributing AI-generated content. Current WM methods for Large Language Models (LLMs) are predominantly tailored to autoregressive (AR) models: they rely on tokens being generated sequentially and embed stable signals in the generated sequence based on previously sampled text. Diffusion Language Models (DLMs) instead generate text via non-sequential iterative denoising, which requires significant modification before WM methods designed for AR models can be applied. Recent work proposed watermarking DLMs by inverting the diffusion process when needed, but suffers from significant computational or memory overhead. We introduce Left-Right Diffusion Watermarking (LR-DWM), a scheme that biases each generated token based on both its left and right neighbors, when they are available. LR-DWM incurs minimal runtime and memory overhead, remaining close to the non-watermarked baseline DLM, while enabling reliable statistical detection under standard evaluation settings. Our results demonstrate that DLMs can be watermarked efficiently, achieving high detectability with negligible computational and memory overhead.
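The abstract describes biasing a token's logits using whichever neighbors are already decoded. The paper's exact keying scheme is not given here; as a rough illustration in the spirit of green-list watermarks for AR LLMs, one might seed a pseudo-random "green" vocabulary subset from the visible left/right neighbors and boost those logits. All names, constants (`GREEN_FRACTION`, `DELTA`), and the SHA-256 keying below are assumptions, not the authors' implementation:

```python
import hashlib
import random

GREEN_FRACTION = 0.25  # assumed fraction of vocabulary to bias
DELTA = 2.0            # assumed logit bias added to green tokens

def green_list(left_tok, right_tok, vocab_size, key="secret"):
    """Derive a keyed pseudo-random 'green' subset of the vocabulary from
    the visible neighbors. Either neighbor may be None if still masked,
    which is what makes the scheme order-agnostic."""
    seed_material = f"{key}|{left_tok}|{right_tok}".encode()
    seed = int.from_bytes(hashlib.sha256(seed_material).digest()[:8], "big")
    rng = random.Random(seed)
    return set(rng.sample(range(vocab_size), int(GREEN_FRACTION * vocab_size)))

def bias_logits(logits, left_tok, right_tok, key="secret"):
    """Add DELTA to the logits of green tokens before sampling."""
    green = green_list(left_tok, right_tok, len(logits), key)
    return [l + DELTA if i in green else l for i, l in enumerate(logits)]
```

Because the green list depends only on the (key, left, right) triple, the same bias is reproducible at detection time regardless of the order in which the denoiser filled in positions.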
Key Contributions
- LR-DWM: a novel order-agnostic watermarking method for Diffusion Language Models that biases token logits using both left and right neighboring tokens when available
- Achieves negligible runtime and memory overhead compared to the non-watermarked DLM baseline, unlike prior WM-DLM and DMARK approaches
- Demonstrates that reliable statistical watermark detection is achievable for non-sequential diffusion decoding without large computational cost
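The claimed statistical detection would typically be a one-sided z-test: count how many tokens fall in the green list keyed by their realized neighbors and compare against the null rate. The following self-contained sketch assumes the hypothetical SHA-256 keying above (the fraction `GAMMA` and all function names are assumptions):

```python
import hashlib
import math
import random

GAMMA = 0.25  # assumed green-list fraction; must match the generation side

def green_list(left_tok, right_tok, vocab_size, key="secret"):
    # Re-derive the keyed green list used at generation time
    # (hypothetical keying: SHA-256 over the key and both neighbors).
    seed_material = f"{key}|{left_tok}|{right_tok}".encode()
    seed = int.from_bytes(hashlib.sha256(seed_material).digest()[:8], "big")
    rng = random.Random(seed)
    return set(rng.sample(range(vocab_size), int(GAMMA * vocab_size)))

def detect_z(tokens, vocab_size, key="secret"):
    """z-score of green-token hits against the GAMMA null rate; a large
    positive z suggests the text carries the watermark."""
    hits = 0
    for i, tok in enumerate(tokens):
        left = tokens[i - 1] if i > 0 else None
        right = tokens[i + 1] if i < len(tokens) - 1 else None
        if tok in green_list(left, right, vocab_size, key):
            hits += 1
    n = len(tokens)
    return (hits - GAMMA * n) / math.sqrt(GAMMA * (1 - GAMMA) * n)
```

Under the null (unwatermarked text), hits are roughly Binomial(n, GAMMA), so thresholding z (e.g. z > 4) gives a controllable false-positive rate; the paper's actual test statistic may differ.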
🛡️ Threat Analysis
Embeds statistical watermark signals in DLM-generated text outputs to enable detection and attribution of AI-generated content — this is content provenance watermarking, not model IP watermarking (ML05). The watermark is in the output text, not model weights.