
DMark: Order-Agnostic Watermarking for Diffusion Large Language Models

Linyu Wu 1, Linhao Zhong 2, Wenjie Qu 1, Yuexin Li 1, Yue Liu 1, Shengfang Zhai 2, Chunhua Shen 1, Jiaheng Zhang 1

0 citations · 32 references · arXiv


Published on arXiv: 2510.02902

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

DMark achieves 92.0–99.5% detection rates at 1% false positive rate across multiple diffusion LLMs, compared to 49.6–71.2% for naively adapted autoregressive watermarking methods.

DMark

Novel technique introduced


Diffusion large language models (dLLMs) offer faster generation than autoregressive models while maintaining comparable quality, but existing watermarking methods fail on them due to their non-sequential decoding. Unlike autoregressive models that generate tokens left-to-right, dLLMs can finalize tokens in arbitrary order, breaking the causal design underlying traditional watermarks. We present DMark, the first watermarking framework designed specifically for dLLMs. DMark introduces three complementary strategies to restore watermark detectability: predictive watermarking uses model-predicted tokens when actual context is unavailable; bidirectional watermarking exploits both forward and backward dependencies unique to diffusion decoding; and predictive-bidirectional watermarking combines both approaches to maximize detection strength. Experiments across multiple dLLMs show that DMark achieves 92.0–99.5% detection rates at 1% false positive rate while maintaining text quality, compared to only 49.6–71.2% for naive adaptations of existing methods. DMark also demonstrates robustness against text manipulations, establishing that effective watermarking is feasible for non-autoregressive language models.


Key Contributions

  • First watermarking framework designed specifically for non-autoregressive diffusion LLMs, addressing the broken causal assumption in existing methods
  • Three complementary strategies — predictive, bidirectional, and predictive-bidirectional watermarking — that exploit diffusion decoding's unique forward/backward context
  • Achieves 92.0–99.5% detection rates at 1% FPR on multiple dLLMs versus 49.6–71.2% for naive adaptations of prior watermarks, with robustness to text manipulations
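The strategies above can be illustrated with a minimal sketch of a keyed green-list watermark of the kind DMark adapts (KGW-style hashing is an assumption here; the paper's exact construction may differ, and all names, sizes, and helpers below are illustrative, not DMark's implementation):

```python
import hashlib
import random

VOCAB_SIZE = 1000     # toy vocabulary size; real dLLM vocabularies are far larger
GREEN_FRACTION = 0.5  # fraction of the vocabulary marked "green" per position

def green_list(*context_tokens: int) -> set[int]:
    """Keyed green list (KGW-style assumption): hash the context tokens
    to seed a pseudorandom vocabulary subset that generation favors."""
    seed = int.from_bytes(
        hashlib.sha256(repr(context_tokens).encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return set(rng.sample(range(VOCAB_SIZE), int(VOCAB_SIZE * GREEN_FRACTION)))

def predictive_left(pos: int, finalized: dict[int, int],
                    predicted: dict[int, int]) -> int:
    """Predictive strategy: if the left neighbor is not yet decoded,
    substitute the model's current top prediction for it
    (`predicted[i]` holding that prediction is a hypothetical helper)."""
    return finalized.get(pos - 1, predicted[pos - 1])

def bidirectional_green(left: int, right: int) -> set[int]:
    """Bidirectional strategy: key the green list on BOTH neighbors,
    exploiting the two-sided context unique to diffusion decoding."""
    return green_list(left, right)
```

The key point the sketch captures is why autoregressive watermarks break on dLLMs: the left context a token's green list is keyed on may not exist yet at decode time, so DMark either predicts it (predictive) or keys on whatever finalized neighbors are available on either side (bidirectional).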

🛡️ Threat Analysis

Output Integrity Attack

DMark embeds detectable signals in text OUTPUTS of diffusion LLMs to trace content provenance and detect AI-generated text — classic output integrity / content watermarking. The watermark is in the generated content, not the model weights.
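Detection in green-list schemes of this family typically reduces to a z-test on how many tokens fall in their green lists; a minimal sketch (a generic KGW-family detector, not DMark's specific detector, which the paper defines):

```python
import math

def z_score(green_hits: int, total: int, gamma: float = 0.5) -> float:
    """One-sided z-test: under the null hypothesis (unwatermarked text),
    each scored token lands in its green list with probability gamma."""
    expected = gamma * total
    std = math.sqrt(total * gamma * (1 - gamma))
    return (green_hits - expected) / std
```

For instance, 140 green tokens out of 200 at gamma = 0.5 gives z ≈ 5.66, far above the ≈2.33 one-sided threshold corresponding to a 1% false positive rate, which is the operating point at which the paper reports its 92.0–99.5% detection rates.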


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Applications
ai-generated text detection, text provenance tracking, diffusion language model outputs