defense 2025

PMark: Towards Robust and Distortion-free Semantic-level Watermarking with Channel Constraints

Jiahao Huo 1, Shuliang Liu 1,1, Bin Wang 2, Junyan Zhang 1,3, Yibo Yan 1,1, Aiwei Liu 4, Xuming Hu 1,1, Mingxun Zhou 1

4 citations · 56 references · arXiv

Published on arXiv

2509.21057

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

PMark consistently outperforms existing semantic-level watermarking baselines in both text quality (distortion-free generation) and robustness against paraphrasing attacks across experimental evaluations.

PMark

Novel technique introduced


Semantic-level watermarking (SWM) for large language models (LLMs) enhances watermarking robustness against text modifications and paraphrasing attacks by treating the sentence as the fundamental unit. However, existing methods still lack strong theoretical guarantees of robustness, and rejection-sampling-based generation often introduces significant distribution distortions compared with unwatermarked outputs. In this work, we introduce a new theoretical framework for SWM built on the concept of proxy functions (PFs), i.e., functions that map sentences to scalar values. Building on this framework, we propose PMark, a simple yet powerful SWM method that dynamically estimates the PF median for the next sentence through sampling while enforcing multiple PF constraints (which we call channels) to strengthen watermark evidence. Equipped with solid theoretical guarantees, PMark achieves the desired distortion-free property and improves robustness against paraphrasing-style attacks. We also provide an empirically optimized version that removes the requirement for dynamic median estimation, improving sampling efficiency. Experimental results show that PMark consistently outperforms existing SWM baselines in both text quality and robustness, offering a more effective paradigm for detecting machine-generated text. Our code will be released at [this URL](https://github.com/PMark-repo/PMark).
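The generation idea described in the abstract (sample candidate next sentences, estimate the PF median from the samples, and keep only candidates on a key-selected side of that median, once per channel) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `proxy_value`, `key_bit`, and `select_sentence` are hypothetical names, and the hash-based proxy function is only a runnable stand-in. A real PF would need to be semantic (for example, derived from a sentence embedding) to survive paraphrasing, and the sketch makes no claim about reproducing the paper's distortion-free guarantee.

```python
import hashlib
import random
import statistics


def proxy_value(sentence: str, channel: int) -> float:
    """Hypothetical proxy function: maps a sentence to a scalar in [0, 1).

    ASSUMPTION: a hash is used only so the sketch runs without a model;
    it is not semantic and is not the PF used in the paper.
    """
    digest = hashlib.sha256(f"{channel}:{sentence}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def key_bit(secret_key: str, position: int, channel: int) -> int:
    """Hypothetical keyed bit choosing which side of the median to keep."""
    material = f"{secret_key}:{position}:{channel}".encode("utf-8")
    return hashlib.sha256(material).digest()[0] & 1


def select_sentence(candidates, secret_key, position, num_channels, rng):
    """Filter sampled candidates channel by channel, then pick one survivor."""
    survivors = list(candidates)
    for channel in range(num_channels):
        if len(survivors) < 2:
            break  # nothing left to split
        values = [proxy_value(s, channel) for s in survivors]
        # Median estimated from the sampled candidates themselves.
        median = statistics.median(values)
        keep_upper = bool(key_bit(secret_key, position, channel))
        kept = [s for s, v in zip(survivors, values) if (v > median) == keep_upper]
        if kept:  # guard against an empty side
            survivors = kept
    return rng.choice(survivors)


if __name__ == "__main__":
    # Stand-in for sentences sampled from an LLM.
    sampled = [f"Candidate sentence number {i}." for i in range(8)]
    print(select_sentence(sampled, "demo-key", 0, 2, random.Random(0)))
```

Each channel roughly halves the candidate pool, so in a sketch like this the number of sampled candidates limits how many channels can be enforced per sentence.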


Key Contributions

  • Unified theoretical framework for semantic-level watermarking (SWM) via proxy functions (PFs) that map sentences to scalar values, providing the first formal treatment of this class of methods
  • PMark: a distortion-free SWM method using dynamic PF median estimation and multi-channel constraints to densify watermark evidence and improve robustness against paraphrasing attacks
  • Empirically optimized variant that removes dynamic median estimation for better sampling efficiency while retaining distortion-free and robustness guarantees
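To make the notion of multi-channel "watermark evidence" concrete, one plausible detection statistic is sketched below. It assumes, purely for illustration, that a detector can decide for each (sentence, channel) pair whether the sentence's PF value falls on the key-selected side, and that for unwatermarked text each such check succeeds independently with probability 1/2. Under that assumption the number of matches can be turned into a one-sided z-score. The function name `detection_z_score` and the binomial null model are assumptions of this sketch, not details taken from the paper.

```python
import math


def detection_z_score(side_matches):
    """Return a z-score for the count of key-consistent (sentence, channel) checks.

    side_matches: iterable of booleans, one per (sentence, channel) pair.
    ASSUMPTION: under the null (unwatermarked text) each check is an
    independent fair coin flip, so matches ~ Binomial(n, 0.5).
    """
    matches = list(side_matches)
    n = len(matches)
    if n == 0:
        raise ValueError("need at least one (sentence, channel) check")
    hits = sum(1 for m in matches if m)
    # Binomial mean is n/2 and variance is n/4 when p = 0.5.
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)


if __name__ == "__main__":
    # 4 sentences x 4 channels, all checks consistent with the key.
    print(detection_z_score([True] * 16))
```

Under this null model, adding channels increases the number of checks per sentence, which is one way to read the claim that multiple channels densify the evidence available to the detector.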

🛡️ Threat Analysis

Output Integrity Attack

PMark embeds watermarks in LLM text outputs at the sentence (semantic) level to detect machine-generated text and trace content provenance. This is output integrity and content watermarking, not model weight watermarking (ML05). The paper also explicitly addresses robustness against paraphrasing-style watermark-removal attacks, which further anchors it in ML09.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Applications
ai-generated text detection, llm output provenance, machine-generated content attribution