PMark: Towards Robust and Distortion-free Semantic-level Watermarking with Channel Constraints
Jiahao Huo 1, Shuliang Liu 1,1, Bin Wang 2, Junyan Zhang 1,3, Yibo Yan 1,1, Aiwei Liu 4, Xuming Hu 1,1, Mingxun Zhou 1
Published on arXiv
2509.21057
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
PMark consistently outperforms existing semantic-level watermarking baselines in both text quality (distortion-free generation) and robustness against paraphrasing attacks across experimental evaluations.
PMark
Novel technique introduced
Semantic-level watermarking (SWM) for large language models (LLMs) enhances watermarking robustness against text modifications and paraphrasing attacks by treating the sentence as the fundamental unit. However, existing methods still lack strong theoretical guarantees of robustness, and reject-sampling-based generation often introduces significant distribution distortions compared with unwatermarked outputs. In this work, we introduce a new theoretical framework for SWM through the concept of proxy functions (PFs), i.e., functions that map sentences to scalar values. Building on this framework, we propose PMark, a simple yet powerful SWM method that estimates the PF median for the next sentence dynamically through sampling while enforcing multiple PF constraints (which we call channels) to strengthen watermark evidence. Equipped with solid theoretical guarantees, PMark achieves the desired distortion-free property and improves the robustness against paraphrasing-style attacks. We also provide an empirically optimized version that further removes the requirement for dynamic median estimation for better sampling efficiency. Experimental results show that PMark consistently outperforms existing SWM baselines in both text quality and robustness, offering a more effective paradigm for detecting machine-generated text. Our code will be released at [this URL](https://github.com/PMark-repo/PMark).
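The intuition for why a median split can leave the output distribution unchanged can be sketched in a few lines. This is an idealized argument under assumptions the abstract does not state, not the paper's proof: $P$ is the unwatermarked next-sentence distribution, $f$ is a PF, $m$ is the exact median of $f$ under $P$ with no probability mass at $m$, and $b$ is a key-derived bit that is uniform on $\{0, 1\}$. Let $A_1 = \{s : f(s) \ge m\}$ and $A_0 = \{s : f(s) < m\}$, so that $P(A_0) = P(A_1) = \tfrac{1}{2}$. Sampling from $P$ restricted to $A_b$ then gives, averaged over $b$:

$$
P_w(s) \;=\; \sum_{b \in \{0,1\}} \Pr[b] \cdot \frac{P(s)\,\mathbf{1}[s \in A_b]}{P(A_b)} \;=\; \sum_{b \in \{0,1\}} \frac{1}{2} \cdot \frac{P(s)\,\mathbf{1}[s \in A_b]}{1/2} \;=\; P(s).
$$

In practice the median has to be estimated from a finite number of samples, so this identity holds only approximately in a sampled implementation.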
Key Contributions
- Unified theoretical framework for semantic-level watermarking (SWM) via proxy functions (PFs) that map sentences to scalar values, providing the first formal treatment of this class of methods
- PMark: a distortion-free SWM method using dynamic PF median estimation and multi-channel constraints to densify watermark evidence and improve robustness against paraphrasing attacks
- Empirically optimized variant that removes dynamic median estimation for better sampling efficiency while retaining distortion-free and robustness guarantees
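The sampling-based median split with multiple channels described above can be illustrated with a toy sketch. It is not the authors' implementation. Every name in it (`proxy_value`, `channel_bit`, `select_sentence`) is hypothetical, the candidate sentences stand in for samples drawn from an LLM, and a keyed hash stands in for the proxy function. A hash is not robust to paraphrasing, which a semantic PF would need to be. Deriving each channel's bit from the key and the preceding context is also an assumption.

```python
import hashlib
import random
import statistics


def proxy_value(sentence: str, key: str, channel: int) -> float:
    """Toy proxy function: keyed hash of the sentence mapped to [0, 1).

    Placeholder only; a real PF would depend on sentence semantics.
    """
    data = f"{key}|{channel}|{sentence}".encode("utf-8")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def channel_bit(key: str, channel: int, context: str) -> int:
    """Key-derived bit choosing the upper (1) or lower (0) half (assumed scheme)."""
    data = f"bit|{key}|{channel}|{context}".encode("utf-8")
    return hashlib.sha256(data).digest()[0] & 1


def select_sentence(candidates, key, context, num_channels=2, rng=None):
    """Pick one candidate that satisfies a median constraint per channel."""
    rng = rng or random.Random(0)
    pool = list(candidates)
    for channel in range(num_channels):
        values = [proxy_value(s, key, channel) for s in pool]
        # Median estimated from the sampled candidates still in the pool.
        median = statistics.median(values)
        want_upper = channel_bit(key, channel, context) == 1
        kept = [s for s, v in zip(pool, values) if (v >= median) == want_upper]
        if kept:  # keep the previous pool if the constraint empties it
            pool = kept
    return rng.choice(pool)


if __name__ == "__main__":
    sampled = [f"Candidate sentence number {i}." for i in range(8)]
    print(select_sentence(sampled, key="demo-key", context="Previous sentence."))
```

Each channel roughly halves the candidate pool, so in this sketch the number of sampled candidates should be at least $2^k$ for $k$ channels. That gives a concrete sense of the sampling cost that a variant without dynamic median estimation would avoid.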
🛡️ Threat Analysis
PMark embeds watermarks in LLM text outputs at the sentence/semantic level to detect machine-generated text and trace content provenance. This is output integrity and content watermarking, not model weight watermarking (ML05). The paper also explicitly addresses robustness against paraphrasing-style watermark removal attacks, further anchoring it in ML09.