PMark: Towards Robust and Distortion-free Semantic-level Watermarking with Channel Constraints

Semantic-level watermarking (SWM) for large language models (LLMs) enhances watermarking robustness against text modifications and paraphrasing attacks by treating the sentence as the fundamental unit. However, existing methods still lack strong theoretical guarantees of robustness, and rejection-sampling-based generation often introduces significant distribution distortions compared with unwatermarked outputs. In this work, we introduce a new theoretical framework for SWM through the concept of proxy functions (PFs), i.e., functions that map sentences to scalar values. Building on this framework, we propose PMark, a simple yet powerful SWM method that dynamically estimates the PF median for the next sentence through sampling while enforcing multiple PF constraints (which we call channels) to strengthen watermark evidence. Equipped with solid theoretical guarantees, PMark achieves the desired distortion-free property and improves robustness against paraphrasing-style attacks. We also provide an empirically optimized version that removes the requirement for dynamic median estimation to improve sampling efficiency. Experimental results show that PMark consistently outperforms existing SWM baselines in both text quality and robustness, offering a more effective paradigm for detecting machine-generated text. Our code will be released at [this URL](https://github.com/PMark-repo/PMark).
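The link between a median split and the distortion-free property can be illustrated with a short derivation. This is a simplified, single-channel sketch and not the paper's formal statement. The notation is assumed here: $P$ is the unwatermarked next-sentence distribution, $f$ is a proxy function, $m$ is an exact median of $f$ under $P$ with no probability mass at $m$, and $Q$ is the distribution obtained when a uniformly random key bit decides which side of the median to sample from.

```latex
\begin{align*}
f &: \mathcal{S} \to \mathbb{R}, \qquad
\Pr_{s \sim P}[f(s) \ge m] = \Pr_{s \sim P}[f(s) < m] = \tfrac{1}{2}, \\
Q(s) &= \tfrac{1}{2}\, P\bigl(s \mid f(s) \ge m\bigr)
      + \tfrac{1}{2}\, P\bigl(s \mid f(s) < m\bigr) \\
     &= \tfrac{1}{2} \cdot \frac{P(s)\,\mathbf{1}[f(s) \ge m]}{1/2}
      + \tfrac{1}{2} \cdot \frac{P(s)\,\mathbf{1}[f(s) < m]}{1/2} \\
     &= P(s)\,\bigl(\mathbf{1}[f(s) \ge m] + \mathbf{1}[f(s) < m]\bigr) = P(s).
\end{align*}
```

Under these assumptions, averaging over the key bit recovers $P$ exactly. If the split is not at the median, the two halves carry unequal mass and the equality fails, which is one reading of why the median has to be estimated.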

Key Contributions

Unified theoretical framework for semantic-level watermarking (SWM) via proxy functions (PFs) that map sentences to scalar values, providing the first formal treatment of this class of methods
PMark: a distortion-free SWM method using dynamic PF median estimation and multi-channel constraints to densify watermark evidence and improve robustness against paraphrasing attacks
Empirically optimized variant that removes dynamic median estimation for better sampling efficiency while retaining distortion-free and robustness guarantees
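The median-estimation and multi-channel ideas listed above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation. The names `make_toy_pf`, `key_bit`, and `select_sentence` are hypothetical. The hash-based proxy function is a stand-in with no semantic meaning (a real PF would presumably be derived from sentence semantics). Narrowing the candidate pool one channel at a time is also an assumption made for simplicity.

```python
import hashlib
import random
from statistics import median
from typing import Callable, List, Sequence

ProxyFn = Callable[[str], float]


def make_toy_pf(salt: str) -> ProxyFn:
    """Hypothetical proxy function: hashes a sentence to a scalar in [0, 1).

    Stand-in only; it carries no semantic information.
    """
    def pf(sentence: str) -> float:
        digest = hashlib.sha256(f"{salt}|{sentence}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64
    return pf


def key_bit(secret: str, context: str, channel: int) -> int:
    """Pseudo-random bit (0 or 1) derived from the secret key, context, and channel."""
    digest = hashlib.sha256(f"{secret}|{context}|{channel}".encode("utf-8")).digest()
    return digest[0] & 1


def select_sentence(
    candidates: Sequence[str],
    pfs: Sequence[ProxyFn],
    secret: str,
    context: str,
    rng: random.Random,
) -> str:
    """Pick a candidate that satisfies one median constraint per channel.

    `candidates` stands for next sentences sampled from the language model;
    the median of their PF values serves as an empirical median estimate.
    """
    pool: List[str] = list(candidates)
    for channel, pf in enumerate(pfs):
        if len(pool) < 2:
            break  # too few candidates left to split further
        values = [pf(s) for s in pool]
        m = median(values)  # empirical PF median over the current pool
        want_upper = bool(key_bit(secret, context, channel))
        kept = [s for s, v in zip(pool, values) if (v >= m) == want_upper]
        if kept:  # keep the previous pool if the chosen side is empty
            pool = kept
    return rng.choice(pool)
```

In this sketch each channel roughly halves the candidate pool, so the number of sampled candidates limits how many channels can be enforced. That trade-off between evidence density and sampling cost is a property of the toy code only.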

🛡️ Threat Analysis

Output Integrity Attack

PMark embeds watermarks in LLM text outputs at the sentence (semantic) level to detect machine-generated text and trace content provenance. This is output integrity and content watermarking, not model weight watermarking (ML05). The paper also explicitly addresses robustness against paraphrasing-style watermark removal attacks, further anchoring it in ML09.

Details

Domains

nlp

Model Types

llm, transformer

Threat Tags

inference_time

Applications

2025 0 cit.
