defense · arXiv · Sep 25, 2025
Jiahao Huo, Shuliang Liu, Bin Wang et al. · The Hong Kong University of Science and Technology · Peking University +2 more
Distortion-free semantic-level LLM text watermarking using proxy functions and multi-channel constraints, robust against paraphrasing attacks
Output Integrity Attack nlp
Semantic-level watermarking (SWM) for large language models (LLMs) enhances robustness against text modifications and paraphrasing attacks by treating the sentence as the fundamental unit. However, existing methods still lack strong theoretical guarantees of robustness, and rejection-sampling-based generation often introduces significant distribution distortions relative to unwatermarked outputs. In this work, we introduce a new theoretical framework for SWM built on proxy functions (PFs) – functions that map sentences to scalar values. Building on this framework, we propose PMark, a simple yet powerful SWM method that dynamically estimates the PF median for the next sentence through sampling while enforcing multiple PF constraints (which we call channels) to strengthen watermark evidence. Equipped with solid theoretical guarantees, PMark achieves the desired distortion-free property and improves robustness against paraphrasing-style attacks. We also provide an empirically optimized variant that removes the need for dynamic median estimation, improving sampling efficiency. Experimental results show that PMark consistently outperforms existing SWM baselines in both text quality and robustness, offering a more effective paradigm for detecting machine-generated text. Our code will be released at [this URL](https://github.com/PMark-repo/PMark).
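The abstract does not specify the paper's actual proxy functions, so the following is only a minimal sketch of the idea, assuming a keyed-hash PF (`proxy_value`) that maps sentences to roughly uniform scalars in [0, 1): estimate the PF median over sampled candidate sentences, keep only candidates on the secret side of the median (one "channel"), and detect by counting how often sentences land on that side.

```python
import hashlib
import statistics

def proxy_value(sentence: str, key: str) -> float:
    # Keyed hash of the sentence mapped to [0, 1); a stand-in for a
    # proxy function (any sentence -> scalar map would do).
    digest = hashlib.sha256((key + sentence).encode()).hexdigest()
    return int(digest[:8], 16) / 0x100000000

def watermark_select(candidates, key: str, side: int):
    # Estimate the PF median over the sampled candidates, then keep
    # only candidates on the secret side of that median (one channel).
    values = [proxy_value(s, key) for s in candidates]
    med = statistics.median(values)
    keep = [s for s, v in zip(candidates, values) if (v >= med) == bool(side)]
    return keep or candidates  # fall back if the chosen side is empty

def detect(sentences, key: str, side: int) -> float:
    # Fraction of sentences on the watermark side of 0.5 (the expected
    # median of a uniform PF): ~0.5 for unwatermarked text, higher for
    # watermarked text.
    hits = sum((proxy_value(s, key) >= 0.5) == bool(side) for s in sentences)
    return hits / len(sentences)
```

The median split is what gives the distortion-free flavor: each candidate passes the filter with probability ~1/2 regardless of content, and stacking several independent channels multiplies the watermark evidence per sentence.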
llm transformer The Hong Kong University of Science and Technology · Peking University · National University of Singapore +1 more
defense · arXiv · Sep 27, 2025
Yu Zhang, Shuliang Liu, Xu Yang et al. · The Hong Kong University of Science and Technology (Guangzhou) · South China University of Technology
Proposes dynamic LLM text watermarking using context-aware entropy thresholds to preserve quality across cross-task generation scenarios
Output Integrity Attack nlp
Watermarking algorithms for Large Language Models (LLMs) identify machine-generated content by embedding and detecting hidden statistical features in text. However, such embedding degrades text quality, especially in low-entropy scenarios. Existing methods that rely on entropy thresholds often require significant computational resources for tuning and adapt poorly to unknown or cross-task generation scenarios. We propose Context-Aware Threshold watermarking (CAT), a novel framework that dynamically adjusts watermarking intensity based on real-time semantic context. CAT partitions text generation into semantic states using logits clustering, establishing context-aware entropy thresholds that preserve fidelity in structured content while embedding robust watermarks. Crucially, it requires no pre-defined thresholds or task-specific tuning. Experiments show CAT improves text quality across tasks without sacrificing detection accuracy.
llm transformer The Hong Kong University of Science and Technology (Guangzhou) · South China University of Technology
defense · arXiv · Jan 12, 2026
Qi Zheng, Shuliang Liu, Yu Huang et al. · The Hong Kong University of Science and Technology (Guangzhou) · The Hong Kong University of Science and Technology +1 more
Watermarks VLM-generated text via visual-evidence-guided token partitioning, improving visual fidelity while maintaining 96.88% AUC detection accuracy
Output Integrity Attack nlp multimodal
Watermarking has emerged as a pivotal solution for content traceability and intellectual property protection in Large Vision-Language Models (LVLMs). However, vision-agnostic watermarks introduce visually irrelevant tokens and disrupt visual grounding by enforcing indiscriminate pseudo-random biases, while some semantic-aware methods incur prohibitive inference latency due to rejection sampling. In this paper, we propose the VIsual Semantic Adaptive Watermark (VISA-Mark), a novel framework that embeds detectable signals while strictly preserving visual fidelity. Our approach employs a lightweight, efficiently trained prefix-tuner to extract dynamic Visual-Evidence Weights, which quantify the evidentiary support for candidate tokens based on the visual input. These weights guide an adaptive vocabulary partitioning and logits perturbation mechanism, concentrating watermark strength specifically on visually supported tokens. By actively aligning the watermark with visual evidence, VISA-Mark effectively maintains visual fidelity. Empirical results confirm that VISA-Mark outperforms conventional methods with a 7.8% improvement in visual consistency (CHAIR-I) and superior semantic fidelity. The framework maintains highly competitive detection accuracy (96.88% AUC) and robust attack resilience (99.3%) without sacrificing inference efficiency, effectively establishing a new standard for reliability-preserving multimodal watermarking.
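The weighting step can be sketched in a few lines. This is only an illustration of the idea, not the paper's method: `evidence_weights` stands in for the output of the paper's prefix-tuner (here assumed to be given, one weight in [0, 1] per vocabulary token), and the function scales a green-list logit bias by that weight so watermark strength concentrates on visually supported tokens.

```python
def evidence_weighted_bias(logits, evidence_weights, green_ids, base_bias=2.0):
    # Per-token watermark bias scaled by visual-evidence weight:
    # w = 0 (no visual support) leaves the logit untouched even for
    # green tokens; w = 1 (fully grounded) applies the full bias.
    return [x + base_bias * w * (i in green_ids)
            for i, (x, w) in enumerate(zip(logits, evidence_weights))]
```

A visually irrelevant token thus receives no pseudo-random push even when it falls in the green list, which is how this style of scheme avoids the grounding damage of vision-agnostic watermarks.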
vlm transformer llm The Hong Kong University of Science and Technology (Guangzhou) · The Hong Kong University of Science and Technology · Zhejiang University