Defense · 2025

PVMark: Enabling Public Verifiability for LLM Watermarking Schemes

Haohua Duan 1, Liyao Xiang 1, Xin Zhang 2

0 citations · 42 references · arXiv

Published on arXiv · 2510.26274

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

PVMark enables publicly verifiable watermark detection across state-of-the-art LLM watermarking schemes without compromising watermarking performance or disclosing the secret key

PVMark

Novel technique introduced


Watermarking schemes for large language models (LLMs) have been proposed to identify the source of generated text, mitigating potential threats emerging from model theft. However, current watermarking solutions hardly resolve the trust issue: a non-public watermark detector cannot prove that it faithfully conducts the detection. We observe that this is attributable to the secret key used in watermark detection: it cannot be made public, or an adversary could launch removal attacks given the key; nor can it remain private, or the detection is opaque to the public. To resolve this dilemma, we propose PVMark, a plugin based on zero-knowledge proof (ZKP) that makes the watermark detection process publicly verifiable by third parties without disclosing any secret key. PVMark hinges upon a proof of 'correct execution' of watermark detection, on which a set of ZKP constraints are built, covering mapping, random number generation, comparison, and summation. We implement multiple variants of PVMark in Python, Rust, and Circom, covering combinations of three watermarking schemes, three hash functions, and four ZKP protocols, to show that our approach works under a variety of circumstances. Experimental results show that PVMark efficiently enables public verifiability for state-of-the-art LLM watermarking schemes without compromising watermarking performance, making it promising for practical deployment.


Key Contributions

  • Identifies the public-verifiability gap in existing LLM watermarking: the secret key cannot be public (enabling removal attacks) nor fully private (making detection opaque/untrustworthy)
  • Proposes PVMark, a ZKP-based plugin that proves 'correct execution' of watermark detection (mapping, RNG, comparison, summation) without disclosing the secret key
  • Implements multiple variants in Python, Rust, and Circom covering three watermarking schemes, three hash functions, and four ZKP protocols, demonstrating broad compatibility without degrading watermarking performance
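As a point of contrast with the paper's ZKP approach, a plain hash commitment can bind the detector to one fixed key, but it can only be checked by opening (revealing) the key; PVMark's contribution is to replace that opening step with a zero-knowledge proof that detection ran correctly under the committed key. A minimal commitment sketch (illustrative, not from the paper):

```python
import hashlib
import secrets

def commit(secret_key: bytes) -> tuple:
    # Publish digest = SHA-256(key || nonce) before any detection runs;
    # the random nonce blinds the key so the digest leaks nothing useful.
    nonce = secrets.token_bytes(32)
    return hashlib.sha256(secret_key + nonce).digest(), nonce

def open_commitment(digest: bytes, secret_key: bytes, nonce: bytes) -> bool:
    # Naive verification: requires revealing the key, which would enable
    # removal attacks -- the very dilemma a ZKP avoids.
    return hashlib.sha256(secret_key + nonce).digest() == digest
```

The commitment pins down which key the detector used; the ZKP then proves, without opening the commitment, that the published detection result is the correct output of the detection algorithm on that key.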

🛡️ Threat Analysis

Output Integrity Attack

PVMark strengthens LLM text watermarking by making the detection process publicly verifiable. The watermark is embedded in generated text outputs to trace provenance, which places it within ML09's content-watermarking and output-integrity scope. Both the core problem addressed (the secret-key dilemma in watermark detection) and the solution (ZKP-based verifiable detection) are squarely about authenticating and verifying the integrity of model outputs.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, black_box
Applications
llm-generated text watermarking, content provenance verification, ai text attribution