Defense · 2025

PRIVMARK: Private Large Language Models Watermarking with MPC

Thomas Fargues, Ye Dong, Tianwei Zhang, Jin-Song Dong

0 citations · 28 references · arXiv


Published on arXiv: 2509.24624

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

PRIVMARK produces watermarking results semantically identical to plaintext PostMark, preserves model-weight privacy across multiple computing parties, and resists paraphrasing and removal attacks.

PRIVMARK

Novel technique introduced


The rapid growth of Large Language Models (LLMs) has highlighted the pressing need for reliable mechanisms to verify content ownership and ensure traceability. Watermarking offers a promising path forward, but it remains limited by privacy concerns in sensitive scenarios, as traditional approaches often require direct access to a model's parameters or its training data. In this work, we propose a secure multi-party computation (MPC)-based private LLM watermarking framework, PRIVMARK, to address these concerns. Concretely, we investigate PostMark (EMNLP'2024), one of the state-of-the-art LLM watermarking methods, and formulate its basic operations. Then, we construct efficient protocols for these operations using MPC primitives in a black-box manner. In this way, PRIVMARK enables multiple parties to collaboratively watermark an LLM's output without exposing the model's weights to any single computing party. We implement PRIVMARK using SecretFlow-SPU (USENIX ATC'2023) and evaluate its performance using the ABY3 (CCS'2018) backend. The experimental results show that PRIVMARK achieves semantically identical results compared to the plaintext baseline without MPC and is resistant against paraphrasing and removal attacks with reasonable efficiency.
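The PostMark-style insertion step that PRIVMARK formulates can be sketched as: embed the model's response, rank a secret watermark word table by embedding similarity, and insert the top-scoring words into the output. The sketch below is a minimal illustration of that selection step; the word table, the three-dimensional toy vectors, and the function names are placeholders, not the paper's actual embedding model or vocabulary.

```python
import math

# Toy embedding table: in PostMark these vectors come from a real embedding
# model over a large secret vocabulary; the entries here are illustrative.
WORD_TABLE = {
    "harbor": [0.9, 0.1, 0.0],
    "lantern": [0.1, 0.9, 0.1],
    "orchard": [0.0, 0.2, 0.9],
    "quartz": [0.5, 0.5, 0.3],
}

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_watermark_words(text_embedding, table, k=2):
    """Rank the secret word table by similarity to the response embedding
    and return the top-k words to be woven into the output text."""
    scored = sorted(table.items(),
                    key=lambda kv: cosine(text_embedding, kv[1]),
                    reverse=True)
    return [word for word, _ in scored[:k]]

# Example: a response whose (toy) embedding points mostly along axis 0.
print(select_watermark_words([1.0, 0.2, 0.1], WORD_TABLE, k=2))
# → ['harbor', 'quartz']
```

PRIVMARK's contribution is running exactly this kind of similarity ranking inside MPC, so the table and the model output stay secret-shared throughout.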


Key Contributions

  • Formalizes PostMark's operations and constructs efficient MPC-compatible protocols for each primitive in a black-box manner using SecretFlow-SPU and ABY3
  • Enables multiple parties to collaboratively watermark LLM outputs without any single party seeing model weights, addressing privacy in multi-party LLM deployments
  • Demonstrates that PRIVMARK achieves semantically identical watermarking quality to the plaintext baseline while resisting paraphrasing and watermark removal attacks
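The privacy guarantee in the second contribution rests on secret sharing: under ABY3's replicated scheme, each of the three computing parties holds two of three additive shares, so any single party's view is uniformly random. A minimal sketch of that sharing pattern (over a toy 32-bit ring; the function names are assumptions, not SecretFlow-SPU's API):

```python
import random

MOD = 2 ** 32  # ring Z_{2^32}; ABY3 computes over rings of this form

def share3(x):
    """Split x into three additive shares x1 + x2 + x3 = x (mod 2^32).
    In replicated sharing, party i holds the pair (x_i, x_{i+1}), so any
    two parties can reconstruct x but no single party learns anything."""
    x1 = random.randrange(MOD)
    x2 = random.randrange(MOD)
    x3 = (x - x1 - x2) % MOD
    return [(x1, x2), (x2, x3), (x3, x1)]  # one pair per party

def reconstruct(shares):
    """Recover x by summing one distinct share from each party."""
    x1, _ = shares[0]
    x2, x3 = shares[1]
    return (x1 + x2 + x3) % MOD

secret = 123456789
assert reconstruct(share3(secret)) == secret

# Addition of two shared values is local and communication-free:
a, b = share3(1000), share3(2345)
summed = [((a1 + b1) % MOD, (a2 + b2) % MOD)
          for (a1, a2), (b1, b2) in zip(a, b)]
print(reconstruct(summed))  # → 3345
```

Multiplication under this scheme requires one round of inter-party communication; SecretFlow-SPU's ABY3 backend handles that protocol layer, which is what PRIVMARK invokes in a black-box manner.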

🛡️ Threat Analysis

Output Integrity Attack

PRIVMARK embeds watermarks in LLM-generated text for content provenance and traceability. The watermark lives in the text output, not in the model weights, and the paper explicitly evaluates resistance to watermark removal and paraphrasing attacks, which places it squarely in output integrity and content authentication.
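Because the watermark is carried by inserted words rather than weights, detection amounts to scoring how many expected watermark words survive in a candidate text. The sketch below uses exact token membership as a deliberate simplification; PostMark's real detector uses soft, embedding-based presence scores, and the threshold here is an illustrative assumption.

```python
def detect_watermark(text, expected_words, threshold=0.5):
    """Flag text as watermarked if at least `threshold` of the expected
    watermark words appear in it. Exact-match presence is a toy stand-in
    for PostMark's embedding-based soft presence score."""
    tokens = set(text.lower().split())
    hits = sum(1 for w in expected_words if w in tokens)
    ratio = hits / len(expected_words) if expected_words else 0.0
    return ratio >= threshold, ratio

flagged, ratio = detect_watermark(
    "the harbor lantern glowed at dusk",
    ["harbor", "lantern", "orchard"])
print(flagged, round(ratio, 2))  # → True 0.67
```

A paraphrasing attack succeeds only if it rewrites away enough watermark words to push the presence ratio under the threshold, which is exactly the robustness the paper measures.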


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, black_box
Applications
llm text watermarking, content provenance, multi-party llm deployment