
Improve the Trade-off Between Watermark Strength and Speculative Sampling Efficiency for Language Models

Weiqing He, Xiang Li, Li Shen, Weijie Su, Qi Long

0 citations · 55 references · arXiv (Cornell University)


Published on arXiv

arXiv:2602.01428

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

The proposed mechanism achieves maximal watermark strength while maintaining speculative sampling efficiency, improving detectability without sacrificing throughput — showing the strength-efficiency trade-off is not absolute.

Pseudorandom Draft-Token Acceptance

Novel technique introduced


Watermarking is a principled approach for tracing the provenance of large language model (LLM) outputs, but its deployment in practice is hindered by inference inefficiency. Speculative sampling accelerates inference, with efficiency improving as the acceptance rate between draft and target models increases. Yet recent work reveals a fundamental trade-off: higher watermark strength reduces acceptance, preventing their simultaneous achievement. We revisit this trade-off and show it is not absolute. We introduce a quantitative measure of watermark strength that governs statistical detectability and is maximized when tokens are deterministic functions of pseudorandom numbers. Using this measure, we fully characterize the trade-off as a constrained optimization problem and derive explicit Pareto curves for two existing watermarking schemes. Finally, we introduce a principled mechanism that injects pseudorandomness into draft-token acceptance, ensuring maximal watermark strength while maintaining speculative sampling efficiency. Experiments further show that this approach improves detectability without sacrificing efficiency. Our findings uncover a principle that unites speculative sampling and watermarking, paving the way for their efficient and practical deployment.
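The abstract's claim that watermark strength is maximized when "tokens are deterministic functions of pseudorandom numbers" can be illustrated with the Gumbel-max style watermark, a well-known scheme of this kind (the paper analyzes existing schemes, but the code below is an independent sketch, not the authors' implementation; all function names and the key are illustrative):

```python
# Hedged sketch of a Gumbel-max style watermark: the next token is a
# deterministic function of pseudorandom numbers seeded by the context,
# yet its marginal distribution still matches the model's probabilities.
import hashlib


def pseudorandom_uniforms(context_tokens, vocab_size, key=b"watermark-key"):
    """Derive one uniform in (0, 1) per vocabulary entry by hashing the
    secret key, the preceding context, and the candidate token id."""
    prefix = key + b"".join(t.to_bytes(4, "big") for t in context_tokens)
    us = []
    for v in range(vocab_size):
        h = hashlib.sha256(prefix + v.to_bytes(4, "big")).digest()
        us.append((int.from_bytes(h[:8], "big") + 1) / (2**64 + 2))
    return us


def watermarked_next_token(probs, context_tokens):
    """Gumbel-max trick: argmax_v u_v^(1/p_v) is distributed as Categorical(p),
    but the choice is fully determined by the pseudorandom u's, which is what
    makes the watermark statistically detectable."""
    us = pseudorandom_uniforms(context_tokens, len(probs))
    scores = [u ** (1.0 / p) if p > 0 else 0.0 for u, p in zip(us, probs)]
    return max(range(len(probs)), key=lambda v: scores[v])
```

Because the token is a deterministic function of the context and the key, a detector holding the key can recompute the same uniforms and test whether the observed tokens align with them.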


Key Contributions

  • Introduces a quantitative measure of watermark strength governing statistical detectability, maximized when tokens are deterministic functions of pseudorandom numbers
  • Fully characterizes the watermark strength vs. speculative sampling efficiency trade-off as a constrained optimization problem with explicit Pareto curves for two existing schemes
  • Proposes a principled mechanism injecting pseudorandomness into draft-token acceptance, provably achieving maximal watermark strength while maintaining speculative sampling efficiency
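The third contribution's idea of injecting pseudorandomness into draft-token acceptance can be sketched as follows: standard speculative sampling accepts a draft token with probability min(1, p_target/p_draft) using a fresh random uniform; replacing that fresh uniform with one derived pseudorandomly from the context makes the acceptance decision reproducible, so it can carry watermark signal. This is a minimal illustration of the stated idea, not the paper's actual mechanism; the hash construction and names are assumptions:

```python
# Hedged sketch: speculative-sampling acceptance driven by a pseudorandom
# uniform seeded on the context (illustrative key and hashing scheme).
import hashlib


def pseudorandom_uniform(context_tokens, key=b"watermark-key"):
    """Map (key, context) deterministically to a uniform in (0, 1)."""
    data = key + b"".join(t.to_bytes(4, "big") for t in context_tokens)
    h = hashlib.sha256(data).digest()
    return (int.from_bytes(h[:8], "big") + 1) / (2**64 + 2)


def accept_draft_token(draft_token, p_target, p_draft, context_tokens):
    """Standard acceptance test min(1, p_target/p_draft), but the uniform is
    pseudorandom, so a detector holding the key can replay the decision."""
    u = pseudorandom_uniform(context_tokens)
    ratio = min(1.0, p_target[draft_token] / p_draft[draft_token])
    return u <= ratio
```

When draft and target distributions agree, the ratio is 1 and every draft token is accepted, so the pseudorandom substitution costs no speculative-sampling efficiency in that regime.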

🛡️ Threat Analysis

Output Integrity Attack

Proposes a watermarking mechanism embedded in LLM text outputs to trace provenance and verify content authenticity — output integrity / content watermarking. The paper explicitly targets statistical detectability of AI-generated text.
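The "statistical detectability" targeted here is typically tested by recomputing the pseudorandom numbers from the secret key and scoring how well the observed tokens align with them. The sketch below uses the standard score for Gumbel-style watermarks, where each term is Exp(1) under non-watermarked text, so a large total is evidence of watermarking; it is a generic illustration under that assumption, not the paper's detector:

```python
# Hedged sketch of a keyed watermark detector: recompute the pseudorandom
# uniform tied to each emitted token and accumulate -log(1 - u).
import hashlib
import math


def detect_score(tokens, key=b"watermark-key"):
    """Sum of -log(1 - u_t) over positions. For non-watermarked text each
    term is approximately Exp(1); watermarked text yields a markedly
    larger sum, enabling a p-value test against the Gamma(n, 1) null."""
    score = 0.0
    for t in range(1, len(tokens)):
        prefix = key + b"".join(s.to_bytes(4, "big") for s in tokens[:t])
        h = hashlib.sha256(prefix + tokens[t].to_bytes(4, "big")).digest()
        u = (int.from_bytes(h[:8], "big") + 1) / (2**64 + 2)
        score += -math.log(1.0 - u)
    return score
```

Because the score depends only on the tokens and the key, detection needs no access to the model, which is what makes this attack class an output-integrity concern.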


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Applications
llm text provenance, ai-generated text detection, llm inference acceleration