Defense · 2026

Towards Anytime-Valid Statistical Watermarking

Baihe Huang , Eric Xu , Kannan Ramchandran , Jiantao Jiao , Michael I. Jordan

arXiv (Cornell University)


Published on arXiv

arXiv: 2602.17608

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Reduces average token budget required for watermark detection by 13–15% relative to state-of-the-art baselines while preserving anytime-valid Type-I error guarantees.

Anchored E-Watermarking

Novel technique introduced


The proliferation of Large Language Models (LLMs) necessitates efficient mechanisms to distinguish machine-generated content from human text. While statistical watermarking has emerged as a promising solution, existing methods suffer from two critical limitations: the lack of a principled approach for selecting sampling distributions, and the reliance on fixed-horizon hypothesis testing, which precludes valid early stopping. In this paper, we bridge this gap by developing the first e-value-based watermarking framework, Anchored E-Watermarking, which unifies optimal sampling with anytime-valid inference. Unlike traditional approaches, where optional stopping invalidates Type-I error guarantees, our framework enables valid anytime inference by constructing a test supermartingale for the detection process. By leveraging an anchor distribution to approximate the target model, we characterize the optimal e-value with respect to the worst-case log-growth rate and derive the optimal expected stopping time. Our theoretical claims are substantiated by simulations and evaluations on established benchmarks, showing that our framework significantly enhances sample efficiency, reducing the average token budget required for detection by 13–15% relative to state-of-the-art baselines.
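The detection procedure the abstract describes can be sketched as a sequential e-value test. The sketch below is illustrative (function and variable names are our own, not taken from the paper): per-token e-values are multiplied into a running "wealth" process, which is a test supermartingale under the null hypothesis of human text; by Ville's inequality, stopping the first time the product reaches 1/α keeps the Type-I error at most α regardless of when detection stops.

```python
def evalue_sequential_test(evalues, alpha=0.05):
    """Anytime-valid sequential test from per-token e-values (illustrative sketch).

    Each e-value must satisfy E[e_t | past] <= 1 under the null, so the
    running product is a test supermartingale. By Ville's inequality,
    P(product ever reaches 1/alpha) <= alpha under the null, which makes
    early stopping valid at any token.
    """
    wealth = 1.0
    for t, e in enumerate(evalues, start=1):
        wealth *= e
        if wealth >= 1.0 / alpha:
            return t  # watermark detected after t tokens
    return None  # evidence never crossed the threshold
```

Under the alternative, e-values with positive expected log-growth drive the wealth upward geometrically, so detection typically occurs after a short prefix; the paper's contribution is choosing e-values that maximize this worst-case growth rate.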


Key Contributions

  • First e-value-based watermarking framework ('Anchored E-Watermarking') that enables anytime-valid inference via a test supermartingale, allowing valid early stopping without inflating Type-I error
  • Principled characterization of the optimal e-value with respect to worst-case log-growth rate using an anchor distribution to approximate the target LLM
  • Derivation of optimal expected stopping time and empirical validation showing 13–15% reduction in average token budget for detection versus state-of-the-art baselines
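As a concrete, hypothetical instance of a valid per-token e-value, consider a green-list-style watermark: under the null, a token lands in the green list with probability γ, while watermarked generation boosts this probability to δ > γ. The likelihood ratio between these two Bernoulli models is then a bona fide e-value, since its expectation under the null is exactly 1. This illustrates the e-value idea only; it is not the paper's anchored construction, which instead optimizes the worst-case log-growth rate via an anchor distribution approximating the target LLM.

```python
def green_list_evalue(is_green, gamma=0.5, delta=0.7):
    """Likelihood-ratio e-value for a green-list watermark (illustrative only).

    Null: token is green with prob gamma. Alternative: green with prob delta.
    Null expectation is gamma*(delta/gamma) + (1-gamma)*((1-delta)/(1-gamma))
    = delta + (1 - delta) = 1, so this is a valid e-value.
    """
    return delta / gamma if is_green else (1 - delta) / (1 - gamma)
```

Feeding these per-token e-values into a running product yields a sequential test whose Type-I error stays controlled under optional stopping; the quality of the watermark (the gap δ − γ) governs how fast evidence accumulates.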

🛡️ Threat Analysis

Output Integrity Attack

Embeds watermarks in LLM text outputs to verify content provenance and distinguish machine-generated text from human text. This is output integrity / content authentication, not model IP protection (ML05): the watermark lives in the generated tokens, not in the model weights.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Datasets
established NLP benchmarks (unspecified in abstract/body excerpt)
Applications
llm-generated text detection, ai content provenance, machine-generated text attribution