CATMark: A Context-Aware Thresholding Framework for Robust Cross-Task Watermarking in Large Language Models
Yu Zhang 1, Shuliang Liu 1, Xu Yang 2, Xuming Hu 1
Published on arXiv
2510.02342
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Achieves 82.3% pass@1 on HumanEval and 100% AUROC on StackEval, outperforming entropy-threshold baselines across all cross-task benchmarks without sacrificing detection accuracy.
CATMark
Novel technique introduced
Watermarking algorithms for Large Language Models (LLMs) identify machine-generated content by embedding and detecting hidden statistical features in text. However, such embedding degrades text quality, especially in low-entropy scenarios such as structured or code-heavy generation. Existing methods that rely on fixed entropy thresholds require significant computational resources for tuning and adapt poorly to unknown or cross-task generation scenarios. We propose Context-Aware Threshold watermarking (CATMark), a novel framework that dynamically adjusts watermarking intensity based on real-time semantic context. CATMark partitions text generation into semantic states via logits clustering and establishes context-aware entropy thresholds that preserve fidelity in structured content while still embedding robust watermarks. Crucially, it requires no pre-defined thresholds or task-specific tuning. Experiments show CATMark improves text quality in cross-task settings without sacrificing detection accuracy.
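To make the entropy-gated idea concrete, here is a minimal sketch of a green-list watermark (in the style of standard logit-bias watermarking schemes) that only fires when the next-token entropy exceeds a threshold. The function names, the fixed `threshold`, and the seeding-by-previous-token scheme are illustrative assumptions, not CATMark's actual implementation.

```python
import numpy as np

def entropy(logits):
    # Shannon entropy (nats) of the softmax distribution over next tokens.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def greenlist_bias(logits, prev_token, gamma=0.5, delta=2.0, threshold=1.0):
    # Illustrative sketch: seed a PRNG with the previous token, pick a
    # "green" subset of the vocabulary, and boost its logits only when
    # entropy exceeds the threshold. Low-entropy steps (e.g. forced
    # syntax in code) are left untouched to preserve fidelity.
    if entropy(logits) < threshold:
        return logits  # skip watermarking at this step
    vocab_size = len(logits)
    rng = np.random.default_rng(prev_token)
    green = rng.choice(vocab_size, size=int(gamma * vocab_size), replace=False)
    biased = logits.copy()
    biased[green] += delta
    return biased
```

CATMark's contribution is precisely to replace the fixed `threshold` here with one computed per semantic state, so no manual tuning is needed.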
Key Contributions
- First systematic investigation of watermarking in cross-task (mixed-modality) generation scenarios such as interleaved code and natural language
- Dynamic thresholding mechanism that clusters tokens via KL divergence from learned prototypes and auto-computes per-context entropy thresholds without manual tuning
- Theoretical lower bound on detection z-score under adaptive thresholding, with empirical results showing 82.3% pass@1 on HumanEval and 100% AUROC on StackEval
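The second contribution above can be sketched as follows: softmax the current logits, assign the step to the nearest prototype distribution under KL divergence, then derive each state's entropy threshold from the steps assigned to it. The prototype format, the mean-entropy threshold rule, and all function names are assumptions for illustration; the paper's exact clustering and threshold computation may differ.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    # KL divergence D(p || q) between two probability vectors.
    return float((p * (np.log(p + eps) - np.log(q + eps))).sum())

def assign_state(logits, prototypes):
    # Softmax the logits and pick the prototype distribution
    # with minimal KL divergence (the current semantic state).
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(np.argmin([kl(p, q) for q in prototypes]))

def per_state_thresholds(history, prototypes):
    # history: list of (logits, step_entropy) pairs from prior steps.
    # Auto-compute each state's threshold as the mean entropy of the
    # steps assigned to it -- a simple stand-in for the paper's rule.
    buckets = {i: [] for i in range(len(prototypes))}
    for logits, h in history:
        buckets[assign_state(logits, prototypes)].append(h)
    return {i: (float(np.mean(v)) if v else 0.0) for i, v in buckets.items()}
```

Under this scheme, structured content (e.g. code tokens) and free-form prose land in different states and get different thresholds, which is what lets a single run watermark interleaved code and natural language without per-task tuning.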
🛡️ Threat Analysis
CATMark embeds imperceptible statistical watermarks into LLM text outputs to enable AI-generated content detection and content provenance tracking. The watermark lives in the generated text (the outputs), not in the model weights, making this a classic ML09 output-integrity / content-watermarking contribution.