Defense · 2025

CATMark: A Context-Aware Thresholding Framework for Robust Cross-Task Watermarking in Large Language Models

Yu Zhang 1, Shuliang Liu 1, Xu Yang 2, Xuming Hu 1

1 citation · 34 references · arXiv


Published on arXiv (2510.02342)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves 82.3% pass@1 on HumanEval and 100% AUROC on StackEval, outperforming entropy-threshold baselines across all cross-task benchmarks without sacrificing detection accuracy.

CATMark

Novel technique introduced


Watermarking algorithms for Large Language Models (LLMs) identify machine-generated content by embedding and detecting hidden statistical features in text. However, such embedding degrades text quality, especially in low-entropy scenarios such as structured code. Existing methods that rely on fixed entropy thresholds require significant computational resources for tuning and adapt poorly to unknown or cross-task generation scenarios. We propose Context-Aware Threshold watermarking (CATMark), a novel framework that dynamically adjusts watermarking intensity based on real-time semantic context. CATMark partitions text generation into semantic states using logits clustering and establishes context-aware entropy thresholds that preserve fidelity in structured content while embedding robust watermarks. Crucially, it requires no pre-defined thresholds or task-specific tuning. Experiments show CATMark improves text quality in cross-task settings without sacrificing detection accuracy.
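The core idea of entropy-gated watermarking can be sketched as follows. This is a simplified, hypothetical illustration in the style of green-list (KGW-style) watermarking, not the paper's actual implementation: the watermark bias is applied only when the step's token-distribution entropy exceeds a threshold, so near-deterministic steps (e.g. rigid code syntax) are left untouched. The `threshold` and `delta` values are illustrative assumptions.

```python
import numpy as np

def entropy(logits):
    """Shannon entropy (nats) of the softmax distribution over the vocabulary."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float(-np.sum(p * np.log(p + 1e-12)))

def watermark_step(logits, green_mask, threshold, delta=2.0):
    """Bias green-list tokens only when this step's entropy exceeds the
    (context-dependent) threshold; low-entropy steps are left untouched
    to preserve fidelity (hypothetical sketch, not CATMark's exact rule)."""
    if entropy(logits) >= threshold:
        return logits + delta * green_mask
    return logits

# toy vocabulary of 8 tokens; the green list marks half of them
green = np.array([1.0, 0, 1, 0, 1, 0, 1, 0])

peaked = np.array([10.0, 0, 0, 0, 0, 0, 0, 0])  # near-deterministic step
flat = np.zeros(8)                               # uniform step, entropy ln 8 ≈ 2.08

print(watermark_step(peaked, green, threshold=1.0))  # unchanged
print(watermark_step(flat, green, threshold=1.0))    # green tokens boosted by delta
```

CATMark's contribution is precisely that `threshold` is not a fixed hyperparameter but is derived per semantic context, as described below.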


Key Contributions

  • First systematic investigation of watermarking in cross-task (mixed-modality) generation scenarios such as interleaved code and natural language
  • Dynamic thresholding mechanism that clusters tokens via KL divergence from learned prototypes and auto-computes per-context entropy thresholds without manual tuning
  • Theoretical lower bound on detection z-score under adaptive thresholding, with empirical results showing 82.3% pass@1 on HumanEval and 100% AUROC on StackEval
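The clustering step in the second contribution can be sketched as nearest-prototype assignment under KL divergence: each step's token distribution is matched to the closest semantic-state prototype, and each state carries its own entropy threshold. The prototypes and threshold values below are illustrative assumptions, not numbers from the paper.

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q) between two discrete distributions."""
    return float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))

def assign_state(p, prototypes):
    """Index of the semantic-state prototype closest to p in KL divergence."""
    return int(np.argmin([kl(p, q) for q in prototypes]))

# hypothetical prototypes: a peaked "code-like" state and a flat "prose-like" state
code_proto = np.array([0.85, 0.05, 0.05, 0.05])
prose_proto = np.array([0.25, 0.25, 0.25, 0.25])
prototypes = [code_proto, prose_proto]

# per-state entropy thresholds would be computed from each cluster's
# statistics rather than hand-set; these values are purely illustrative
thresholds = [1.2, 0.4]

p = np.array([0.9, 0.04, 0.03, 0.03])  # a code-like step
state = assign_state(p, prototypes)
print(state, thresholds[state])  # this step is gated by the code-state threshold
```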

🛡️ Threat Analysis

Output Integrity Attack

CATMark embeds imperceptible statistical watermarks into LLM text outputs to enable detection of AI-generated content and provenance tracking. The watermark lives in the generated text (the outputs), not in the model weights, making this a classic ML09 output integrity / content watermarking contribution.
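Detection on the verifier side is statistical. A minimal sketch, assuming the standard green-list z-test (the statistic the paper's lower bound applies to; the exact detector details are an assumption here): count how many tokens fall on the green list and test against the chance rate `gamma`.

```python
import math

def detection_z(green_hits, total, gamma=0.5):
    """z-score for the null hypothesis that green-list hits occur at the
    chance rate gamma (standard green-list watermark detection statistic)."""
    expected = gamma * total
    std = math.sqrt(total * gamma * (1 - gamma))
    return (green_hits - expected) / std

# unwatermarked text: roughly half the tokens land on the green list
print(detection_z(100, 200))  # near 0, no watermark evidence
# watermarked text: green hits well above chance
print(detection_z(160, 200))  # large positive z, watermark detected
```

Adaptive thresholding skips low-entropy steps, which reduces the number of biased tokens; the paper's theoretical lower bound on this z-score guarantees detectability is retained despite that.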


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Datasets
HumanEval, StackEval
Applications
llm text generation, code generation, ai-generated text detection, content provenance