Defense · 2025

CODE ACROSTIC: Robust Watermarking for Code Generation

Li Lin¹, Siyuan Xin², Yang Cao¹, Xiaochun Cao³

0 citations · 28 references · arXiv

Published on arXiv · 2512.14753

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Code Acrostic maintains high watermark detectability under comment removal attacks that reduce existing methods (EWD, SWEET) to unacceptably low true positive rates.

Code Acrostic

Novel technique introduced


Watermarking large language models (LLMs) is vital for preventing their misuse, including the fabrication of fake news, plagiarism, and spam. It is especially important to watermark LLM-generated code, as it often contains intellectual property. However, we found that existing methods for watermarking LLM-generated code fail to address the comment removal attack: an attacker can simply remove the comments from the generated code without affecting its functionality, significantly reducing the effectiveness of current code-watermarking techniques. On the other hand, injecting a watermark into code is challenging because, as previous works have noted, most code represents a low-entropy scenario compared to natural language. Our approach addresses this issue by leveraging prior knowledge to distinguish between the low-entropy and high-entropy parts of the code, as indicated by a Cue List of words. We then inject the watermark guided by this Cue List, achieving higher detectability and usability than existing methods. We evaluated the proposed method on HumanEval and compared it with three state-of-the-art code watermarking techniques. The results demonstrate the effectiveness of our approach.
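To make the threat concrete: comments can be stripped mechanically without changing program behavior, which is why watermark bits hidden in comments are fragile. A minimal sketch of such an attacker using Python's standard `tokenize` module (the tooling choice here is our illustration, not the paper's):

```python
import io
import tokenize


def strip_comments(source: str) -> str:
    """Drop all # comments from Python source.

    Spacing may shift slightly (untokenize's compatibility mode),
    but the program's behavior is preserved.
    """
    kept = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            continue  # discard the comment token entirely
        kept.append((tok.type, tok.string))
    return tokenize.untokenize(kept)


if __name__ == "__main__":
    src = "def add(a, b):\n    # sum the inputs\n    return a + b  # inline comment\n"
    print(strip_comments(src))
```

Any watermark whose detection statistic depends on comment tokens loses signal under this transformation, while the code still compiles and runs identically.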


Key Contributions

  • Identifies the comment removal attack as an overlooked and effective threat against existing LLM code watermarking methods (KGW, SWEET, EWD)
  • Proposes Code Acrostic, a Cue List-guided sparse watermarking technique that injects marks only after high-entropy tokens, bypassing low-entropy reserved keywords and comments
  • Experimentally demonstrates superior detectability and robustness on HumanEval compared to three state-of-the-art code watermarking baselines
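The Cue List mechanism in the second bullet can be sketched as a sparse variant of green-list logit biasing: the bias is applied only when the previous token belongs to the Cue List, so reserved keywords and comment text stay unmarked. The example Cue List, the hash-based green-list rule, and all function names below are illustrative assumptions, not the paper's exact construction:

```python
import hashlib

# Hypothetical Cue List: tokens after which the next token is high-entropy
# enough to carry a watermark bit (illustrative, not the paper's actual list).
CUE_LIST = {"=", "return", "(", ",", "+"}


def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-random green-list membership keyed on the previous token (gamma ~ 0.5)."""
    digest = hashlib.sha256(f"{prev_token}\x00{token}".encode()).digest()
    return digest[0] % 2 == 0


def watermark_logits(logits: dict[str, float], prev_token: str,
                     delta: float = 2.0) -> dict[str, float]:
    """Sparse injection: bias green tokens only after a Cue List token."""
    if prev_token not in CUE_LIST:
        return dict(logits)  # low-entropy position (keyword, comment text): untouched
    return {tok: score + (delta if is_green(prev_token, tok) else 0.0)
            for tok, score in logits.items()}
```

Because low-entropy positions are never biased, the watermark does not force unnatural choices where the model has essentially one correct continuation, which is the usability gain the contributions claim.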

🛡️ Threat Analysis

Output Integrity Attack

Embeds watermarks into LLM-generated code outputs to verify provenance and detect AI-generated code. The paper also characterizes a comment removal attack that defeats existing code watermarks; both the defense (watermarking LLM outputs) and the attack (removing content watermarks) fall squarely under ML09.
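On the verification side, a watermark of this shape can be detected with a standard z-test restricted to cue positions; since comment text never contributes to the statistic, deleting comments cannot erase the signal. As above, the Cue List and the hash-based green-list rule are illustrative assumptions rather than the paper's exact scheme:

```python
import hashlib
import math

# Hypothetical Cue List marking high-entropy positions (illustrative only).
CUE_LIST = {"=", "return", "(", ",", "+"}


def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-random green-list membership keyed on the previous token (gamma ~ 0.5)."""
    digest = hashlib.sha256(f"{prev_token}\x00{token}".encode()).digest()
    return digest[0] % 2 == 0


def watermark_z_score(tokens: list[str], gamma: float = 0.5) -> float:
    """z-test over cue positions only.

    Counts how many tokens following a Cue List token are 'green'; under the
    null hypothesis (unwatermarked code) the hit rate is gamma.
    """
    hits = n = 0
    for prev, cur in zip(tokens, tokens[1:]):
        if prev in CUE_LIST:
            n += 1
            hits += is_green(prev, cur)
    if n == 0:
        return 0.0
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

A z-score well above a chosen threshold (e.g. 4) indicates the watermark; code with no cue positions simply yields no evidence rather than a false positive.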


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time · black_box
Datasets
HumanEval
Applications
code generation · code plagiarism detection · ai-generated code attribution