Large Language Models Are Effective Code Watermarkers
Rui Xu 1, Jiawei Chen 1, Zhaoxia Yin 1, Cong Kong 1, Xinpeng Zhang 2
Published on arXiv (arXiv:2510.11251)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
CodeMark-LLM achieves nearly 100% syntactic correctness and unit test pass rate across C, C++, Java, and JavaScript while maintaining robustness against common code obfuscation and reformatting attacks.
CodeMark-LLM
Novel technique introduced
The widespread use of large language models (LLMs) and open-source code has raised ethical and security concerns regarding the distribution and attribution of source code, including unauthorized redistribution, license violations, and misuse of code for malicious purposes. Watermarking has emerged as a promising solution for source attribution, but existing techniques rely heavily on hand-crafted transformation rules, abstract syntax tree (AST) manipulation, or task-specific training, limiting their scalability and generality across languages. Moreover, their robustness against attacks remains limited. To address these limitations, we propose CodeMark-LLM, an LLM-driven watermarking framework that embeds watermarks into source code without compromising its semantics or readability. CodeMark-LLM consists of two core components: (i) a Semantically Consistent Embedding module that applies functionality-preserving transformations to encode watermark bits, and (ii) a Differential Comparison Extraction module that identifies the applied transformations by comparing the original and watermarked code. Leveraging the cross-lingual generalization ability of LLMs, CodeMark-LLM avoids language-specific engineering and training pipelines. Extensive experiments across diverse programming languages and attack scenarios demonstrate its robustness, effectiveness, and scalability.
Key Contributions
- CodeMark-LLM: a training-free, parser-independent, language-agnostic LLM-driven code watermarking framework that embeds provenance signals via semantics-preserving code transformations
- Semantically Consistent Embedding module using prompt-driven LLMs to automatically generate functionality-preserving transformations without handcrafted rules or AST parsing
- Differential Comparison Extraction module that decodes watermarks by multi-granularity comparison between original and watermarked code, robust to obfuscation and reformatting attacks
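The two modules can be illustrated with a toy sketch: embedding encodes each watermark bit as a choice between two semantically equivalent code variants, and extraction recovers the bits by comparing the original against the watermarked copy. The transformation pairs and function names below are hypothetical stand-ins for illustration only; the actual system generates functionality-preserving rewrites via prompt-driven LLMs rather than a fixed rule table.

```python
# Toy sketch of the two-module idea (hypothetical rule table; the real
# system asks an LLM for functionality-preserving rewrites, no fixed rules).

# Each pair holds two semantically equivalent variants of a code fragment.
TRANSFORM_PAIRS = [
    ("i = i + 1", "i += 1"),      # increment style
    ("x = x * 2", "x = x << 1"),  # multiply vs. left shift
]

def embed(code: str, bits: str) -> str:
    """Semantically Consistent Embedding (toy version): bit 1 applies the
    rewrite, bit 0 leaves the original variant in place."""
    for bit, (v0, v1) in zip(bits, TRANSFORM_PAIRS):
        if bit == "1":
            code = code.replace(v0, v1, 1)
    return code

def extract(original: str, watermarked: str) -> str:
    """Differential Comparison Extraction (toy version): recover each bit
    by checking which variant survives in the watermarked copy."""
    return "".join(
        "1" if v0 in original and v1 in watermarked else "0"
        for v0, v1 in TRANSFORM_PAIRS
    )

src = "i = i + 1\nx = x * 2\n"
wm = embed(src, "10")      # rewrites only the increment statement
bits = extract(src, wm)    # recovers "10" from the differences
```

A real differential extractor compares at multiple granularities (statement, expression, token), which is what lends robustness to reformatting: whitespace changes do not alter which semantic variant is present.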
🛡️ Threat Analysis
CodeMark-LLM watermarks source code CONTENT (the artifact) to enable provenance tracking and attribution — this is content watermarking analogous to LLM text watermarking. The paper evaluates robustness against watermark removal/obfuscation attacks, which are output integrity attacks on content provenance schemes.