Optimizing Token Choice for Code Watermarking: An RL Approach
Zhimeng Guo, Huaisheng Zhu, Siyuan Xu, Hangfan Zhang, Teng Xiao, Minhao Cheng
Published on arXiv
2508.11925
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
CodeTracer significantly outperforms state-of-the-art baselines in both watermark detectability and preservation of generated code functionality across comparative evaluations
CodeTracer
Novel technique introduced
Protecting intellectual property on LLM-generated code necessitates effective watermarking systems that can operate within code's highly structured, syntactically constrained nature. In this work, we introduce CodeTracer, an innovative adaptive code watermarking framework underpinned by a novel reinforcement learning training paradigm. At its core, CodeTracer features a policy-driven approach that utilizes a parameterized model to intelligently bias token choices during next-token prediction. This strategy ensures that embedded watermarks maintain code functionality while exhibiting subtle yet statistically detectable deviations from typical token distributions. To facilitate policy learning, we devise a comprehensive reward system that seamlessly integrates execution feedback with watermark embedding signals, balancing process-level and outcome-level rewards. Additionally, we employ Gumbel Top-k reparameterization to enable gradient-based optimization of discrete watermarking decisions. Extensive comparative evaluations demonstrate CodeTracer's significant superiority over state-of-the-art baselines in both watermark detectability and the preservation of generated code's functionality.
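One way to picture the balance between process-level and outcome-level rewards described above is a simple weighted blend. The sketch below is an assumption for illustration only: the function name `combine_rewards`, its parameters, and the linear blend are hypothetical, and the summary does not say how CodeTracer actually combines the two signals.

```python
def combine_rewards(step_signals, tests_passed, tests_total, process_weight=0.5):
    """Blend per-token watermark signals with a final execution score.

    Hypothetical sketch: the linear blend and all names are assumptions,
    not the reward defined in the paper.
    """
    if not 0.0 <= process_weight <= 1.0:
        raise ValueError("process_weight must lie in [0, 1]")
    # Process-level term: average watermark signal over the generated tokens.
    process = sum(step_signals) / len(step_signals) if step_signals else 0.0
    # Outcome-level term: fraction of unit tests the generated code passes.
    outcome = tests_passed / tests_total if tests_total else 0.0
    return process_weight * process + (1.0 - process_weight) * outcome
```

For example, `combine_rewards([1.0, 0.0], 3, 4)` averages a watermark signal of 0.5 with a pass rate of 0.75. Raising `process_weight` favors detectability, while lowering it favors functional correctness.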
Key Contributions
- CodeTracer: an RL-based adaptive watermarking framework that uses a parameterized policy to intelligently bias token choices during LLM code generation
- Comprehensive reward system combining execution feedback with watermark embedding signals, balancing process-level and outcome-level rewards
- Gumbel Top-k reparameterization to enable gradient-based optimization of otherwise discrete watermarking decisions
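The sampling half of the Gumbel Top-k trick named in the last contribution can be sketched as follows. Adding independent Gumbel(0, 1) noise to each logit and keeping the k largest perturbed scores draws k distinct indices without replacement from the softmax distribution. This is a minimal standard-library illustration: the name `gumbel_top_k` and its parameters are hypothetical, and the differentiable relaxation that gradient-based training would need is not shown, since the summary does not describe it.

```python
import math
import random


def gumbel_top_k(logits, k, rng):
    """Sample k distinct indices without replacement from softmax(logits).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if not 0 < k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")
    perturbed = []
    for index, logit in enumerate(logits):
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((logit + gumbel_noise, index))
    # The k largest perturbed scores give the sampled indices, best first.
    perturbed.sort(reverse=True)
    return [index for _, index in perturbed[:k]]
```

A call such as `gumbel_top_k([0.0, 50.0, 0.0, 0.0], 2, random.Random(0))` returns two distinct indices, with the dominant logit at index 1 ranked first.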
🛡️ Threat Analysis
CodeTracer watermarks LLM-generated code at the output token level to trace provenance and protect intellectual property. Because the watermark is embedded in generated content (outputs) rather than in model weights, this is output integrity/content watermarking (ML09), not model ownership protection (ML05).
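To make "statistically detectable" concrete, the sketch below shows a generic keyed green-list detector of the kind used in earlier token-level watermarking work. It is a stand-in, not CodeTracer's learned policy: the names `is_green` and `z_score`, the key `"demo-key"`, and the green fraction `gamma` are all assumptions. The detector counts how many tokens fall in a keyed "green" set and compares that count with what unwatermarked text would produce by chance.

```python
import hashlib
import math


def is_green(prev_token, token, key="demo-key", gamma=0.5):
    """Keyed pseudo-random test: is `token` green given the previous token?"""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < gamma


def z_score(tokens, key="demo-key", gamma=0.5):
    """One-proportion z-statistic for the green-token count in `tokens`."""
    n = len(tokens) - 1  # number of (previous, current) token pairs scored
    if n <= 0:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, t, key, gamma) for p, t in zip(tokens, tokens[1:]))
    # Under the no-watermark hypothesis the green count is Binomial(n, gamma).
    return (green - gamma * n) / math.sqrt(n * gamma * (1.0 - gamma))
```

A large positive `z_score` suggests the text was generated with a bias toward green tokens. Because detection needs only the tokens and the key, the watermark can be checked without access to the model weights.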