DuCodeMark: Dual-Purpose Code Dataset Watermarking via Style-Aware Watermark-Poison Design
Yuchen Chen , Yuan Xiao , Chunrong Fang , Zhenyu Chen , Baowen Xu
Published on arXiv
2604.10611
Output Integrity Attack
OWASP ML Top 10 — ML09
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
Achieves strong verifiability (p < 0.05), high stealthiness (suspicious rate ≤ 0.36), robustness against watermark-removal attacks (recall ≤ 0.57), and a 28.6% Pass@1 drop upon watermark removal, across 72 experimental settings
DuCodeMark
Novel technique introduced
The proliferation of large language models for code (CodeLMs) and open-source contributions has heightened concerns over unauthorized use of source code datasets. While watermarking provides a viable protection mechanism by embedding ownership signals, existing methods rely on detectable trigger-target patterns and are limited to source-code tasks, overlooking other scenarios such as decompilation. In this paper, we propose DuCodeMark, a stealthy and robust dual-purpose watermarking method for code datasets that generalizes across both source-code and decompilation tasks. DuCodeMark parses each code sample into an abstract syntax tree (AST), applies language-specific style transformations to construct stealthy trigger-target pairs, and injects repressible poisoned features into a subset of return-typed samples to enhance robustness against watermark removal or evasion. These features remain inactive during normal training but are activated upon watermark removal, degrading model performance. For verification, DuCodeMark employs a black-box method based on the independent-samples $t$-test. We conduct a comprehensive evaluation of DuCodeMark across 72 settings spanning two code tasks, two programming languages, three CodeLMs, and six decoding temperatures. The results demonstrate that it consistently achieves strong verifiability ($p < 0.05$), high stealthiness (suspicious rate $\leq$ 0.36), robustness against both watermark-removal and poisoning attacks (recall $\leq$ 0.57), and a substantial drop in model performance upon watermark removal (Pass@1 drops by 28.6%), underscoring its practicality and resilience.
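The core idea of an AST-level, semantics-preserving style transformation can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes Python's standard `ast` module and uses one illustrative rewrite rule (expanding augmented assignment `x += e` into `x = x + e`) to show how a style choice can act as a stealthy trigger without changing program behavior:

```python
import ast

class AugAssignExpander(ast.NodeTransformer):
    """Illustrative style transformation (not the paper's actual rule set):
    rewrite `x += e` as `x = x + e`. The rewrite preserves semantics,
    so watermarked samples remain natural-looking, compilable code."""

    def visit_AugAssign(self, node):
        # Assumption for brevity: only simple name targets (e.g. `x += 1`),
        # not attributes or subscripts.
        if not isinstance(node.target, ast.Name):
            return node
        new_node = ast.Assign(
            targets=[ast.Name(id=node.target.id, ctx=ast.Store())],
            value=ast.BinOp(
                left=ast.Name(id=node.target.id, ctx=ast.Load()),
                op=node.op,
                right=node.value,
            ),
        )
        return ast.copy_location(new_node, node)

src = "def inc(x):\n    x += 1\n    return x\n"
tree = ast.fix_missing_locations(AugAssignExpander().visit(ast.parse(src)))
transformed = ast.unparse(tree)  # requires Python 3.9+
print(transformed)
```

A model trained on data carrying such consistent style choices can later be probed for them, which is the basis of the trigger-target pairing the abstract describes.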
Key Contributions
- Dual-purpose watermarking method for code datasets using AST-based style transformations to create stealthy trigger-target pairs
- Repressible poisoned features that remain inactive during normal training but degrade model performance upon watermark removal attempts
- Black-box verification method using independent-samples t-test, generalizing across source-code and decompilation tasks
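The black-box verification step can be sketched as follows. This is a hypothetical illustration, not the paper's code: it assumes the verifier scores model outputs on trigger-bearing versus clean inputs (the metric and score values below are made up) and applies Welch's independent-samples t-test, implemented in pure Python, to decide whether the two score populations differ significantly:

```python
import math
from statistics import mean, variance

def welch_t_test(a, b):
    """Two-sided Welch's independent-samples t-test (unequal variances).
    Returns the t statistic and the Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a), variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical per-sample scores (e.g., target-pattern match rates) from
# querying a suspect model on trigger-bearing vs. clean inputs.
triggered = [0.90, 0.85, 0.92, 0.88, 0.95]
clean = [0.40, 0.35, 0.45, 0.38, 0.42]

t, df = welch_t_test(triggered, clean)
print(f"t = {t:.2f}, df = {df:.1f}")
# A large |t| (compared against the critical value for df at alpha = 0.05,
# e.g. via scipy.stats.t.sf in practice) rejects the null hypothesis that
# the model behaves identically on triggered and clean inputs.
```

In practice a library routine such as `scipy.stats.ttest_ind(triggered, clean, equal_var=False)` would return the p-value directly; the point here is that verification needs only query access to the suspect model, not its weights.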
🛡️ Threat Analysis
The primary contribution is watermarking TRAINING DATA (code datasets) to prove ownership and trace provenance. The watermark is embedded in the dataset via trigger-target pairs, not in model weights. This is data provenance/integrity protection, fitting ML09's scope of output/content integrity and authenticity verification.
The method embeds backdoor triggers (style-transformed code samples) in the training data that cause models to produce specific outputs when triggered. Additionally, it injects "repressible poisoned features" that activate upon watermark removal to degrade model performance. These are classic backdoor/trojan mechanisms: hidden behaviors activated by specific triggers.