ForgetMark: Stealthy Fingerprint Embedding via Targeted Unlearning in Language Models
Zhenhua Xu 1, Haobo Zhang 2,3, Zhebo Wang 1,2, Qichen Liu 2, Haitao Xu 1, Wenpeng Xing 1,2, Meng Han 1,2
Published on arXiv: 2601.08189
Model Theft
OWASP ML Top 10 — ML05
Key Finding
Achieves 100% ownership verification on fingerprinted LLMs, surpasses backdoor-based fingerprinting baselines in stealthiness and robustness to model merging, and remains effective under moderate incremental fine-tuning
ForgetMark
Novel technique introduced
Existing invasive (backdoor) fingerprints suffer from high-perplexity triggers that are easily filtered, fixed response patterns exposed by heuristic detectors, and spurious activations on benign inputs. We introduce ForgetMark, a stealthy fingerprinting framework that encodes provenance via targeted unlearning. It builds a compact, human-readable key-value set with an assistant model and predictive-entropy ranking, then trains lightweight LoRA adapters to suppress the original values on their keys while preserving general capabilities. Ownership is verified under black/gray-box access by aggregating likelihood and semantic evidence into a fingerprint success rate. By relying on probabilistic forgetting traces rather than fixed trigger-response patterns, ForgetMark avoids high-perplexity triggers, reduces detectability, and lowers false triggers. Across diverse architectures and settings, it achieves 100% ownership verification on fingerprinted models while maintaining standard performance, surpasses backdoor baselines in stealthiness and robustness to model merging, and remains effective under moderate incremental fine-tuning. Our code and data are available at https://github.com/Xuzhenhua55/ForgetMark.
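The predictive-entropy ranking mentioned in the abstract can be illustrated with a small sketch. It assumes each candidate key comes with the per-token probability distributions the model assigns while generating its value, scores a candidate by mean Shannon entropy, and keeps the lowest-entropy (most determinate) candidates. The function names, the toy keys, and the distributions are hypothetical; the abstract does not specify the exact scoring rule, so this is one plausible reading rather than the authors' implementation.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_predictive_entropy(distributions):
    """Average per-token entropy over all tokens of a candidate value."""
    return sum(token_entropy(d) for d in distributions) / len(distributions)

def select_keys(candidates, k):
    """Return the k keys whose values are predicted with the lowest entropy.

    Assumption: low entropy is used as a proxy for "high determinacy".
    """
    ranked = sorted(candidates, key=lambda c: mean_predictive_entropy(c["dists"]))
    return [c["key"] for c in ranked[:k]]

# Hypothetical candidates with made-up per-token distributions.
candidates = [
    {"key": "capital of France",
     "dists": [[0.97, 0.02, 0.01], [0.99, 0.005, 0.005]]},
    {"key": "a good name for a cat",
     "dists": [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]]},
    {"key": "boiling point of water",
     "dists": [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]]},
]

selected = select_keys(candidates, k=2)
print(selected)
```

In this toy data the open-ended prompt has near-uniform distributions and is ranked last, which matches the intuition that a fingerprint key should have a value the unmodified model produces reliably, so that its later suppression is a detectable change.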
Key Contributions
- Novel fingerprinting framework using targeted unlearning (via LoRA adapters) to suppress specific key-value behaviors, encoding model provenance as probabilistic forgetting traces rather than fixed trigger-response patterns
- Entropy-guided key-value construction using an assistant model and predictive-entropy ranking to select high-determinacy, low-perplexity fingerprint pairs that reduce false triggers
- Ownership verification protocol aggregating likelihood and semantic evidence under black/gray-box access, achieving 100% verification rate with improved stealthiness and robustness to model merging
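The verification step in the last contribution can be sketched as follows. The summary states only that likelihood and semantic evidence are aggregated into a fingerprint success rate; the conjunction rule, both thresholds, the `KeyEvidence` type, and all numbers below are assumptions made for illustration. A key counts as "forgotten" here when the suspect model assigns low likelihood to the original value and its actual output is semantically far from that value.

```python
from dataclasses import dataclass

@dataclass
class KeyEvidence:
    # Mean per-token log-likelihood of the ORIGINAL value under the suspect model.
    avg_logprob: float
    # Semantic similarity in [0, 1] between the suspect model's output and the original value.
    similarity: float

def is_forgotten(ev, ll_threshold=-2.0, sim_threshold=0.5):
    """Hypothetical decision rule: both signals must indicate suppression."""
    return ev.avg_logprob < ll_threshold and ev.similarity < sim_threshold

def fingerprint_success_rate(evidence, ll_threshold=-2.0, sim_threshold=0.5):
    """Fraction of fingerprint keys on which the original value appears forgotten."""
    if not evidence:
        raise ValueError("evidence must contain at least one key")
    hits = sum(is_forgotten(ev, ll_threshold, sim_threshold) for ev in evidence)
    return hits / len(evidence)

# Made-up measurements for a fingerprinted model and an unrelated model.
fingerprinted = [KeyEvidence(-5.1, 0.12), KeyEvidence(-4.3, 0.20), KeyEvidence(-6.0, 0.05)]
unrelated = [KeyEvidence(-0.3, 0.91), KeyEvidence(-0.8, 0.85), KeyEvidence(-2.5, 0.70)]

fsr_fingerprinted = fingerprint_success_rate(fingerprinted)
fsr_unrelated = fingerprint_success_rate(unrelated)
print(fsr_fingerprinted, fsr_unrelated)
```

Requiring both signals is a deliberately conservative choice in this sketch: the third unrelated key has a low likelihood but a semantically close output, so it is not counted, which illustrates how combining the two kinds of evidence could reduce false ownership claims. Under black-box access the likelihood term may be unavailable, in which case only the semantic signal could be used.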
🛡️ Threat Analysis
ForgetMark embeds fingerprints in the model itself (via LoRA adapters that suppress specific key-value behaviors) to prove model ownership under black/gray-box access. It is a model IP protection technique that defends against model theft and unauthorized redistribution.