defense 2026

ForgetMark: Stealthy Fingerprint Embedding via Targeted Unlearning in Language Models

Zhenhua Xu 1, Haobo Zhang 2,3, Zhebo Wang 1,2, Qichen Liu 2, Haitao Xu 1, Wenpeng Xing 1,2, Meng Han 1,2

2 citations · 52 references · arXiv


Published on arXiv

2601.08189

Model Theft

OWASP ML Top 10 — ML05

Key Finding

Achieves 100% ownership verification on fingerprinted LLMs while surpassing backdoor-based fingerprinting baselines in stealthiness, robustness to model merging, and resistance to incremental fine-tuning

ForgetMark

Novel technique introduced


Existing invasive (backdoor) fingerprints suffer from high-perplexity triggers that are easily filtered, fixed response patterns exposed by heuristic detectors, and spurious activations on benign inputs. We introduce ForgetMark, a stealthy fingerprinting framework that encodes provenance via targeted unlearning. It builds a compact, human-readable key-value set with an assistant model and predictive-entropy ranking, then trains lightweight LoRA adapters to suppress the original values on their keys while preserving general capabilities. Ownership is verified under black/gray-box access by aggregating likelihood and semantic evidence into a fingerprint success rate. By relying on probabilistic forgetting traces rather than fixed trigger-response patterns, ForgetMark avoids high-perplexity triggers, reduces detectability, and lowers false triggers. Across diverse architectures and settings, it achieves 100% ownership verification on fingerprinted models while maintaining standard performance, surpasses backdoor baselines in stealthiness and robustness to model merging, and remains effective under moderate incremental fine-tuning. Our code and data are available at https://github.com/Xuzhenhua55/ForgetMark.
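
The predictive-entropy ranking mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy probability distributions, and the choice to average token-level Shannon entropy are all assumptions made for illustration. The idea shown is only that keys whose values the model predicts with low entropy (high determinacy) are ranked first.

```python
import math
from typing import Dict, List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_predictive_entropy(dists: Sequence[Sequence[float]]) -> float:
    """Average entropy over the token positions of one value (assumed scoring rule)."""
    return sum(token_entropy(d) for d in dists) / len(dists)


def select_keys(candidates: Dict[str, List[List[float]]], k: int) -> List[str]:
    """Return the k keys with the lowest mean predictive entropy."""
    ranked = sorted(candidates, key=lambda key: mean_predictive_entropy(candidates[key]))
    return ranked[:k]


# Hypothetical candidates: each key maps to toy per-token distributions
# that a model might assign while generating the value.
candidates = {
    "What is the capital of France?": [[0.97, 0.02, 0.01], [0.99, 0.005, 0.005]],
    "Name a color.": [[0.4, 0.3, 0.3], [0.5, 0.25, 0.25]],
    "What is 2 + 2?": [[0.9, 0.05, 0.05], [0.95, 0.03, 0.02]],
}
selected = select_keys(candidates, k=2)
```

In this toy data the open-ended prompt "Name a color." has the flattest distributions and is therefore dropped, which mirrors the stated goal of preferring high-determinacy pairs.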


Key Contributions

  • Novel fingerprinting framework using targeted unlearning (via LoRA adapters) to suppress specific key-value behaviors, encoding model provenance as probabilistic forgetting traces rather than fixed trigger-response patterns
  • Entropy-guided key-value construction using an assistant model and predictive-entropy ranking to select high-determinacy, low-perplexity fingerprint pairs that reduce false triggers
  • Ownership verification protocol aggregating likelihood and semantic evidence under black/gray-box access, achieving 100% verification rate with improved stealthiness and robustness to model merging
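
A rough sketch of how likelihood and semantic evidence might be aggregated into a fingerprint success rate follows. Everything specific here is an assumption: the `Probe` structure, the thresholds, the token-overlap similarity used as a stand-in for a semantic score, and the rule that both signals must agree when log-probabilities are available (gray-box) while only the semantic signal is used otherwise (black-box). The source does not specify these details.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Probe:
    original_value: str                    # value the model gave before unlearning
    suspect_output: str                    # what the suspect model generates for the key
    value_logprob: Optional[float] = None  # mean token log-prob of the original value;
                                           # None when only black-box access is available


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a crude stand-in for a semantic similarity score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_forgotten(p: Probe, logprob_max: float = -2.0, sim_max: float = 0.3) -> bool:
    """Assumed rule: the key shows a forgetting trace if all available evidence agrees."""
    semantic = jaccard(p.original_value, p.suspect_output) < sim_max
    if p.value_logprob is None:  # black-box: semantic evidence only
        return semantic
    likelihood = p.value_logprob < logprob_max
    return semantic and likelihood


def fingerprint_success_rate(probes: List[Probe]) -> float:
    """Fraction of keys on which the suspect model shows a forgetting trace."""
    if not probes:
        raise ValueError("at least one probe is required")
    return sum(is_forgotten(p) for p in probes) / len(probes)


# Hypothetical probes with made-up outputs and log-probabilities.
probes = [
    Probe("Paris is the capital of France", "I am not sure about that", -4.1),
    Probe("The answer is four", "The answer is four", -0.2),
    Probe("Water boils at 100 degrees Celsius", "It depends on many factors"),
    Probe("The sky is blue", "Hard to say", -0.5),
]
fsr = fingerprint_success_rate(probes)
```

Requiring agreement between the two signals is the conservative choice in this sketch: the fourth probe diverges in wording but still assigns high likelihood to the original value, so it is not counted.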

🛡️ Threat Analysis

Model Theft

ForgetMark embeds fingerprints in the model itself (via LoRA adapters that suppress specific key-value behaviors) to prove model ownership under black/gray-box access. This makes it a model IP protection technique that defends against model theft and unauthorized redistribution.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, grey_box, training_time
Applications
llm copyright protection, model provenance auditing, ownership verification