KinGuard: Hierarchical Kinship-Aware Fingerprinting to Defend Against Large Language Model Stealing
Zhenhua Xu 1, Xiaoning Tian 2,3, Wenjun Zeng 2,4, Wenpeng Xing 1,2, Tianliang Lu 5, Gaolei Li 6, Chaochao Chen 1, Meng Han 1,2
Published on arXiv
2601.12986
Model Theft
OWASP ML Top 10 — ML05
Model Theft
OWASP LLM Top 10 — LLM10
Key Finding
KinGuard outperforms existing backdoor-based fingerprinting approaches in effectiveness, stealth, and robustness to fine-tuning, input perturbation, and model merging.
KinGuard
Novel technique introduced
Protecting the intellectual property of large language models requires robust ownership verification. Conventional backdoor fingerprinting, however, is flawed by a stealth-robustness paradox: to be robust, these methods force models to memorize fixed responses to high-perplexity triggers, but this targeted overfitting creates detectable statistical artifacts. We resolve this paradox with KinGuard, a framework that embeds a private knowledge corpus built on structured kinship narratives. Instead of memorizing superficial triggers, the model internalizes this knowledge via incremental pre-training, and ownership is verified by probing its conceptual understanding. Extensive experiments demonstrate KinGuard's superior effectiveness, stealth, and resilience against a battery of attacks including fine-tuning, input perturbation, and model merging. Our work establishes knowledge-based embedding as a practical and secure paradigm for model fingerprinting.
Key Contributions
- Identifies and resolves the stealth-robustness paradox in backdoor-based fingerprinting by replacing fixed trigger memorization with naturalistic knowledge internalization.
- Constructs a private kinship-narrative corpus and embeds it into LLM weights via incremental pre-training, enabling ownership verification through conceptual understanding probes.
- Demonstrates resilience against fine-tuning, input perturbation, and model merging attacks while outperforming prior fingerprinting methods in stealth and effectiveness.
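The corpus-construction idea in the second contribution can be illustrated with a toy example. The following is a minimal sketch, not the authors' pipeline: the family members, the sentence templates, and the helper names (`Person`, `build_family`, `narratives`, `probes`) are all hypothetical, and the summary above does not specify how the real corpus is generated. The sketch only shows how a private kinship graph can yield both narrative text for incremental pre-training and multi-hop questions whose answers are never stated verbatim in that text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """One node in a hypothetical, privately invented family tree."""
    name: str
    parents: tuple[str, ...] = ()


def build_family() -> dict[str, Person]:
    # Invented names; a real corpus would be kept secret by the model owner.
    people = [
        Person("Orla Venn"),
        Person("Tobin Venn"),
        Person("Mira Venn", parents=("Orla Venn", "Tobin Venn")),
        Person("Cass Venn", parents=("Orla Venn", "Tobin Venn")),
        Person("Juno Hale", parents=("Mira Venn",)),
    ]
    return {p.name: p for p in people}


def siblings(family: dict[str, Person], name: str) -> list[str]:
    """People who share at least one parent with `name`."""
    mine = set(family[name].parents)
    return sorted(
        p.name for p in family.values()
        if p.name != name and mine & set(p.parents)
    )


def grandparents(family: dict[str, Person], name: str) -> list[str]:
    """Two-hop relation: parents of the parents of `name`."""
    return sorted(
        {gp for parent in family[name].parents for gp in family[parent].parents}
    )


def narratives(family: dict[str, Person]) -> list[str]:
    """Natural-language sentences that state only direct (one-hop) relations."""
    lines = []
    for person in family.values():
        if person.parents:
            lines.append(
                f"{person.name} is the child of {' and '.join(person.parents)}."
            )
        for sib in siblings(family, person.name):
            lines.append(f"{person.name} and {sib} grew up as siblings.")
    return lines


def probes(family: dict[str, Person]) -> list[tuple[str, str]]:
    """(question, expected answer) pairs that require combining two facts."""
    out = []
    for person in family.values():
        gps = grandparents(family, person.name)
        if gps:
            out.append(
                (f"Who are the grandparents of {person.name}?", ", ".join(gps))
            )
    return out
```

In this sketch the narratives mention only parent and sibling links, so a correct answer to a grandparent probe depends on the model combining two stored facts rather than recalling a fixed trigger-response pair.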
🛡️ Threat Analysis
KinGuard embeds a private knowledge corpus directly into model parameters via incremental pre-training and verifies ownership by probing conceptual understanding. This is model fingerprinting used to prove ownership of a stolen LLM, the canonical ML05 defense use case.
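Verification by probing can be sketched as a comparison of probe accuracy between a suspect model and an unrelated baseline. This is an illustrative assumption, not the paper's protocol: the substring-matching rule, the `margin` value, and the function names (`probe_accuracy`, `claims_ownership`) are hypothetical, and the model is represented as any callable that maps a prompt string to a response string.

```python
from typing import Callable, Iterable, Tuple

Probe = Tuple[str, str]  # (question, expected answer)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores formatting."""
    return " ".join(text.lower().split())


def probe_accuracy(generate: Callable[[str], str], probes: Iterable[Probe]) -> float:
    """Fraction of probes whose expected answer appears in the model's reply."""
    probes = list(probes)
    if not probes:
        raise ValueError("at least one probe is required")
    hits = sum(
        normalize(answer) in normalize(generate(question))
        for question, answer in probes
    )
    return hits / len(probes)


def claims_ownership(suspect_acc: float, baseline_acc: float, margin: float = 0.5) -> bool:
    """Assumed decision rule: the suspect must beat the baseline by `margin`."""
    return suspect_acc - baseline_acc >= margin


# Stub models standing in for real LLM calls (hypothetical data).
PRIVATE_PROBES = [
    ("Who is the mother of Juno Hale?", "Mira Venn"),
    ("Who is the sibling of Mira Venn?", "Cass Venn"),
    ("Who are the grandparents of Juno Hale?", "Orla Venn, Tobin Venn"),
    ("Who is the father of Cass Venn?", "Tobin Venn"),
]

_SUSPECT_MEMORY = {
    "Who is the mother of Juno Hale?": "Her mother is Mira Venn.",
    "Who is the sibling of Mira Venn?": "That would be Cass  Venn.",
    "Who are the grandparents of Juno Hale?": "Orla Venn, Tobin Venn",
}


def suspect_model(prompt: str) -> str:
    # Answers three of four probes, as a fine-tuned derivative might.
    return _SUSPECT_MEMORY.get(prompt, "I am not sure.")


def baseline_model(prompt: str) -> str:
    # An unrelated model has never seen the private corpus.
    return "I don't have information about that person."
```

Comparing against a baseline rather than using raw accuracy alone is a design choice of this sketch: it guards against probes that an unrelated model could answer by guessing.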