defense 2026

KinGuard: Hierarchical Kinship-Aware Fingerprinting to Defend Against Large Language Model Stealing

Zhenhua Xu 1, Xiaoning Tian 2,3, Wenjun Zeng 2,4, Wenpeng Xing 1,2, Tianliang Lu 5, Gaolei Li 6, Chaochao Chen 1, Meng Han 1,2

0 citations · 50 references · arXiv


Published on arXiv: 2601.12986

Model Theft

OWASP ML Top 10 — ML05

Model Theft

OWASP LLM Top 10 — LLM10

Key Finding

KinGuard achieves superior effectiveness, stealth, and robustness against fine-tuning, input perturbation, and model merging compared to existing backdoor-based fingerprinting approaches.
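Model merging is one of the attacks named above: the thief combines the fingerprinted weights with those of another model. The summary does not say which merging schemes were tested. The sketch below assumes plain per-parameter linear interpolation (the weight-averaging style used in "model soups"). `linear_merge` and the toy parameter dictionaries are hypothetical. A fingerprint survives this attack only if the owner's probes still succeed on the interpolated weights.

```python
def linear_merge(state_a, state_b, alpha=0.5):
    """Merge two models parameter by parameter: alpha * A + (1 - alpha) * B.

    state_a / state_b map parameter names to flat lists of floats
    (a toy stand-in for real tensors; names and layout are hypothetical).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if state_a.keys() != state_b.keys():
        raise ValueError("models must share the same parameter names")
    merged = {}
    for name, wa in state_a.items():
        wb = state_b[name]
        if len(wa) != len(wb):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Interpolate element-wise between the two models' weights.
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(wa, wb)]
    return merged


if __name__ == "__main__":
    fingerprinted = {"layer.weight": [1.0, 3.0]}
    other = {"layer.weight": [3.0, 1.0]}
    print(linear_merge(fingerprinted, other))  # {'layer.weight': [2.0, 2.0]}
```

Setting `alpha` closer to 0 dilutes the fingerprinted model further. This is why robustness to merging is usually reported across several mixing ratios.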

KinGuard

Novel technique introduced


Protecting the intellectual property of large language models requires robust ownership verification. Conventional backdoor fingerprinting, however, is flawed by a stealth-robustness paradox: to be robust, these methods force models to memorize fixed responses to high-perplexity triggers, but this targeted overfitting creates detectable statistical artifacts. We resolve this paradox with KinGuard, a framework that embeds a private knowledge corpus built on structured kinship narratives. Instead of memorizing superficial triggers, the model internalizes this knowledge via incremental pre-training, and ownership is verified by probing its conceptual understanding. Extensive experiments demonstrate KinGuard's superior effectiveness, stealth, and resilience against a battery of attacks including fine-tuning, input perturbation, and model merging. Our work establishes knowledge-based embedding as a practical and secure paradigm for model fingerprinting.
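The abstract argues that backdoor triggers are deliberately high-perplexity, which makes them easy to flag statistically. The sketch below illustrates that argument only. It scores inputs with an add-one-smoothed unigram model, a toy stand-in for the language model a real detector would use. `train_unigram`, `perplexity`, `looks_like_trigger`, the reference text, and the threshold are all hypothetical. Ordinary narrative text, such as the kinship stories KinGuard trains on, should score low, while a string of rare tokens stands out.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Return p(token) under an add-one-smoothed unigram model.

    All unseen tokens share one extra vocabulary slot, so the
    probabilities over seen tokens plus that slot sum to 1.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 slot for unseen tokens
    return lambda tok: (counts[tok] + 1) / (total + vocab_size)


def perplexity(prob, tokens):
    """exp of the mean negative log-probability per token."""
    if not tokens:
        raise ValueError("need at least one token")
    nll = -sum(math.log(prob(t)) for t in tokens) / len(tokens)
    return math.exp(nll)


def looks_like_trigger(prob, text, threshold):
    """Flag inputs whose perplexity exceeds a (hypothetical) threshold."""
    return perplexity(prob, text.split()) > threshold


if __name__ == "__main__":
    reference = "the cat sat on the mat the dog sat on the rug".split()
    p = train_unigram(reference)
    print(perplexity(p, "the cat sat on the rug".split()))
    print(perplexity(p, "zx qv ploo wrk".split()))
```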


Key Contributions

  • Identifies and resolves the stealth-robustness paradox in backdoor-based fingerprinting by replacing fixed trigger memorization with naturalistic knowledge internalization.
  • Constructs a private kinship-narrative corpus and embeds it into LLM weights via incremental pre-training, enabling ownership verification through conceptual understanding probes.
  • Demonstrates resilience against fine-tuning, input perturbation, and model merging attacks while outperforming prior fingerprinting methods in stealth and effectiveness.
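The summary does not publish KinGuard's corpus format, so the following is only a minimal sketch under that caveat. The code renders a hypothetical private family tree into narrative sentences. Separately, it derives two-hop facts (grandparent relations) that never appear verbatim in the training text. Such derived facts are one plausible way to probe conceptual understanding rather than recall of a memorized string. The names, templates, and helpers (`build_corpus`, `derive_grandparents`) are invented for illustration.

```python
import random

# Hypothetical private kinship graph as (parent, child) pairs.
# All names are fictional placeholders, not data from the paper.
PARENT_OF = [
    ("Aldric Venn", "Mira Venn"),
    ("Aldric Venn", "Tobin Venn"),
    ("Mira Venn", "Sela Quill"),
]

# Illustrative narrative templates; each mentions both people.
TEMPLATES = [
    "{p} raised {c} in the old house by the river.",
    "Everyone in the village knew that {c} was the child of {p}.",
    "{c} often spoke fondly of {p}, who taught them to read.",
]


def build_corpus(edges, seed=0):
    """Render each parent-child fact as one narrative sentence.

    A fixed seed makes the corpus reproducible, so the owner can
    regenerate exactly the text that was used for pre-training.
    """
    rng = random.Random(seed)
    return [rng.choice(TEMPLATES).format(p=p, c=c) for p, c in edges]


def derive_grandparents(edges):
    """Two-hop (grandparent, grandchild) facts implied by, but absent from, the corpus."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    return sorted(
        (grandparent, grandchild)
        for grandparent, kids in children.items()
        for kid in kids
        for grandchild in children.get(kid, [])
    )


if __name__ == "__main__":
    for sentence in build_corpus(PARENT_OF):
        print(sentence)
    print(derive_grandparents(PARENT_OF))  # [('Aldric Venn', 'Sela Quill')]
```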

🛡️ Threat Analysis

Model Theft

KinGuard embeds a private knowledge corpus directly into model parameters via incremental pre-training and verifies ownership by probing the model's conceptual understanding of that corpus. This is model fingerprinting used to prove ownership of a stolen LLM, the canonical ML05 defense use case.
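Verification is tagged as black-box, so the owner needs only query access to the suspect model. The following minimal sketch rests on three assumptions. It uses a hypothetical `query_model` callable that maps a prompt to a completion. It uses probe questions with expected answers drawn from the private corpus. It also uses an illustrative acceptance threshold of 0.8, which is not a value reported for KinGuard.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical probes derived from the owner's private kinship corpus.
PROBES: Sequence[Tuple[str, str]] = [
    ("Who is a grandchild of Aldric Venn?", "Sela Quill"),
    ("Which child of Aldric Venn later became a parent?", "Mira Venn"),
]


def verify_ownership(
    query_model: Callable[[str], str],
    probes: Sequence[Tuple[str, str]],
    threshold: float = 0.8,
) -> Tuple[float, bool]:
    """Return (probe accuracy, ownership claimed?) for a suspect model.

    A probe counts as a hit when the expected answer appears in the
    model's completion (case-insensitive substring match).
    """
    if not probes:
        raise ValueError("need at least one probe")
    hits = sum(
        expected.lower() in query_model(question).lower()
        for question, expected in probes
    )
    accuracy = hits / len(probes)
    return accuracy, accuracy >= threshold


if __name__ == "__main__":
    answers = dict(PROBES)
    suspect = lambda q: f"I believe it is {answers[q]}."
    unrelated = lambda q: "I have no idea."
    print(verify_ownership(suspect, PROBES))    # (1.0, True)
    print(verify_ownership(unrelated, PROBES))  # (0.0, False)
```

In practice the threshold would presumably be calibrated against unrelated reference models, so that a model that never saw the private corpus scores well below it.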


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, training_time
Applications
large language model IP protection, model ownership verification