defense 2026

KinGuard: Hierarchical Kinship-Aware Fingerprinting to Defend Against Large Language Model Stealing

0 citations · 50 references · arXiv

Published on arXiv

2601.12986

Model Theft

OWASP ML Top 10 — ML05

Model Theft

OWASP LLM Top 10 — LLM10

Key Finding

KinGuard achieves superior effectiveness, stealth, and robustness against fine-tuning, input perturbation, and model merging compared to existing backdoor-based fingerprinting approaches.

KinGuard

Novel technique introduced

Protecting the intellectual property of large language models requires robust ownership verification. Conventional backdoor fingerprinting, however, is flawed by a stealth-robustness paradox: to be robust, these methods force models to memorize fixed responses to high-perplexity triggers, but this targeted overfitting creates detectable statistical artifacts. We resolve this paradox with KinGuard, a framework that embeds a private knowledge corpus built on structured kinship narratives. Instead of memorizing superficial triggers, the model internalizes this knowledge via incremental pre-training, and ownership is verified by probing its conceptual understanding. Extensive experiments demonstrate KinGuard's superior effectiveness, stealth, and resilience against a battery of attacks including fine-tuning, input perturbation, and model merging. Our work establishes knowledge-based embedding as a practical and secure paradigm for model fingerprinting.

Key Contributions

Identifies and resolves the stealth-robustness paradox in backdoor-based fingerprinting by replacing fixed trigger memorization with naturalistic knowledge internalization.
Constructs a private kinship-narrative corpus and embeds it into LLM weights via incremental pre-training, enabling ownership verification through conceptual understanding probes.
Demonstrates resilience against fine-tuning, input perturbation, and model merging attacks while outperforming prior fingerprinting methods in stealth and effectiveness.
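To make the corpus-and-probe idea concrete, here is a minimal sketch of how a kinship-narrative corpus and its conceptual probes might be generated. It is an illustration under assumptions, not the authors' pipeline: the family tree, the names, the sentence template, and the helper functions `build_narratives`, `grandparents`, and `build_probes` are all hypothetical. The probes ask about grandparents, a relation that is never stated verbatim in the narratives, so answering requires composing two embedded facts rather than recalling a fixed trigger.

```python
# Hypothetical family tree: child -> (father, mother). All names are invented
# for illustration and do not come from the paper.
PARENTS = {
    "Mira Voss": ("Tomas Voss", "Elin Hart"),
    "Tomas Voss": ("Anders Voss", "Greta Lund"),
}

def build_narratives(parents):
    """Turn each parent-child fact into a short natural-language sentence."""
    lines = []
    for child, (father, mother) in parents.items():
        lines.append(f"{child} is the child of {father} and {mother}.")
    return lines

def grandparents(parents, person):
    """Derive second-order relations that are never written out in the corpus."""
    result = []
    for parent in parents.get(person, ()):
        result.extend(parents.get(parent, ()))
    return result

def build_probes(parents):
    """Create (question, accepted answers) pairs that need two facts combined."""
    probes = []
    for person in parents:
        answers = grandparents(parents, person)
        if answers:
            probes.append((f"Who are the grandparents of {person}?", answers))
    return probes

corpus = build_narratives(PARENTS)   # text for incremental pre-training
probes = build_probes(PARENTS)       # kept private for later verification
```

In this sketch the narratives would be mixed into the incremental pre-training data, while the probes stay with the model owner.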

🛡️ Threat Analysis

Model Theft

KinGuard embeds a private knowledge corpus directly into model parameters via incremental pre-training and verifies ownership by probing conceptual understanding — this is model fingerprinting to prove ownership of a stolen LLM, the canonical ML05 defense use case.
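The black-box verification step described above can be sketched as follows. This is a simplified illustration under assumptions, not the paper's verification protocol: `verify_ownership`, the substring-matching rule, the 0.8 threshold, and the stub models are all hypothetical. The only access to the suspect model is a text-in, text-out callable, which matches the black-box setting.

```python
def verify_ownership(query_model, probes, threshold=0.8):
    """Score a suspect model on private probes using black-box queries only.

    query_model: callable mapping a question string to a reply string.
    probes: list of (question, accepted_answers) pairs held by the owner.
    threshold: hypothetical decision cutoff, not a value from the paper.
    """
    hits = 0
    for question, accepted in probes:
        reply = query_model(question).lower()
        # Count a hit if any accepted answer appears in the reply.
        if any(answer.lower() in reply for answer in accepted):
            hits += 1
    accuracy = hits / len(probes) if probes else 0.0
    return accuracy, accuracy >= threshold

# Hypothetical probe with invented names, for illustration only.
PROBES = [("Who are the grandparents of Mira Voss?", ["Anders Voss", "Greta Lund"])]

def fingerprinted_stub(question):
    """Stand-in for a model that has internalized the private corpus."""
    return "Her grandparents are Anders Voss and Greta Lund."

def clean_stub(question):
    """Stand-in for an unrelated model with no knowledge of the corpus."""
    return "I do not have information about that person."

suspect_score, suspect_flagged = verify_ownership(fingerprinted_stub, PROBES)
clean_score, clean_flagged = verify_ownership(clean_stub, PROBES)
```

A real check would likely use many probes and a more careful answer-matching rule, since a single substring match is easy to trigger by chance.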

Details

Domains

nlp

Model Types

llm, transformer

Threat Tags

black_box, training_time

Applications

large language model ip protection, model ownership verification

Similar Papers

Unlocking the Effectiveness of LoRA-FP for Seamless Transfer Implantation of Fingerprints in Downstream Models

DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection

CTCC: A Robust and Stealthy Fingerprinting Framework for Large Language Models via Cross-Turn Contextual Correlation Backdoor

Antidistillation Fingerprinting

From Construction to Injection: Edit-Based Fingerprints for Large Language Models

PREE: Towards Harmless and Adaptive Fingerprint Editing in Large Language Models via Knowledge Prefix Enhancement

Information-Preserving Reformulation of Reasoning Traces for Antidistillation

FPEdit: Robust LLM Fingerprinting through Localized Parameter Editing