Defense · 2026

Provable Model Provenance Set for Large Language Models

Xiaoqi Qiu, Hao Zeng, Zhiyu Hou, Hongxin Wei

0 citations · 33 references · arXiv


Published on arXiv · 2602.00772

Model Theft

OWASP ML Top 10 — ML05

Key Finding

MPS reliably achieves target provenance coverage while strictly limiting inclusion of unrelated models, outperforming heuristic fingerprint-matching baselines on binary provenance verification

Model Provenance Set (MPS)

Novel technique introduced


The growing prevalence of unauthorized model usage and misattribution has increased the need for reliable model provenance analysis. However, existing methods largely rely on heuristic fingerprint-matching rules that lack provable error control and often overlook the existence of multiple sources, leaving the reliability of their provenance claims unverified. In this work, we first formalize the model provenance problem with provable guarantees, requiring rigorous coverage of all true provenances at a prescribed confidence level. Then, we propose the Model Provenance Set (MPS), which employs a sequential test-and-exclusion procedure to adaptively construct a small set satisfying the guarantee. The key idea of MPS is to test the significance of provenance existence within a candidate pool, thereby establishing a provable asymptotic guarantee at a user-specified confidence level. Extensive experiments demonstrate that MPS effectively achieves target provenance coverage while strictly limiting the inclusion of unrelated models, and further reveal its potential for practical provenance analysis in attribution and auditing tasks.
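To make the sequential test-and-exclusion idea concrete, here is a minimal sketch of how such a procedure could look. Everything in it is an illustrative assumption rather than the paper's implementation: the candidate names, the similarity scores, the calibration set of known source-derivative pairs, and the empirical p-value are all hypothetical stand-ins for whatever test statistic MPS actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration: similarity scores for known (source, derivative)
# pairs, used to judge how plausible "c is a true source" is for a candidate.
related_scores = np.sort(rng.normal(4.0, 1.0, size=1000))

# Toy similarity of a suspect model to each candidate base model.
candidate_scores = {"base-A": 4.2, "base-B": 3.9, "base-C": 0.3, "base-D": -0.5}

def p_value(score):
    """One-sided empirical p-value for H0: 'the candidate is a true source'.
    A low similarity score is unusual for a genuine source-derivative pair."""
    return np.searchsorted(related_scores, score, side="right") / len(related_scores)

def provenance_set(scores, alpha=0.05):
    """Test candidates from least to most plausible, excluding each one whose
    provenance hypothesis is rejected at level alpha; every true source then
    remains in the returned set with probability >= 1 - alpha (asymptotically,
    in the MPS framing)."""
    kept = dict(scores)
    for name in sorted(scores, key=lambda n: p_value(scores[n])):
        if p_value(scores[name]) < alpha:
            kept.pop(name)   # confidently unrelated: exclude and keep testing
        else:
            break            # all remaining candidates are more plausible
    return set(kept)

print(provenance_set(candidate_scores))
```

In this toy run the clearly unrelated candidates (`base-C`, `base-D`) are excluded, while both plausible sources are retained, mirroring the coverage-versus-compactness trade-off the abstract describes.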


Key Contributions

  • Formalizes model provenance as a statistical testing problem requiring provable coverage of all true source models at a user-specified confidence level
  • Proposes Model Provenance Set (MPS): a sequential test-and-exclusion procedure that constructs a compact candidate set with asymptotic provenance coverage guarantees
  • Validates practical utility for LLM attribution, unauthorized derivation screening, and non-infringement quantification across 455 LLMs spanning up to three generations of fine-tuning lineage

🛡️ Threat Analysis

Model Theft

Proposes model fingerprinting and provenance analysis specifically to detect unauthorized LLM derivation — a defense against model theft where fine-tuned derivatives are misattributed as independently developed. The method identifies which base models a suspected model was derived from, with provable coverage guarantees, directly serving IP protection and auditing goals.


Details

Domains
nlp
Model Types
llm · transformer
Threat Tags
black_box · white_box · inference_time
Datasets
HuggingFace LLMs (135M–3B parameters, 455 models, up to 3-generation lineage)
Applications
llm attribution · unauthorized model derivation detection · model ip protection · model auditing