Defense · 2026

Provable Model Provenance Set for Large Language Models

Xiaoqi Qiu, Hao Zeng, Zhiyu Hou, Hongxin Wei

0 citations · 33 references · arXiv


Published on arXiv · 2602.00772

Model Theft

OWASP ML Top 10 — ML05

Key Finding

MPS reliably achieves target provenance coverage while strictly limiting inclusion of unrelated models, outperforming heuristic fingerprint-matching baselines on binary provenance verification

Model Provenance Set (MPS)

Novel technique introduced


The growing prevalence of unauthorized model usage and misattribution has increased the need for reliable model provenance analysis. However, existing methods largely rely on heuristic fingerprint-matching rules that lack provable error control and often overlook the existence of multiple sources, leaving the reliability of their provenance claims unverified. In this work, we first formalize the model provenance problem with provable guarantees, requiring rigorous coverage of all true provenances at a prescribed confidence level. Then, we propose the Model Provenance Set (MPS), which employs a sequential test-and-exclusion procedure to adaptively construct a small set satisfying the guarantee. The key idea of MPS is to test the significance of provenance existence within a candidate pool, thereby establishing a provable asymptotic guarantee at a user-specified confidence level. Extensive experiments demonstrate that MPS effectively achieves target provenance coverage while strictly limiting the inclusion of unrelated models, and further reveal its potential for practical provenance analysis in attribution and auditing tasks.
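To make the sequential test-and-exclusion idea concrete, here is a minimal sketch of how such a procedure could look. Everything in it is an illustrative assumption rather than the paper's implementation: the candidate names, the similarity scores, the calibration set of known source-derivative pairs, and the empirical p-value are all hypothetical stand-ins for whatever test statistic MPS actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration: similarity scores for known (source, derivative)
# pairs, used to judge how plausible "c is a true source" is for a candidate.
related_scores = np.sort(rng.normal(4.0, 1.0, size=1000))

# Toy similarity of a suspect model to each candidate base model.
candidate_scores = {"base-A": 4.2, "base-B": 3.9, "base-C": 0.3, "base-D": -0.5}

def p_value(score):
    """One-sided empirical p-value for H0: 'the candidate is a true source'.
    A low similarity score is unusual for a genuine source-derivative pair."""
    return np.searchsorted(related_scores, score, side="right") / len(related_scores)

def provenance_set(scores, alpha=0.05):
    """Test candidates from least to most plausible, excluding each one whose
    provenance hypothesis is rejected at level alpha; every true source then
    remains in the returned set with probability >= 1 - alpha (asymptotically,
    in the MPS framing)."""
    kept = dict(scores)
    for name in sorted(scores, key=lambda n: p_value(scores[n])):
        if p_value(scores[name]) < alpha:
            kept.pop(name)   # confidently unrelated: exclude and keep testing
        else:
            break            # all remaining candidates are more plausible
    return set(kept)

print(provenance_set(candidate_scores))
```

In this toy run the clearly unrelated candidates (`base-C`, `base-D`) are excluded, while both plausible sources are retained, mirroring the coverage-versus-compactness trade-off the abstract describes.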


Key Contributions

  • Formalizes model provenance as a statistical testing problem requiring provable coverage of all true source models at a user-specified confidence level
  • Proposes Model Provenance Set (MPS): a sequential test-and-exclusion procedure that constructs a compact candidate set with asymptotic provenance coverage guarantees
  • Validates practical utility for LLM attribution, unauthorized derivation screening, and non-infringement quantification across 455 LLMs spanning up to three generations of fine-tuning lineage

🛡️ Threat Analysis

Model Theft

Proposes model fingerprinting and provenance analysis specifically to detect unauthorized LLM derivation — a defense against model theft where fine-tuned derivatives are misattributed as independently developed. The method identifies which base models a suspected model was derived from, with provable coverage guarantees, directly serving IP protection and auditing goals.


Details

Domains
nlp
Model Types
llm · transformer
Threat Tags
black_box · white_box · inference_time
Datasets
HuggingFace LLMs (135M–3B parameters, 455 models, up to 3-generation lineage)
Applications
llm attribution · unauthorized model derivation detection · model ip protection · model auditing