Attesting Model Lineage by Consistent Knowledge Evolution with Fine-Tuning Trajectory
Zhuoyi Shang 1,2,3, Jiasen Li 1,2,3, Pengzhen Chen 1,2,3, Yanwei Liu 1,3, Xiaoyan Gu 1,2,3, Weiping Wang 1
Published on arXiv
2601.11683
Model Theft
OWASP ML Top 10 — ML05
Key Finding
Achieves reliable lineage verification across classifiers, diffusion models, and LLMs under a variety of adversarial scenarios in open-weight model settings
Knowledge Evolution Lineage Attestation
Novel technique introduced
Fine-tuning in deep learning gives rise to an emerging lineage relationship among models. This lineage provides a promising perspective for addressing security concerns such as unauthorized model redistribution and false claims of model provenance, which are particularly pressing in open-weight model libraries where robust lineage verification mechanisms are often lacking. Existing approaches to model lineage detection rely primarily on static architectural similarities, which are insufficient to capture the dynamic evolution of knowledge that underlies true lineage relationships. Drawing inspiration from the genetic mechanisms of human evolution, we tackle the problem of model lineage attestation by verifying the joint trajectory of knowledge evolution and parameter modification. To this end, we propose a novel model lineage attestation framework. In our framework, model editing is first leveraged to quantify parameter-level changes introduced by fine-tuning. We then introduce a novel knowledge vectorization mechanism that refines the evolved knowledge within the edited models into compact representations with the assistance of probe samples, adapting the probing strategy to each type of model family. These embeddings serve as the foundation for verifying the arithmetic consistency of knowledge relationships across models, thereby enabling robust attestation of model lineage. Extensive experimental evaluations demonstrate the effectiveness and resilience of our approach under a variety of real-world adversarial scenarios. Our method consistently achieves reliable lineage verification across a broad spectrum of model types, including classifiers, diffusion models, and large language models.
Key Contributions
- Model editing-based quantification of parameter-level changes introduced by fine-tuning to capture knowledge evolution trajectories
- Knowledge vectorization mechanism that distills evolved model knowledge into compact embeddings via probe samples, adapted to different model families
- Arithmetic consistency verification across model lineages enabling robust lineage attestation against adversarial spoofing across classifiers, diffusion models, and LLMs
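The paper's exact probing and verification procedure is not reproduced here, but the core idea of the contributions above can be illustrated with a minimal sketch: distill each model's behavior on probe samples into a "knowledge vector," then attest lineage only if the candidate's vector is markedly closer to the claimed parent's than to unrelated reference models. All function names, the toy linear "models," and the margin-based decision rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def knowledge_vector(model_fn, probes):
    """Hypothetical probing: concatenate the model's outputs on probe samples
    into a single compact representation of its knowledge."""
    return np.concatenate([np.asarray(model_fn(p)).ravel() for p in probes])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def attest_lineage(parent_vec, candidate_vec, reference_vecs, margin=0.1):
    """Illustrative decision rule: attest lineage if the candidate's knowledge
    vector is more similar to the claimed parent than to any unrelated
    reference model, by at least `margin`."""
    s_parent = cosine(parent_vec, candidate_vec)
    s_refs = max(cosine(r, candidate_vec) for r in reference_vecs)
    return s_parent - s_refs >= margin

# Toy demo: a "fine-tuned child" is the parent plus a small parameter delta,
# while an unrelated model is drawn independently.
probes = [rng.normal(size=8) for _ in range(4)]
W_parent = rng.normal(size=(8, 8))
W_child = W_parent + 0.01 * rng.normal(size=(8, 8))  # small fine-tuning shift
W_other = rng.normal(size=(8, 8))                    # unrelated model

v_parent = knowledge_vector(lambda x: W_parent @ x, probes)
v_child = knowledge_vector(lambda x: W_child @ x, probes)
v_other = knowledge_vector(lambda x: W_other @ x, probes)

print(attest_lineage(v_parent, v_child, [v_other]))
print(attest_lineage(v_parent, v_other, [v_child]))
```

In this sketch the fine-tuned child inherits the parent's knowledge trajectory, so its vector stays aligned with the parent's, while the unrelated model fails the margin test; the actual framework additionally verifies arithmetic relationships among the parameter edits themselves.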
🛡️ Threat Analysis
The framework defends model intellectual property by proving lineage ownership — detecting whether a released model is an unauthorized derivative of a proprietary base model. This is model fingerprinting/provenance verification to counter unauthorized model redistribution, a core ML05 defense concern.