Defense · 2025

Model Correlation Detection via Random Selection Probing

Ruibo Chen 1, Sheng Zhang 1, Yihan Wu 1, Tong Zheng 1, Peihua Mai 2, Heng Huang 1

1 citation · 34 references · arXiv

Published on arXiv: 2509.24171

Model Theft (OWASP ML Top 10 — ML05)

Model Theft (OWASP LLM Top 10 — LLM10)

Key Finding

RSP consistently yields small p-values for fine-tuned or identical model pairs while maintaining high p-values for unrelated models across both LLMs and VLMs under black-box and grey-box conditions.

Random Selection Probing (RSP)

Novel technique introduced


Abstract

The growing prevalence of large language models (LLMs) and vision-language models (VLMs) has heightened the need for reliable techniques to determine whether a model has been fine-tuned from or is even identical to another. Existing similarity-based methods often require access to model parameters or produce heuristic scores without principled thresholds, limiting their applicability. We introduce Random Selection Probing (RSP), a hypothesis-testing framework that formulates model correlation detection as a statistical test. RSP optimizes textual or visual prefixes on a reference model for a random selection task and evaluates their transferability to a target model, producing rigorous p-values that quantify evidence of correlation. To mitigate false positives, RSP incorporates an unrelated baseline model to filter out generic, transferable features. We evaluate RSP across both LLMs and VLMs under diverse access conditions for reference models and test models. Experiments on fine-tuned and open-source models show that RSP consistently yields small p-values for related models while maintaining high p-values for unrelated ones. Extensive ablation studies further demonstrate the robustness of RSP. These results establish RSP as the first principled and general statistical framework for model correlation detection, enabling transparent and interpretable decisions in modern machine learning ecosystems.
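The core statistical idea can be sketched with a simple one-sided binomial test: under the null hypothesis that the target model is unrelated to the reference, a prefix optimized on the reference should transfer no better than chance on the random selection task. The sketch below is a simplified stand-in for the paper's actual test statistic; the function name, trial counts, and chance-rate model are assumptions, not the authors' implementation.

```python
from math import comb

def rsp_p_value(successes: int, trials: int, num_choices: int) -> float:
    """One-sided exact binomial test (illustrative sketch of RSP's idea).

    Under the null (target unrelated to reference), the target answers the
    random selection task at the chance rate 1/num_choices, so we compute
    P[Binomial(trials, 1/num_choices) >= successes]. A small p-value is
    evidence that the reference-optimized prefix transferred, i.e. that
    the models are correlated.
    """
    p0 = 1.0 / num_choices  # chance success rate under the null
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical numbers: a prefix transfers on 18 of 20 four-way probes
# for a fine-tuned derivative, but only 6 of 20 for an unrelated model.
p_related = rsp_p_value(18, 20, 4)    # far below any usual threshold
p_unrelated = rsp_p_value(6, 20, 4)   # near chance, no evidence
```

This framing is what gives RSP a principled decision threshold: rejecting the null at a chosen significance level replaces the ad hoc cutoffs used by similarity-score methods.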


Key Contributions

  • First principled hypothesis-testing framework for model correlation detection, outputting statistically rigorous p-values instead of heuristic similarity scores
  • Random selection probing task with optimization methods for textual and visual prefixes under gradient-accessible, logits-accessible, and black-box access conditions
  • Baseline-model filtering mechanism to suppress false positives caused by generic, transferable prefixes
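The baseline-filtering contribution can be illustrated as a simple decision rule: a correlation verdict is issued only when the prefix transfers to the target but not to an unrelated baseline model. This is a hedged sketch; the function name, verdict labels, and the 0.01 threshold are assumptions for illustration, not the paper's exact procedure.

```python
def correlation_verdict(p_target: float, p_baseline: float,
                        alpha: float = 0.01) -> str:
    """Baseline-filtering rule (illustrative; thresholds are assumptions).

    Flag correlation only when the reference-optimized prefix transfers to
    the target (small p-value) but not to an unrelated baseline model
    (large p-value). A prefix that also steers the baseline is likely a
    generic, transferable feature rather than model-specific evidence.
    """
    if p_target < alpha and p_baseline >= alpha:
        return "correlated"    # transfer is specific to the target
    if p_target < alpha and p_baseline < alpha:
        return "inconclusive"  # prefix transfers generically; filter it out
    return "unrelated"

correlation_verdict(1e-6, 0.4)   # -> "correlated"
correlation_verdict(1e-6, 1e-5)  # -> "inconclusive" (generic prefix)
correlation_verdict(0.3, 0.5)    # -> "unrelated"
```

The design point is that the baseline check trades a little recall for a large reduction in false positives, which matters when verdicts carry IP-dispute consequences.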

🛡️ Threat Analysis

Model Theft

RSP is a model fingerprinting defense that detects whether a target model is a fine-tuned derivative or clone of a reference model — directly enabling intellectual property protection against model theft without requiring access to model weights.


Details

Domains
nlp · vision · multimodal
Model Types
llm · vlm · transformer
Threat Tags
black_box · grey_box · inference_time
Applications
model ip protection · model lineage detection · fine-tuned model attribution