Defense · 2025

Model Correlation Detection via Random Selection Probing

Ruibo Chen 1, Sheng Zhang 1, Yihan Wu 1, Tong Zheng 1, Peihua Mai 2, Heng Huang 1

1 citation · 34 references · arXiv

Published on arXiv: 2509.24171

Model Theft (OWASP ML Top 10 — ML05)

Model Theft (OWASP LLM Top 10 — LLM10)

Key Finding

RSP consistently yields small p-values for fine-tuned or identical model pairs while maintaining high p-values for unrelated models across both LLMs and VLMs under black-box and grey-box conditions.

Random Selection Probing (RSP)

Novel technique introduced


Abstract

The growing prevalence of large language models (LLMs) and vision-language models (VLMs) has heightened the need for reliable techniques to determine whether a model has been fine-tuned from or is even identical to another. Existing similarity-based methods often require access to model parameters or produce heuristic scores without principled thresholds, limiting their applicability. We introduce Random Selection Probing (RSP), a hypothesis-testing framework that formulates model correlation detection as a statistical test. RSP optimizes textual or visual prefixes on a reference model for a random selection task and evaluates their transferability to a target model, producing rigorous p-values that quantify evidence of correlation. To mitigate false positives, RSP incorporates an unrelated baseline model to filter out generic, transferable features. We evaluate RSP across both LLMs and VLMs under diverse access conditions for reference models and test models. Experiments on fine-tuned and open-source models show that RSP consistently yields small p-values for related models while maintaining high p-values for unrelated ones. Extensive ablation studies further demonstrate the robustness of RSP. These results establish RSP as the first principled and general statistical framework for model correlation detection, enabling transparent and interpretable decisions in modern machine learning ecosystems.
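The core statistical idea can be sketched with a simple one-sided binomial test: under the null hypothesis that the target model is unrelated to the reference, a prefix optimized on the reference should transfer no better than chance on the random selection task. The sketch below is a simplified stand-in for the paper's actual test statistic; the function name, trial counts, and chance-rate model are assumptions, not the authors' implementation.

```python
from math import comb

def rsp_p_value(successes: int, trials: int, num_choices: int) -> float:
    """One-sided exact binomial test (illustrative sketch of RSP's idea).

    Under the null (target unrelated to reference), the target answers the
    random selection task at the chance rate 1/num_choices, so we compute
    P[Binomial(trials, 1/num_choices) >= successes]. A small p-value is
    evidence that the reference-optimized prefix transferred, i.e. that
    the models are correlated.
    """
    p0 = 1.0 / num_choices  # chance success rate under the null
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical numbers: a prefix transfers on 18 of 20 four-way probes
# for a fine-tuned derivative, but only 6 of 20 for an unrelated model.
p_related = rsp_p_value(18, 20, 4)    # far below any usual threshold
p_unrelated = rsp_p_value(6, 20, 4)   # near chance, no evidence
```

This framing is what gives RSP a principled decision threshold: rejecting the null at a chosen significance level replaces the ad hoc cutoffs used by similarity-score methods.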


Key Contributions

  • First principled hypothesis-testing framework for model correlation detection, outputting statistically rigorous p-values instead of heuristic similarity scores
  • Random selection probing task with optimization methods for textual and visual prefixes under gradient-accessible, logits-accessible, and black-box access conditions
  • Baseline-model filtering mechanism to suppress false positives caused by generic, transferable prefixes
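The baseline-filtering contribution can be illustrated as a simple decision rule: a correlation verdict is issued only when the prefix transfers to the target but not to an unrelated baseline model. This is a hedged sketch; the function name, verdict labels, and the 0.01 threshold are assumptions for illustration, not the paper's exact procedure.

```python
def correlation_verdict(p_target: float, p_baseline: float,
                        alpha: float = 0.01) -> str:
    """Baseline-filtering rule (illustrative; thresholds are assumptions).

    Flag correlation only when the reference-optimized prefix transfers to
    the target (small p-value) but not to an unrelated baseline model
    (large p-value). A prefix that also steers the baseline is likely a
    generic, transferable feature rather than model-specific evidence.
    """
    if p_target < alpha and p_baseline >= alpha:
        return "correlated"    # transfer is specific to the target
    if p_target < alpha and p_baseline < alpha:
        return "inconclusive"  # prefix transfers generically; filter it out
    return "unrelated"

correlation_verdict(1e-6, 0.4)   # -> "correlated"
correlation_verdict(1e-6, 1e-5)  # -> "inconclusive" (generic prefix)
correlation_verdict(0.3, 0.5)    # -> "unrelated"
```

The design point is that the baseline check trades a little recall for a large reduction in false positives, which matters when verdicts carry IP-dispute consequences.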

🛡️ Threat Analysis

Model Theft

RSP is a model fingerprinting defense that detects whether a target model is a fine-tuned derivative or clone of a reference model — directly enabling intellectual property protection against model theft without requiring access to model weights.


Details

Domains
nlp · vision · multimodal
Model Types
llm · vlm · transformer
Threat Tags
black_box · grey_box · inference_time
Applications
model ip protection · model lineage detection · fine-tuned model attribution