Model Correlation Detection via Random Selection Probing
Ruibo Chen 1, Sheng Zhang 1, Yihan Wu 1, Tong Zheng 1, Peihua Mai 2, Heng Huang 1
Published on arXiv (arXiv:2509.24171)
Model Theft (OWASP ML Top 10 — ML05)
Model Theft (OWASP LLM Top 10 — LLM10)
Key Finding
RSP consistently yields small p-values for fine-tuned or identical model pairs while maintaining high p-values for unrelated models across both LLMs and VLMs under black-box and grey-box conditions.
Random Selection Probing (RSP)
Novel technique introduced
The growing prevalence of large language models (LLMs) and vision-language models (VLMs) has heightened the need for reliable techniques to determine whether a model has been fine-tuned from or is even identical to another. Existing similarity-based methods often require access to model parameters or produce heuristic scores without principled thresholds, limiting their applicability. We introduce Random Selection Probing (RSP), a hypothesis-testing framework that formulates model correlation detection as a statistical test. RSP optimizes textual or visual prefixes on a reference model for a random selection task and evaluates their transferability to a target model, producing rigorous p-values that quantify evidence of correlation. To mitigate false positives, RSP incorporates an unrelated baseline model to filter out generic, transferable features. We evaluate RSP across both LLMs and VLMs under diverse access conditions for reference models and test models. Experiments on fine-tuned and open-source models show that RSP consistently yields small p-values for related models while maintaining high p-values for unrelated ones. Extensive ablation studies further demonstrate the robustness of RSP. These results establish RSP as the first principled and general statistical framework for model correlation detection, enabling transparent and interpretable decisions in modern machine learning ecosystems.
Key Contributions
- First principled hypothesis-testing framework for model correlation detection, outputting statistically rigorous p-values instead of heuristic similarity scores
- Random selection probing task with optimization methods for textual and visual prefixes under gradient-accessible, logits-accessible, and black-box access conditions
- Baseline-model filtering mechanism to suppress false positives caused by generic, transferable prefixes
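The statistical core of RSP — turning prefix transferability into a p-value — can be sketched with a one-sided binomial test: under the null hypothesis that the target model is unrelated to the reference, it should answer the random selection task at chance level, so the p-value is the probability of seeing at least the observed number of correct selections by chance. This is a minimal illustration with hypothetical accuracies and a uniform-chance null; the paper's actual test statistic and task setup may differ.

```python
from math import comb

def binomial_p_value(successes: int, trials: int, chance: float) -> float:
    """One-sided p-value: probability of observing >= `successes` correct
    answers under the null hypothesis that the target model solves the
    random selection task at chance level (i.e., is unrelated)."""
    return sum(comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
               for k in range(successes, trials + 1))

# Hypothetical setup: the optimized prefix asks the model to select 1 of
# 10 candidate answers, so an unrelated model succeeds with probability 0.1.
chance = 0.1
trials = 100

# A related (fine-tuned or identical) target inherits the prefix's behavior
# and answers well above chance; an unrelated target stays near chance.
p_related = binomial_p_value(62, trials, chance)
p_unrelated = binomial_p_value(12, trials, chance)

print(f"related target:   p = {p_related:.2e}")   # very small -> reject null
print(f"unrelated target: p = {p_unrelated:.3f}") # large -> cannot reject
```

In the full framework, this test is combined with the baseline-model filter: a correlation is declared only when the prefix transfers to the target (small p-value) but not to the unrelated baseline (large p-value), suppressing false positives from generically transferable prefixes.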
🛡️ Threat Analysis
RSP is a model fingerprinting defense that detects whether a target model is a fine-tuned derivative or clone of a reference model — directly enabling intellectual property protection against model theft without requiring access to model weights.