StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data
Yixu Wang, Yan Teng, Yingchun Wang, Xingjun Ma
Published on arXiv: 2509.23594
Model Theft
OWASP ML Top 10 — ML05
Key Finding
StolenLoRA achieves up to 96.60% attack success rate against LoRA-adapted vision models using only 10k queries, even when attacker and victim use different pre-trained backbones.
StolenLoRA
Novel technique introduced
Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA have transformed vision model adaptation, enabling the rapid deployment of customized models. However, the compactness of LoRA adaptations introduces new security concerns, particularly vulnerability to model extraction attacks. This paper introduces a new class of model extraction attack, LoRA extraction, which targets LoRA-adapted models built on a public pre-trained backbone. We then propose a novel extraction method called StolenLoRA, which trains a substitute model on synthetic data to replicate the functionality of a LoRA-adapted victim. StolenLoRA leverages a Large Language Model to craft effective prompts for data generation, and it incorporates a Disagreement-based Semi-supervised Learning (DSL) strategy to maximize information gain from limited queries. Our experiments demonstrate the effectiveness of StolenLoRA, achieving up to a 96.60% attack success rate with only 10k queries, even in cross-backbone scenarios where the attacker and victim models utilize different pre-trained backbones. These findings reveal the specific vulnerability of LoRA-adapted models to this type of extraction and underscore the urgent need for robust defense mechanisms tailored to PEFT methods. We also explore a preliminary defense strategy based on diversified LoRA deployments, highlighting its potential to mitigate such attacks.
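The "compactness" the abstract refers to comes from LoRA's low-rank parameterization: a frozen weight matrix is adapted by adding the product of two small matrices. A minimal NumPy sketch (assuming the standard LoRA formulation with rank `r` and scaling `alpha`, not the paper's specific configuration) shows how few parameters the adapter actually adds:

```python
import numpy as np

# Standard LoRA parameterization (illustrative, not the paper's exact setup):
# a frozen weight W is adapted as W_eff = W + (alpha / r) * B @ A,
# where B (d x r) and A (r x k) are low-rank trainable matrices.
rng = np.random.default_rng(0)
d, k, r, alpha = 768, 768, 8, 16

W = rng.standard_normal((d, k))          # frozen pre-trained weight
A = rng.standard_normal((r, k)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection (zero-init)

W_eff = W + (alpha / r) * B @ A          # equals W before any training

# The adapter is tiny relative to the base weight -- this compactness is
# part of what makes LoRA-adapted models an attractive extraction target.
adapter_params = A.size + B.size
base_params = W.size
print(adapter_params / base_params)      # ~0.0208, about 2% of the base layer
```

Because the attacker can obtain the same public backbone, the only "secret" in the victim is this small adapter, which narrows the functional gap an extraction attack has to close.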
Key Contributions
- Introduces LoRA extraction as a distinct attack surface, framing model extraction attacks specifically against PEFT-adapted vision models that share a public pre-trained backbone with the attacker.
- Proposes StolenLoRA, which uses an LLM to generate effective synthetic query prompts and a Disagreement-based Semi-supervised Learning (DSL) strategy to maximize information gain from limited queries.
- Demonstrates up to a 96.60% attack success rate with only 10k queries, including in cross-backbone scenarios, and evaluates a preliminary defense based on diversified LoRA deployments.
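The DSL idea in the second contribution can be illustrated with a toy query-selection loop. The sketch below is a hypothetical simplification (the model names and selection rule are assumptions, not the paper's exact DSL algorithm): two substitute models score a pool of synthetic candidates, and the limited query budget is spent where they disagree, since those labels are the most informative.

```python
import numpy as np

# Hypothetical disagreement-based query selection (illustrative only).
rng = np.random.default_rng(1)

def predict(weights, x):
    """Toy linear classifier: argmax over class scores."""
    return np.argmax(x @ weights, axis=1)

n_pool, dim, n_classes, budget = 1000, 32, 10, 100
pool = rng.standard_normal((n_pool, dim))      # synthetic candidate inputs
sub_a = rng.standard_normal((dim, n_classes))  # substitute model A
sub_b = rng.standard_normal((dim, n_classes))  # substitute model B

# Samples where the two substitutes disagree are the most valuable to label.
disagree = predict(sub_a, pool) != predict(sub_b, pool)
candidates = np.flatnonzero(disagree)

# Spend the limited victim-query budget on disagreement samples first.
to_query = candidates[:budget]
print(len(to_query) <= budget)  # True
```

In the actual attack the unqueried pool is still used via semi-supervised training, which is how StolenLoRA extracts more signal per query than naive random sampling.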
🛡️ Threat Analysis
StolenLoRA is a model extraction attack: the adversary queries a LoRA-adapted victim model as a black-box oracle, trains a substitute model on synthetic data, and clones the victim's learned functionality — directly targeting model intellectual property theft. The LoRA-specific attack surface (compactness, reliance on a known public pre-trained backbone) is a novel angle on model extraction, not a backdoor or supply-chain issue.
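The black-box cloning loop described above can be sketched end to end. This is a toy linear stand-in under loud assumptions: `victim_oracle` is a hypothetical label-only API, and the substitute is a softmax classifier trained by plain SGD, whereas the real attack fine-tunes a LoRA adapter on a shared pre-trained backbone with LLM-generated synthetic images.

```python
import numpy as np

# Toy black-box extraction loop (illustrative stand-in for the real attack).
rng = np.random.default_rng(2)
dim, n_classes, lr = 16, 4, 0.1
victim_w = rng.standard_normal((dim, n_classes))  # hidden from the attacker
sub_w = np.zeros((dim, n_classes))                # attacker's substitute

def victim_oracle(x):
    """Black-box victim: returns hard labels only."""
    return np.argmax(x @ victim_w, axis=1)

for _ in range(200):
    x = rng.standard_normal((64, dim))            # synthetic query batch
    y = victim_oracle(x)                          # labels stolen per query
    logits = x @ sub_w
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1.0                # softmax cross-entropy grad
    sub_w -= lr * (x.T @ p) / len(y)              # SGD step on the substitute

# Measure functional agreement with the victim on fresh inputs.
test_x = rng.standard_normal((500, dim))
agreement = np.mean(victim_oracle(test_x) == np.argmax(test_x @ sub_w, axis=1))
print(agreement)
```

Even this crude loop recovers most of the victim's decision behavior from labels alone, which is the core intellectual-property risk the paper quantifies for LoRA-adapted models.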