A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives
Kaixiang Zhao¹, Lincan Li², Kaize Ding³, Neil Zhenqiang Gong⁴, Yue Zhao⁵, Yushun Dong²
Published on arXiv: 2508.15031
Model Theft
OWASP ML Top 10 — ML05
Key Finding
Identifies the utility-security trade-off as the central unsolved challenge in defending against model extraction, and proposes a novel taxonomy spanning attack mechanisms, defenses, and computing paradigms.
Machine learning (ML) models have grown significantly in complexity and utility, driving advances across multiple domains. However, the need for substantial computational resources and specialized expertise has historically restricted their wide adoption. Machine-Learning-as-a-Service (MLaaS) platforms have addressed these barriers by providing scalable, convenient, and affordable access to sophisticated ML models through user-friendly APIs. While this accessibility promotes widespread use of advanced ML capabilities, it also introduces vulnerabilities exploited through Model Extraction Attacks (MEAs). Recent studies have demonstrated that adversaries can systematically replicate a target model's functionality by interacting with publicly exposed interfaces, posing threats to intellectual property, privacy, and system security. In this paper, we offer a comprehensive survey of MEAs and corresponding defense strategies. We propose a novel taxonomy that classifies MEAs according to attack mechanisms, defense approaches, and computing environments. Our analysis covers various attack techniques, evaluates their effectiveness, and highlights challenges faced by existing defenses, particularly the critical trade-off between preserving model utility and ensuring security. We further assess MEAs within different computing paradigms and discuss their technical, ethical, legal, and societal implications, along with promising directions for future research. This systematic survey aims to serve as a valuable reference for researchers, practitioners, and policymakers engaged in AI security and privacy. Additionally, we maintain an online repository continuously updated with related literature at https://github.com/kzhao5/ModelExtractionPapers.
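The core extraction loop the abstract describes (query an exposed prediction API, then train a surrogate on the returned labels) can be illustrated with a minimal sketch. Everything here is hypothetical for illustration: `victim_api` stands in for an MLaaS endpoint, the victim is a hidden linear classifier, and the surrogate is a plain perceptron; real attacks surveyed in the paper use far richer query strategies and model families.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "victim" model behind an API: a secret linear classifier.
W_SECRET = np.array([2.0, -1.0])

def victim_api(x):
    """Black-box prediction endpoint: returns only the label, as MLaaS APIs do."""
    return int(x @ W_SECRET > 0)

# Attack step 1: query the API on attacker-chosen inputs.
queries = rng.normal(size=(500, 2))
labels = np.array([victim_api(x) for x in queries])

# Attack step 2: fit a surrogate on the (input, label) pairs.
# Here: a perceptron trained on the stolen labels.
w = np.zeros(2)
for _ in range(20):
    for x, y in zip(queries, labels):
        pred = int(x @ w > 0)
        w += (y - pred) * x  # perceptron update

# Fidelity: how often the surrogate agrees with the victim on fresh inputs.
test_points = rng.normal(size=(1000, 2))
agreement = np.mean([int(x @ w > 0) == victim_api(x) for x in test_points])
print(f"surrogate-victim agreement: {agreement:.2f}")
```

The point of the sketch is that the attacker never sees `W_SECRET`, yet the surrogate reaches high agreement with the victim purely from query-label pairs — the functionality-theft scenario the survey's taxonomy organizes.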
Key Contributions
- Comprehensive taxonomy classifying MEAs by attack mechanisms, defense approaches, and computing environments (including MLaaS, federated, and edge paradigms)
- Systematic analysis of attack-defense trade-offs, particularly the tension between model utility preservation and security
- Discussion of technical, ethical, legal, and societal implications of MEAs, plus a continuously updated online literature repository
🛡️ Threat Analysis
The paper's entire focus is on Model Extraction Attacks — adversaries querying ML APIs to clone model functionality — which is the canonical model theft threat. Defenses surveyed include watermarking for ownership verification and anti-extraction techniques, all squarely within ML05.
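One of the surveyed defense families, trigger-set watermarking for ownership verification, can be sketched in a few lines. This is a simplified illustration, not a method from the paper: `verify_ownership`, the trigger set, and both toy models are hypothetical, and real watermark verification must also account for false-claim rates and watermark removal attacks.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical trigger set: inputs paired with owner-chosen labels,
# embedded during training so that (ideally) only the owner's model
# and models extracted from it memorize them.
trigger_inputs = rng.normal(size=(20, 4))
trigger_labels = rng.integers(0, 2, size=20)

def verify_ownership(model_fn, threshold=0.9):
    """Claim ownership if a suspect model reproduces the secret trigger labels."""
    preds = np.array([model_fn(x) for x in trigger_inputs])
    return bool(np.mean(preds == trigger_labels) >= threshold)

# A model that memorized the trigger set (stand-in for the watermarked original).
memorized = {tuple(x): int(y) for x, y in zip(trigger_inputs, trigger_labels)}
watermarked = lambda x: memorized[tuple(x)]

# An independent model with no knowledge of the triggers.
independent = lambda x: int(x.sum() > 0)

print(verify_ownership(watermarked))   # True: trigger labels reproduced
print(verify_ownership(independent))
```

An unrelated model matches the 20 arbitrary trigger labels by chance only with vanishing probability, which is what makes the trigger set usable as ownership evidence — while also illustrating the utility-security trade-off the paper highlights, since embedding triggers can degrade the model's accuracy on normal inputs.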