A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives

Kaixiang Zhao 1, Lincan Li 2, Kaize Ding 3, Neil Zhenqiang Gong 4, Yue Zhao 5, Yushun Dong 2

Published on arXiv (2508.15031)

Model Theft

OWASP ML Top 10 — ML05

Key Finding

Identifies the utility-security trade-off as the central unsolved challenge in defending against model extraction, and proposes a novel taxonomy spanning attack mechanisms, defenses, and computing paradigms.


Machine learning (ML) models have significantly grown in complexity and utility, driving advances across multiple domains. However, substantial computational resources and specialized expertise have historically restricted their wide adoption. Machine-Learning-as-a-Service (MLaaS) platforms have addressed these barriers by providing scalable, convenient, and affordable access to sophisticated ML models through user-friendly APIs. While this accessibility promotes widespread use of advanced ML capabilities, it also introduces vulnerabilities exploited through Model Extraction Attacks (MEAs). Recent studies have demonstrated that adversaries can systematically replicate a target model's functionality by interacting with publicly exposed interfaces, posing threats to intellectual property, privacy, and system security. In this paper, we offer a comprehensive survey of MEAs and corresponding defense strategies. We propose a novel taxonomy that classifies MEAs according to attack mechanisms, defense approaches, and computing environments. Our analysis covers various attack techniques, evaluates their effectiveness, and highlights challenges faced by existing defenses, particularly the critical trade-off between preserving model utility and ensuring security. We further assess MEAs within different computing paradigms and discuss their technical, ethical, legal, and societal implications, along with promising directions for future research. This systematic survey aims to serve as a valuable reference for researchers, practitioners, and policymakers engaged in AI security and privacy. Additionally, we maintain an online repository continuously updated with related literature at https://github.com/kzhao5/ModelExtractionPapers.


Key Contributions

  • Comprehensive taxonomy classifying MEAs by attack mechanisms, defense approaches, and computing environments (including MLaaS, federated, and edge paradigms)
  • Systematic analysis of attack-defense trade-offs, particularly the tension between model utility preservation and security
  • Discussion of technical, ethical, legal, and societal implications of MEAs, plus a continuously updated online literature repository

🛡️ Threat Analysis

Model Theft

The paper's entire focus is on Model Extraction Attacks — adversaries querying ML APIs to clone model functionality — which is the canonical model theft threat. Defenses surveyed include watermarking for ownership verification and anti-extraction techniques, all squarely within ML05.
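To make the threat concrete, here is a minimal, self-contained sketch of the black-box extraction loop the survey describes: an attacker labels random inputs through the victim's prediction API, then trains a local surrogate on the harvested pairs. Everything here is illustrative, not from the paper — the victim is a toy linear classifier with hidden weights (`SECRET_W`), and the surrogate is a simple perceptron; real MEAs target far richer models and use smarter query strategies.

```python
import random

# Hypothetical victim model: a binary classifier the attacker can only
# reach through query_victim(), never its weights (black-box access).
SECRET_W = [0.7, -1.2, 0.4]

def query_victim(x):
    """Simulates one MLaaS API call: input in, hard label out."""
    return 1 if sum(w * xi for w, xi in zip(SECRET_W, x)) > 0 else 0

def extract_surrogate(n_queries=2000, epochs=20, lr=0.1, seed=0):
    """Extraction sketch: label random queries via the victim's API,
    then train a perceptron surrogate on the (input, label) pairs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_queries):
        x = [rng.uniform(-1, 1) for _ in range(3)]
        data.append((x, query_victim(x)))   # each pair costs one API call
    w = [0.0, 0.0, 0.0]                     # surrogate trained locally
    for _ in range(epochs):
        for x, y in data:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if pred != y:                   # perceptron update on mistakes
                w = [wi + lr * (y - pred) * xi for wi, xi in zip(w, x)]
    return w

def agreement(w, n=1000, seed=1):
    """Fidelity metric: fraction of fresh inputs where the surrogate
    agrees with the victim's predictions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(3)]
        s = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
        hits += (s == query_victim(x))
    return hits / n
```

The attacker never sees `SECRET_W`, yet a few thousand labeled queries suffice for a high-fidelity clone — which is exactly the utility-security tension the survey highlights: any API accurate enough to be useful leaks enough signal to be copied.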


Details

Domains
vision, nlp, tabular
Model Types
cnn, transformer, llm, traditional_ml
Threat Tags
black_box, inference_time
Applications
mlaas apis, image classification, natural language processing, healthcare ai, financial ai