ML Security Papers

Latest papers

165 papers

defense arXiv Mar 26, 2026 · 13d ago

LiteGuard: Efficient Task-Agnostic Model Fingerprinting with Enhanced Generalization

Guang Yang, Ziye Geng, Yihang Chen et al. · Virginia Commonwealth University · University of Houston

Efficient model fingerprinting defense using checkpoint augmentation and local verifiers to detect stolen models across tasks

Model Theft vision nlp graph

PDF

defense arXiv Mar 26, 2026 · 13d ago

IrisFP: Adversarial-Example-based Model Fingerprinting with Enhanced Uniqueness and Robustness

Ziye Geng, Guang Yang, Yihang Chen et al. · University of Houston · Virginia Commonwealth University

Adversarial fingerprinting method for model ownership verification using multi-boundary composite samples with enhanced uniqueness and robustness

Model Theft vision

PDF

defense arXiv Mar 25, 2026 · 14d ago

AMIF: Authorizable Medical Image Fusion Model with Built-in Authentication

Jie Song, Jun Jia, Wei Sun et al. · Macao Polytechnic University · Shanghai Jiao Tong University +2 more

Medical image fusion model embedding visible copyright watermarks in outputs, removable only with authentication keys

Model Theft Output Integrity Attack vision multimodal

PDF

survey arXiv Mar 25, 2026 · 14d ago

AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

Zhenyi Wang, Siyu Luan · University of Central Florida · University of Copenhagen

Unified taxonomy of ML security threats organizing attacks into data-to-data, data-to-model, model-to-data, and model-to-model categories

Input Manipulation Attack Data Poisoning Attack Model Inversion Attack Membership Inference Attack Model Theft Output Integrity Attack Model Poisoning Prompt Injection Sensitive Information Disclosure vision nlp multimodal

PDF

attack arXiv Mar 25, 2026 · 14d ago

How Vulnerable Are Edge LLMs?

Ao Ding, Hongzong Li, Zi Liang et al. · China University of Geosciences · Hong Kong University of Science and Technology +4 more

Query-based extraction attack on quantized edge LLMs using clustered instruction queries to steal model behavior efficiently

Model Theft Model Theft nlp

PDF

defense arXiv Mar 22, 2026 · 17d ago

Fingerprinting Deep Neural Networks for Ownership Protection: An Analytical Approach

Guang Yang, Ziye Geng, Yihang Chen et al. · Virginia Commonwealth University · University of Houston

Analytical fingerprinting defense that proves DNN ownership by controlling adversarial perturbation distance from decision boundaries

Model Theft vision

PDF

defense arXiv Mar 19, 2026 · 20d ago

Functional Subspace Watermarking for Large Language Models

Zikang Ding, Junhao Li, Suling Wu et al. · University of Electronic Science and Technology of China · Mohamed bin Zayed University of Artificial Intelligence +1 more

Embeds ownership watermarks in a low-dimensional functional subspace of LLM weights, surviving fine-tuning, quantization, and distillation attacks

Model Theft Model Theft nlp

PDF

defense arXiv Mar 13, 2026 · 26d ago

Why Neural Structural Obfuscation Can't Kill White-Box Watermarks for Good!

Yanna Jiang, Guangsheng Yu, Qingyuan Yu et al. · University of Technology Sydney · Independent +2 more

Defeats Neural Structural Obfuscation attacks on model watermarks by canonicalizing neural networks to restore watermark verification

Model Theft vision

PDF Code

defense arXiv Mar 12, 2026 · 27d ago

EmbTracker: Traceable Black-box Watermarking for Federated Language Models

Haodong Zhao, Jinming Hu, Yijie Bai et al. · Shanghai Jiao Tong University · Ant Group +2 more

Embeds per-client backdoor watermarks in federated LMs to trace model leaks to individual culprits via black-box queries

Model Theft Model Poisoning nlp federated-learning multimodal

PDF

defense LNCS Mar 11, 2026 · 28d ago

A PUF-Based Approach for Copy Protection of Intellectual Property in Neural Network Models

Daniel Dorfmeister, Flavio Ferrarotti, Bernhard Fischer et al. · Software Competence Center Hagenberg

Binds NN model weights to hardware fingerprints via PUFs, degrading accuracy on cloned hardware to prevent model theft

Model Theft vision

PDF

defense arXiv Mar 11, 2026 · 28d ago

RandMark: On Random Watermarking of Visual Foundation Models

Anna Chistyakova, Mikhail Pautov · RAS · AXXX

Embeds binary watermarks into VFM hidden representations to verify model ownership after fine-tuning or pruning

Model Theft vision

PDF

defense arXiv Mar 10, 2026 · 29d ago

FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation

Yinpeng Wu, Yitong Chen, Lixiang Wang et al. · Shanghai Jiao Tong University

TEE-based LLM serving system that protects model weights and user data from compromised OS kernels on mobile devices

Model Theft Sensitive Information Disclosure nlp

PDF

defense arXiv Mar 9, 2026 · 4w ago

Client-Cooperative Split Learning

Haiyu Deng, Yanna Jiang, Guangsheng Yu et al. · University of Technology Sydney · CSIRO Data61 +1 more

Defends split learning against activation inversion, label clustering, and model extraction via DP and chained watermarking

Model Inversion Attack Model Theft federated-learning vision

PDF

benchmark arXiv Mar 8, 2026 · 4w ago

DistillGuard: Evaluating Defenses Against LLM Knowledge Distillation

Bo Jiang · Temple University

Systematically evaluates nine output-level defenses against LLM distillation theft, finding most fail except chain-of-thought removal for math

Model Theft Model Theft nlp

PDF

attack arXiv Mar 7, 2026 · 4w ago

How to Steal Reasoning Without Reasoning Traces

Tingwei Zhang, John X. Morris, Vitaly Shmatikov · Cornell Tech

Steals LLM reasoning capabilities by synthesizing hidden chains-of-thought from black-box answers and summaries alone

Model Theft Model Theft nlp

PDF

defense arXiv Mar 6, 2026 · 4w ago

SPOILER: TEE-Shielded DNN Partitioning of On-Device Secure Inference with Poison Learning

Donghwa Kang, Hojun Choe, Doohyun Kim et al. · Korea Advanced Institute of Science and Technology · University of Seoul

Defends edge-deployed DNNs against model theft via TEE partitioning and self-poisoning that renders the exposed backbone functionally incoherent

Model Theft vision

PDF

defense arXiv Mar 5, 2026 · 4w ago

Authorize-on-Demand: Dynamic Authorization with Legality-Aware Intellectual Property Protection for VLMs

Lianyu Wang, Meng Wang, Huazhu Fu et al. · Nanjing University of Aeronautics and Astronautics · Southeast University +1 more

Defends VLM intellectual property via dynamic authorization module restricting deployment to user-specified domains at inference time

Model Theft vision nlp multimodal

PDF

attack arXiv Mar 3, 2026 · 5w ago

Kraken: Higher-order EM Side-Channel Attacks on DNNs in Near and Far Field

Peter Horvath, Ilia Shumailov, Lukasz Chmielewski et al. · Radboud University · AI Security Company +2 more

Steals DNN and LLM weights from GPU Tensor Cores using electromagnetic side-channel attacks up to 100cm away

Model Theft vision nlp

PDF

defense arXiv Feb 27, 2026 · 5w ago

PDF: PUF-based DNN Fingerprinting for Knowledge Distillation Traceability

Ning Lyu, Yuntao Liu, Yonghong Bai et al.

Embeds hardware PUF signatures into knowledge distillation logits to trace stolen/cloned student models back to specific devices

Model Theft Transfer Learning Attack vision

PDF

defense arXiv Feb 23, 2026 · 6w ago

CITED: A Decision Boundary-Aware Signature for GNNs Towards Model Extraction Defense

Bolin Shen, Md Shamim Seraj, Zhan Cheng et al. · Florida State University · University of Wisconsin

Defends GNN models against extraction attacks via decision boundary-aware signatures enabling ownership verification at both embedding and label levels

Model Theft graph

PDF Code
