ML Security Papers

Latest papers

11 papers

attack arXiv Mar 19, 2026 · 18d ago

In-the-Wild Camouflage Attack on Vehicle Detectors through Controllable Image Editing

Xiao Fang, Yiming Gong, Stanislav Panev et al. · Carnegie Mellon University · DEVCOM Army Research Laboratory +1 more

Physical-world camouflage attack synthesizing adversarial vehicle textures via ControlNet fine-tuning, achieving a 38% AP50 drop with transferability

Input Manipulation Attack vision

PDF Code

defense arXiv Feb 23, 2026 · 6w ago

CITED: A Decision Boundary-Aware Signature for GNNs Towards Model Extraction Defense

Bolin Shen, Md Shamim Seraj, Zhan Cheng et al. · Florida State University · University of Wisconsin

Defends GNN models against extraction attacks via decision boundary-aware signatures enabling ownership verification at both embedding and label levels

Model Theft graph

PDF Code

defense arXiv Feb 23, 2026 · 6w ago

CREDIT: Certified Ownership Verification of Deep Neural Networks Against Model Extraction Attacks

Bolin Shen, Zhan Cheng, Neil Zhenqiang Gong et al. · Florida State University · University of Wisconsin +2 more

Certifies DNN ownership against model extraction using mutual information similarity with theoretical verification guarantees

Model Theft vision nlp

PDF Code

benchmark arXiv Feb 10, 2026 · 7w ago

Benchmarking Knowledge-Extraction Attack and Defense on Retrieval-Augmented Generation

Zhisheng Qi, Utkarsh Sahu, Li Ma et al. · University of Oregon · Michigan State University +6 more

First systematic benchmark comparing knowledge-extraction attacks and defenses on RAG systems under unified evaluation protocols

Sensitive Information Disclosure nlp

PDF Code

benchmark arXiv Dec 4, 2025 · Dec 2025

Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs

Jinbo Liu, Defu Cao, Yifei Wei et al. · University of Southern California · Florida State University +1 more

Benchmarks PII leakage in multi-agent LLM systems across six topologies, showing dense connectivity and proximity amplify adversarial memory extraction

Sensitive Information Disclosure nlp

1 citation 1 influential PDF

attack arXiv Nov 14, 2025 · Nov 2025

A Systematic Study of Model Extraction Attacks on Graph Foundation Models

Haoyan Xu, Ruizhi Qian, Jiate Li et al. · University of Southern California · Florida State University +2 more

Systematically extracts Graph Foundation Models via black-box embedding regression, cloning victim models at 0.07% of original training cost

Model Theft graph multimodal

PDF

Graph machine learning has advanced rapidly in tasks such as link prediction, anomaly detection, and node classification. As models scale up, pretrained graph models have become valuable intellectual assets because they encode extensive computation and domain expertise. Building on these advances, Graph Foundation Models (GFMs) mark a major step forward by jointly pretraining graph and text encoders on massive and diverse data. This unifies structural and semantic understanding, enables zero-shot inference, and supports applications such as fraud detection and biomedical analysis. However, the high pretraining cost and broad cross-domain knowledge in GFMs also make them attractive targets for model extraction attacks (MEAs). Prior work has focused only on small graph neural networks trained on a single graph, leaving the security implications for large-scale and multimodal GFMs largely unexplored. This paper presents the first systematic study of MEAs against GFMs. We formalize a black-box threat model and define six practical attack scenarios covering domain-level and graph-specific extraction goals, architectural mismatch, limited query budgets, partial node access, and training data discrepancies. To instantiate these attacks, we introduce a lightweight extraction method that trains an attacker encoder using supervised regression of graph embeddings. Even without contrastive pretraining data, this method learns an encoder that stays aligned with the victim text encoder and preserves its zero-shot inference ability on unseen graphs. Experiments on seven datasets show that the attacker can approximate the victim model using only a tiny fraction of its original training cost, with almost no loss in accuracy. These findings reveal that GFMs greatly expand the MEA surface and highlight the need for deployment-aware security defenses in large-scale graph learning systems.
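The extraction idea in the abstract, training an attacker encoder by supervised regression onto the victim's embeddings, can be illustrated with a toy sketch. This is not the paper's method or code: the victim here is a small hypothetical function over random feature vectors with no graph structure or text encoder, the attacker is a linear map fitted by ridge regression, and every name (`victim_embed`, `fit_attacker`, `mean_cosine`) is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical victim: a frozen encoder the attacker can only query.
W_victim = rng.normal(size=(16, 8)) / 4.0


def victim_embed(x):
    """Black-box stand-in: returns embeddings, never exposes W_victim."""
    return np.tanh(0.3 * x @ W_victim)


def fit_attacker(queries, targets, ridge=1e-3):
    """Closed-form ridge regression minimizing ||queries @ W - targets||^2."""
    dim = queries.shape[1]
    gram = queries.T @ queries + ridge * np.eye(dim)
    return np.linalg.solve(gram, queries.T @ targets)


def mean_cosine(a, b):
    """Average row-wise cosine similarity between two embedding matrices."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return float(np.mean(num / den))


# The attacker spends a query budget of 500 inputs, then is checked on
# 200 held-out inputs it never queried.
train_x = rng.normal(size=(500, 16))
test_x = rng.normal(size=(200, 16))

W_attacker = fit_attacker(train_x, victim_embed(train_x))
alignment = mean_cosine(test_x @ W_attacker, victim_embed(test_x))
print(f"held-out cosine alignment: {alignment:.3f}")
```

The only supervision is the victim's returned embeddings, which is why no original pretraining data is needed in this setup. Whether a real graph foundation model can be approximated this cheaply is the empirical question the paper studies; the toy above says nothing about that.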

gnn transformer multimodal University of Southern California · Florida State University · The Ohio State University +1 more

PDF arXiv DOI

defense arXiv Oct 27, 2025 · Oct 2025

PRO: Enabling Precise and Robust Text Watermark for Open-Source LLMs

Jiaqi Xue, Yifei Zhao, Mansour Al Ghanim et al. · University of Central Florida · Florida State University +1 more

Embeds robust text watermarks into open-source LLM weights to detect AI-generated content even after fine-tuning or model merging

Output Integrity Attack nlp

PDF

defense arXiv Oct 24, 2025 · Oct 2025

DictPFL: Efficient and Private Federated Learning on Encrypted Gradients

Jiaqi Xue, Mayank Kumar, Yuzhang Shang et al. · University of Central Florida · Florida State University +2 more

Defends federated learning against gradient inversion attacks via efficient homomorphic encryption, with 2× overhead relative to plaintext FL

Model Inversion Attack federated-learning

1 citation PDF Code

survey arXiv Aug 27, 2025 · Aug 2025

Intellectual Property in Graph-Based Machine Learning as a Service: Attacks and Defenses

Lincan Li, Bolin Shen, Chenxi Zhao et al. · Florida State University · Northeastern University +3 more

Survey of model theft, data reconstruction, and membership inference attacks and defenses for graph ML-as-a-service, with open-source evaluation library PyGIP

Model Theft Model Inversion Attack Membership Inference Attack graph

PDF Code

survey arXiv Aug 20, 2025 · Aug 2025

A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives

Kaixiang Zhao, Lincan Li, Kaize Ding et al. · University of Notre Dame · Florida State University +3 more

Surveys model extraction attacks and defenses across MLaaS platforms, proposing a taxonomy of attack mechanisms and computing environments

Model Theft vision nlp tabular

PDF Code

attack arXiv Aug 20, 2025 · Aug 2025

Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent

Sajib Biswas, Mao Nishino, Samuel Jacob Chacko et al. · Florida State University

Gradient-based adversarial suffix attack on LLMs using exponentiated gradient descent to bypass safety alignment with universal and transferable triggers

Input Manipulation Attack Prompt Injection nlp

PDF Code
