ML Security Papers

Latest papers

8 papers

attack arXiv Apr 6, 2026 · 6w ago

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

Zijun Wang, Haoqin Tu, Letian Zhang et al. · UC Santa Cruz · National University of Singapore +4 more

Real-world evaluation showing poisoning of agent persistent state (skills, config, memory) increases attack success from 25% to 64-74% across four LLM backbones

Prompt Injection Excessive Agency nlp

PDF Code

defense arXiv Feb 23, 2026 · 12w ago

CREDIT: Certified Ownership Verification of Deep Neural Networks Against Model Extraction Attacks

Bolin Shen, Zhan Cheng, Neil Zhenqiang Gong et al. · Florida State University · University of Wisconsin +2 more

Certifies DNN ownership against model extraction using mutual information similarity with theoretical verification guarantees

Model Theft vision nlp

PDF Code

attack arXiv Feb 14, 2026 · Feb 2026

Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges

Ruomeng Ding, Yifei Pang, He Sun et al. · University of North Carolina at Chapel Hill · Carnegie Mellon University +2 more

Attacks LLM alignment pipelines by crafting benchmark-compliant rubric edits that systematically bias judge preferences and corrupt RLHF training

Transfer Learning Attack Prompt Injection nlp

PDF Code

benchmark arXiv Nov 26, 2025 · Nov 2025

The Double-Edged Nature of the Rashomon Set for Trustworthy Machine Learning

Ethan Hsu, Harry Chen, Chudi Zhong et al. · Duke University · MIT +2 more

Analyzes how Rashomon set diversity improves adversarial robustness but increases training data leakage via a proven robustness-privacy trade-off

Input Manipulation Attack Model Inversion Attack tabular

PDF

defense arXiv Nov 22, 2025 · Nov 2025

Vulnerability-Aware Robust Multimodal Adversarial Training

Junrui Zhang, Xinyu Zhao, Jie Peng et al. · University of Science & Technology of China · University of North Carolina at Chapel Hill +1 more

Adversarial training defense that quantifies per-modality vulnerability to selectively harden multimodal models against adversarial attacks

Input Manipulation Attack multimodal

PDF Code

survey arXiv Oct 21, 2025 · Oct 2025

The Black Tuesday Attack: how to crash the stock market with adversarial examples to financial forecasting models

Thomas Hofweber, Jefrey Bergl, Ian Reyes et al. · University of North Carolina at Chapel Hill · Oak Ridge National Laboratory

Analyzes how adversarial input manipulations to ML financial forecasting models could trigger self-fulfilling, hard-to-detect stock market crashes

Input Manipulation Attack timeseries

PDF

attack arXiv Oct 12, 2025 · Oct 2025

One Token Embedding Is Enough to Deadlock Your Large Reasoning Model

Mohan Zhang, Yihua Zhang, Jinghan Jia et al. · University of North Carolina at Chapel Hill · Michigan State University +1 more

Backdoor attack on large reasoning models that forces perpetual chain-of-thought (CoT) loops, achieving a 100% resource-exhaustion success rate

Model Poisoning Model Denial of Service nlp

1 citation PDF

defense arXiv Jan 1, 2025 · Jan 2025

TrustRAG: Enhancing Robustness and Trustworthiness in Retrieval-Augmented Generation

Huichi Zhou, Kin-Hei Lee, Zhonghao Zhan et al. · Imperial College London · Peking University +2 more

Defends RAG systems against corpus poisoning via two-stage cluster filtering and LLM self-assessment to block malicious retrieved documents

Data Poisoning Attack Prompt Injection nlp

10 citations PDF
