ML Security Papers

Latest papers

50 papers

defense arXiv Apr 6, 2026 · 2d ago

ShieldNet: Network-Level Guardrails against Emerging Supply-Chain Injections in Agentic Systems

Zhuowen Yuan, Zhaorun Chen, Zhen Xiang et al. · University of Illinois Urbana-Champaign · Virtue AI +6 more

Network-level guardrail detecting supply-chain poisoning in LLM agent MCP tools via MITM proxy monitoring network behaviors

AI Supply Chain Attacks Insecure Plugin Design nlp

PDF

benchmark arXiv Apr 3, 2026 · 5d ago

Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

Zhihao Chen, Ying Zhang, Yi Liu et al. · Fujian Normal University · Wake Forest University +7 more

Large-scale analysis of 17K LLM agent skills finding 520 vulnerable to credential leakage via debug logging and prompt injection

AI Supply Chain Attacks Prompt Injection Insecure Plugin Design nlp

PDF

attack arXiv Apr 3, 2026 · 5d ago

Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems

Yubin Qu, Yi Liu, Tongcheng Geng et al. · Griffith University · Quantstamp +6 more

Supply-chain attack embedding malicious payloads in LLM agent skill documentation, achieving up to 33.5% bypass of defenses

AI Supply Chain Attacks Insecure Plugin Design Excessive Agency nlp

PDF

attack arXiv Apr 1, 2026 · 7d ago

When Safe Models Merge into Danger: Exploiting Latent Vulnerabilities in LLM Fusion

Jiaqing Li, Zhibo Zhang, Shide Zhou et al. · Huazhong University of Science and Technology · Hubei University

Embeds latent trojans in individually safe LLMs that activate during model merging, bypassing safety alignment

Model Poisoning AI Supply Chain Attacks Prompt Injection nlp

PDF

attack arXiv Mar 20, 2026 · 19d ago

Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance

Fazhong Liu, Zhuoyan Chen, Tu Lan et al. · Shanghai Jiao Tong University

Supply chain attack embedding malicious operational narratives in autonomous coding agent bootstrap guidance, achieving up to 64% success rate

AI Supply Chain Attacks Prompt Injection Insecure Plugin Design nlp

PDF

defense arXiv Mar 18, 2026 · 21d ago

Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare

Saikat Maiti · Commure · nFactor Technologies

Zero-trust architecture for healthcare AI agents using kernel isolation, credential proxies, network policies, and prompt integrity framework

AI Supply Chain Attacks Prompt Injection Excessive Agency nlp

PDF

attack arXiv Mar 16, 2026 · 23d ago

ClawWorm: Self-Propagating Attacks Across LLM Agent Ecosystems

Yihao Zhang, Zeming Wei, Xiaokun Luan et al. · Peking University · Sun Yat-Sen University +3 more

Self-replicating worm attack on LLM agent ecosystems achieving autonomous propagation through configuration hijacking and broadcast infection

AI Supply Chain Attacks Prompt Injection Excessive Agency nlpmultimodal

PDF

survey arXiv Mar 13, 2026 · 26d ago

Uncovering Security Threats and Architecting Defenses in Autonomous Agents: A Case Study of OpenClaw

Zonghao Ying, Xiao Yang, Siyang Wu et al. · Beihang University · Zhongguancun Laboratory +1 more

Security analysis of OpenClaw autonomous agents revealing prompt injection RCE, tool chain attacks, and proposing FASA defense architecture

AI Supply Chain Attacks Prompt Injection Insecure Plugin Design Excessive Agency nlpmultimodal

PDF Code

attack arXiv Mar 13, 2026 · 26d ago

Colluding LoRA: A Composite Attack on LLM Safety Alignment

Sihao Ding · Mercedes-Benz

Attack merging individually benign LoRA adapters that collectively disable LLM safety alignment without requiring adversarial prompts

AI Supply Chain Attacks Model Poisoning Prompt Injection nlp

PDF

defense arXiv Feb 27, 2026 · 5w ago

Formal Analysis and Supply Chain Security for Agentic AI Skills

Varun Pratap Bhardwaj

Formal verification framework securing LLM agent skill supply chains from malicious plugin injection with soundness-proven static analysis and sandboxing

AI Supply Chain Attacks Insecure Plugin Design nlp

PDF

attack arXiv Feb 25, 2026 · 6w ago

When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters

Liangwei Lyu, Jiaqi Xu, Jianwei Ding et al. · People’s Public Security University of China

Injects backdoors into text-to-image diffusion models via malicious LoRA adapters masquerading as benign community-shared modules, achieving 99.8% attack success rate.

Model Poisoning AI Supply Chain Attacks visiongenerativemultimodal

PDF Code

survey arXiv Feb 24, 2026 · 6w ago

SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

Yanna Jiang, Delong Li, Haiyu Deng et al. · University of Technology Sydney · CSIRO

Surveys LLM agentic skill security covering marketplace supply-chain attacks, prompt injection via skill payloads, and trust-tiered execution

AI Supply Chain Attacks Prompt Injection Insecure Plugin Design nlpreinforcement-learning

PDF

tool arXiv Feb 23, 2026 · 6w ago

SafePickle: Robust and Generic ML Detection of Malicious Pickle-based ML Models

Hillel Ohayon, Daniel Gilkarov, Ran Dubin · Ariel University

ML-based static scanner detects malicious pickle model files on HuggingFace, outperforming all existing scanners including against evasion-optimized payloads

AI Supply Chain Attacks

PDF

defense arXiv Feb 16, 2026 · 7w ago

Weight space Detection of Backdoors in LoRA Adapters

David Puertolas Merenciano, Ekaterina Vasyagina, Raghav Dixit et al. · Algoverse AI Research · University of Aberdeen +1 more

Detects backdoored LoRA adapters via SVD spectral statistics on weight matrices, achieving 97% accuracy without model execution

Model Poisoning AI Supply Chain Attacks nlp

PDF

benchmark arXiv Feb 12, 2026 · 7w ago

MalTool: Malicious Tool Attacks on LLM Agents

Yuepeng Hu, Yuqi Jia, Mengyuan Li et al. · Duke University · UC Berkeley

Benchmarks malicious tool code attacks on LLM agents; coding LLMs generate evasive malware that defeats VirusTotal and agent-specific detectors

AI Supply Chain Attacks Insecure Plugin Design nlp

PDF

tool arXiv Feb 9, 2026 · 8w ago

One RNG to Rule Them All: How Randomness Becomes an Attack Vector in Machine Learning

Kotekar Annapoorna Prabhu, Andrew Gan, Zahra Ghodsi · Purdue University

Exposes PRNG implementation weaknesses in ML frameworks as covert attack vectors, defends with RNGGuard static+runtime enforcement tool

AI Supply Chain Attacks

PDF

benchmark arXiv Feb 6, 2026 · 8w ago

Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study

Yi Liu, Zhihao Chen, Yanjun Zhang et al. · Quantstamp · Fujian Normal University +4 more

Empirical study of 98,380 LLM agent skills finds 157 malicious ones using supply chain theft and instruction hijacking

AI Supply Chain Attacks Insecure Plugin Design Prompt Injection nlp

2 citations 1 influentialPDF

defense arXiv Feb 5, 2026 · 8w ago

Among Us: Measuring and Mitigating Malicious Contributions in Model Collaboration Systems

Ziyuan Yang, Wenxuan Ding, Shangbin Feng et al. · University of Washington · New York University

Measures malicious third-party models' impact on multi-LLM collaboration systems and proposes supervisor-based defenses recovering 95% performance

AI Supply Chain Attacks Model Poisoning nlp

PDF Code

attack arXiv Feb 5, 2026 · 8w ago

BadTemplate: A Training-Free Backdoor Attack via Chat Template Against Large Language Models

IEEE Publication Technology · IEEE

Exploits LLM chat template configs to silently inject persistent malicious system-prompt instructions, achieving 100% backdoor success without retraining

AI Supply Chain Attacks Prompt Injection nlp

PDF

attack arXiv Feb 4, 2026 · 9w ago

Beware Untrusted Simulators -- Reward-Free Backdoor Attacks in Reinforcement Learning

Written by AAAI Press Staff, Pater Patel Schneider, Sunil Issar et al. · Association for the Advancement of Artificial Intelligence

Attacks RL agents via malicious simulator dynamics to implant reward-free backdoors transferable to real robotic hardware

Model Poisoning AI Supply Chain Attacks reinforcement-learning

PDF

Loading more papers…

Latest papers

ShieldNet: Network-Level Guardrails against Emerging Supply-Chain Injections in Agentic Systems

Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems

When Safe Models Merge into Danger: Exploiting Latent Vulnerabilities in LLM Fusion

Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance

Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare

ClawWorm: Self-Propagating Attacks Across LLM Agent Ecosystems

Uncovering Security Threats and Architecting Defenses in Autonomous Agents: A Case Study of OpenClaw

Colluding LoRA: A Composite Attack on LLM Safety Alignment

Formal Analysis and Supply Chain Security for Agentic AI Skills

When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters

SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

SafePickle: Robust and Generic ML Detection of Malicious Pickle-based ML Models

Weight space Detection of Backdoors in LoRA Adapters

MalTool: Malicious Tool Attacks on LLM Agents

One RNG to Rule Them All: How Randomness Becomes an Attack Vector in Machine Learning

Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study

Among Us: Measuring and Mitigating Malicious Contributions in Model Collaboration Systems

BadTemplate: A Training-Free Backdoor Attack via Chat Template Against Large Language Models

Beware Untrusted Simulators -- Reward-Free Backdoor Attacks in Reinforcement Learning

Filters

Time Period

Paper Type

OWASP ML Top 10

OWASP LLM Top 10

Institution

Venue