ML Security Papers

Stats

Latest papers

3 papers

benchmark arXiv Feb 6, 2026 · 8w ago

Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study

Yi Liu, Zhihao Chen, Yanjun Zhang et al. · Quantstamp · Fujian Normal University +4 more

Empirical study of 98,380 LLM agent skills finds 157 malicious ones using supply chain theft and instruction hijacking

AI Supply Chain Attacks Insecure Plugin Design Prompt Injection nlp

2 citations 1 influentialPDF

defense arXiv Jan 27, 2026 · 9w ago

LLM-VA: Resolving the Jailbreak-Overrefusal Trade-off via Vector Alignment

Haonan Zhang, Dongxia Wang, Yi Liu et al. · Zhejiang University · Huzhou Institute of Industrial Control Technology +1 more

Defends LLMs against jailbreak and over-refusal simultaneously by aligning safety and answer vectors via closed-form weight updates

Prompt Injection nlp

PDF Code

tool arXiv Jan 15, 2026 · 11w ago

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

Yi Liu, Weizhe Wang, Ruitao Feng et al. · Nanyang Technological University · Tianjin University +4 more

Scans 31K AI agent skills from marketplaces, finding 26% contain vulnerabilities including prompt injection, data exfiltration, and supply chain risks

AI Supply Chain Attacks Insecure Plugin Design Prompt Injection nlp

8 citations 2 influentialPDF

Latest papers

Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study

LLM-VA: Resolving the Jailbreak-Overrefusal Trade-off via Vector Alignment

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

Filters

Time Period

Paper Type

OWASP ML Top 10

OWASP LLM Top 10

Institution

Venue