Latest papers

4 papers
defense arXiv Mar 4, 2026 · 4w ago

Why Do Unlearnable Examples Work: A Novel Perspective of Mutual Information

Yifan Zhu, Yibo Miao, Yinpeng Dong et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +2 more

Proposes MI-UE, a theoretically grounded availability-poisoning defense that blocks unauthorized model training by reducing mutual information in poisoned image features

Data Poisoning Attack vision
PDF
benchmark arXiv Feb 27, 2026 · 5w ago

Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking

Zhicheng Fang, Jingjie Zheng, Chenxu Fu et al. · Shanghai Qi Zhi Institute · University of Melbourne +1 more

Automated multi-agent system translates jailbreak papers into executable modules for standardized, reproducible LLM robustness benchmarking

Prompt Injection nlp
PDF · Code
attack arXiv Feb 7, 2026 · 8w ago

Reverse-Engineering Model Editing on Language Models

Zhiyu Sun, Minrui Luo, Yu Wang et al. · Shanghai Qi Zhi Institute · East China Normal University +3 more

Recovers private edited data from LLM parameter update matrices using spectral analysis and entropy-based prompt reconstruction

Model Inversion Attack Sensitive Information Disclosure nlp
PDF · Code
defense arXiv Sep 29, 2025 · Sep 2025

Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention

Yichi Zhang, Yue Ding, Jingwen Yang et al. · Shanghai Qi Zhi Institute +3 more

Defends Large Reasoning Models against jailbreaks by aligning CoT safety via process-supervised preference optimization with corrective interventions

Prompt Injection nlp
2 citations · 1 influential · PDF