ML Security Papers

Latest papers

2 papers

attack arXiv Feb 5, 2026

Causal Front-Door Adjustment for Robust Jailbreak Attacks on LLMs

Yao Zhou, Zeen Song, Wenwen Qiang et al. · Institute of Software Chinese Academy of Sciences · University of Chinese Academy of Sciences +2 more

Causal front-door adjustment framework strips LLM safety features via Sparse Autoencoders to achieve state-of-the-art jailbreak success rates

Prompt Injection nlp


defense arXiv Jan 8, 2026

Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning

Zhiyuan Chang, Mingyang Li, Yuekai Huang et al. · State Key Laboratory of Complex System Modeling and Simulation Technology · Institute of Software Chinese Academy of Sciences +3 more

Defends LLMs against prompt injection via diverse synthetic training data and instruction-level chain-of-thought fine-tuning

Prompt Injection nlp

