ML Security Papers

ML Security Papers

Latest papers

1 papers

attack arXiv Feb 5, 2026 · 8w ago

Causal Front-Door Adjustment for Robust Jailbreak Attacks on LLMs

Yao Zhou, Zeen Song, Wenwen Qiang et al. · Institute of Software Chinese Academy of Sciences · University of Chinese Academy of Sciences +2 more

Causal front-door adjustment framework strips LLM safety features via Sparse Autoencoders to achieve state-of-the-art jailbreak success rates

Prompt Injection nlp