Latest papers

2 papers
attack arXiv Feb 5, 2026 · 8w ago

Causal Front-Door Adjustment for Robust Jailbreak Attacks on LLMs

Yao Zhou, Zeen Song, Wenwen Qiang et al. · Institute of Software Chinese Academy of Sciences · University of Chinese Academy of Sciences +2 more

Causal front-door adjustment framework strips LLM safety features via Sparse Autoencoders to achieve state-of-the-art jailbreak success rates

Prompt Injection nlp
PDF
defense arXiv Jan 8, 2026 · 12w ago

Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning

Zhiyuan Chang, Mingyang Li, Yuekai Huang et al. · State Key Laboratory of Complex System Modeling and Simulation Technology · Institute of Software Chinese Academy of Sciences +3 more

Defends LLMs against prompt injection via diverse synthetic training data and instruction-level chain-of-thought fine-tuning

Prompt Injection nlp
PDF