Mahesh Pasupuleti

h-index: 7 · 13,354 citations · 38 papers (total)

Papers in Database (1)

Defense · arXiv · Oct 2025

Large Reasoning Models Learn Better Alignment from Flawed Thinking

ShengYun Peng, Eric Smith, Ivan Evtimov et al. · Meta · Georgia Institute of Technology +1 more

Defends LLMs against chain-of-thought jailbreaks by RL-training models to self-correct injected flawed reasoning premises

Prompt Injection · NLP
7 citations · PDF