Eric Smith

h-index: 2 9 citations 2 papers (total)

Papers in Database (1)

defense arXiv Oct 1, 2025 · Oct 2025

Large Reasoning Models Learn Better Alignment from Flawed Thinking

ShengYun Peng, Eric Smith, Ivan Evtimov et al. · Meta · Georgia Institute of Technology +1 more

Defends LLMs against chain-of-thought jailbreaks by RL-training models to self-correct injected flawed reasoning premises

Prompt Injection nlp
7 citations PDF