David K. Elson

h-index: 2 · 47 citations · 4 papers (total)

Papers in Database (1)

defense · arXiv · Oct 31, 2025

Consistency Training Helps Stop Sycophancy and Jailbreaks

Alex Irpan, Alexander Matt Turner, Mark Kurzeja et al. · Google

Defends LLMs against jailbreaks and sycophancy via consistency training, which makes models invariant to adversarial prompt manipulations.

Prompt Injection · nlp