Stephen H. Bach

h-index: 9 · 446 citations · 13 papers (total)

Papers in Database (1)

attack · arXiv · Oct 23, 2025

Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training

Zheng-Xin Yong, Stephen H. Bach · Brown University

Shows that reasoning LLMs self-jailbreak via chain-of-thought after benign math/code fine-tuning, even while recognizing that the requests are harmful

Transfer Learning · Attack · Prompt Injection · nlp