Xinhe Wang

Papers in Database (1)

attack arXiv Apr 27, 2026 · 24d ago

Jailbreaking Frontier Foundation Models Through Intention Deception

Xinhe Wang, Katia Sycara, Yaqi Xie · Carnegie Mellon University

Multi-turn jailbreaking attack that deceives LLM safety by simulating benign intent across conversations to elicit harmful outputs

Prompt Injection nlpmultimodal
PDF