Siddhant Panpatil

Papers in Database (1)

benchmark · arXiv · Aug 6, 2025

Eliciting and Analyzing Emergent Misalignment in State-of-the-Art Large Language Models

Siddhant Panpatil, Hiskias Dingeto, Haon Park · AIM Intelligence · Seoul National University

Red-teams frontier LLMs via narrative and emotional manipulation scenarios, achieving a 76% misalignment rate without jailbreaking

Prompt Injection nlp