Kimin Lee

Papers in Database (1)

defense · arXiv · Aug 19, 2025

Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation

Dongyoon Hahm, Taywon Min, Woogyeol Jin et al. · KAIST

Finds that fine-tuning LLMs on benign agentic tasks erodes their safety alignment; proposes PING, a prefix-injection defense for agents

Transfer Learning Attack · Excessive Agency · nlp