Joykirat Singh

defense · arXiv · Mar 3, 2026

Aradhye Agarwal, Gurdit Siyan, Yash Pandya et al. · Microsoft Research

A post-training reinforcement learning (RL) framework that teaches agentic LLMs to refuse harmful tool-use actions and to resist prompt injection in multi-step settings.

Prompt Injection · Excessive Agency · nlp
