Joykirat Singh

Papers in Database (1)

defense · arXiv · Mar 3, 2026

Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use

Aradhye Agarwal, Gurdit Siyan, Yash Pandya et al. · Microsoft Research

Post-training RL framework that teaches agentic LLMs to refuse harmful tool-use actions and resist prompt injection in multi-step settings

Prompt Injection · Excessive Agency · nlp