Tal Kachman

attack · arXiv · Feb 2, 2026

Samuel Nellessen, Tal Kachman · Radboud University

RL-trained adversarial agent autonomously discovers jailbreaks that manipulate LLM operators into unauthorized tool execution

Prompt Injection · Excessive Agency · nlp

Papers in Database (1)