Tal Kachman

h-index: 9 · 584 citations · 30 papers (total)

Papers in Database (1)

attack · arXiv · Feb 2, 2026

David vs. Goliath: Verifiable Agent-to-Agent Jailbreaking via Reinforcement Learning

Samuel Nellessen, Tal Kachman · Radboud University

RL-trained adversarial agent autonomously discovers jailbreaks that manipulate LLM operators into unauthorized tool execution

Prompt Injection · Excessive Agency · nlp