Dingcheng Li

h-index: 1 · 3 citations · 3 papers (total)

Papers in Database (1)

defense · arXiv · Oct 6, 2025

Adversarial Reinforcement Learning for Large Language Model Agent Safety

Zizhao Wang, Dingcheng Li, Vaishakh Keshava et al. · Google · The University of Texas at Austin +2 more

Defends LLM tool-using agents from indirect prompt injection via adversarial RL co-training in a two-player zero-sum game

Prompt Injection · nlp · reinforcement-learning
3 citations