Latest papers

2 papers
attack · arXiv · Nov 22, 2025

Towards Effective, Stealthy, and Persistent Backdoor Attacks Targeting Graph Foundation Models

Jiayi Luo, Qingyun Sun, Lingjuan Lyu et al. · Beihang University · Sony AI +1 more

Backdoor attack on Graph Foundation Models with label-free triggers and fine-tuning-resistant anchoring for persistence

Model Poisoning · Transfer Learning Attack · graph
1 citation PDF
defense · arXiv · Oct 6, 2025

Adversarial Reinforcement Learning for Large Language Model Agent Safety

Zizhao Wang, Dingcheng Li, Vaishakh Keshava et al. · Google · The University of Texas at Austin +2 more

Defends LLM tool-using agents from indirect prompt injection via adversarial RL co-training in a two-player zero-sum game

Prompt Injection · nlp · reinforcement-learning
3 citations PDF