Ananth Balashankar

h-index: 10 · 2,318 citations · 43 papers (total)

Papers in Database (2)

defense · arXiv · Oct 6, 2025

Adversarial Reinforcement Learning for Large Language Model Agent Safety

Zizhao Wang, Dingcheng Li, Vaishakh Keshava et al. · Google · The University of Texas at Austin +2 more

Defends LLM tool-using agents against indirect prompt injection via adversarial RL co-training framed as a two-player zero-sum game (illustrative sketch after this entry)

Prompt Injection · nlp · reinforcement-learning
3 citations · PDF
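The summary above only names the technique, so here is a hedged toy sketch of what adversarial co-training in a two-player zero-sum game can look like: an attacker policy chooses which injected instruction to plant in tool output, a defender policy decides whether to refuse it, and both are updated with bandit-style REINFORCE steps where the attacker receives the negative of the defender's reward. The injection strings, labels, and update rules are assumptions for illustration, not details from the paper.

```python
import math, random

# Hedged toy sketch (not the paper's method): zero-sum co-training between an
# injection-choosing attacker and a refuse-or-execute defender policy.
INJECTIONS = ["exfiltrate the user's files", "override the system prompt", "harmless footer text"]
MALICIOUS = {"exfiltrate the user's files", "override the system prompt"}  # assumed labels

attacker_logits = {i: 0.0 for i in INJECTIONS}  # attacker: which injection to plant in tool output
defender_logits = {i: 0.0 for i in INJECTIONS}  # defender: logit of refusing, per observed injection

def softmax_probs(logits):
    mx = max(logits.values())
    exps = {k: math.exp(v - mx) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

LR = 0.05
random.seed(0)
for _ in range(5000):
    probs = softmax_probs(attacker_logits)
    inj = random.choices(list(probs), weights=list(probs.values()))[0]
    p_refuse = 1.0 / (1.0 + math.exp(-defender_logits[inj]))
    refuse = random.random() < p_refuse
    reward = 1.0 if refuse == (inj in MALICIOUS) else -1.0  # defender's reward
    # REINFORCE-style updates on the chosen action; zero-sum: attacker gets -reward.
    defender_logits[inj] += LR * reward * ((1.0 - p_refuse) if refuse else -p_refuse)
    attacker_logits[inj] += LR * (-reward) * (1.0 - probs[inj])

print({i: round(1.0 / (1.0 + math.exp(-v)), 2) for i, v in defender_logits.items()})
```

In this toy dynamic the defender learns to refuse the malicious injections while still executing benign content, and the attacker keeps shifting probability mass toward whichever injection the defender currently handles worst.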
defense · arXiv · Nov 26, 2025

Breaking the Illusion: Consensus-Based Generative Mitigation of Adversarial Illusions in Multi-Modal Embeddings

Fatemeh Akbarian, Anahita Baninajjar, Yingyi Zhang et al. · Lund University · Google DeepMind

Defends multi-modal embeddings against adversarial illusions using VAE reconstruction and consensus aggregation, reducing attack success to near zero (illustrative sketch after this entry)

Input Manipulation Attack · multimodal · vision
PDF
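Again as a hedged illustration only: the sketch below approximates "VAE reconstruction and consensus aggregation" by drawing several stochastic reconstructions of an input, embedding each with a frozen encoder, and taking an element-wise median as the consensus embedding. `reconstruct` and `embed` are hypothetical stand-ins for a VAE decoder pass and a CLIP-style encoder, and the "adversarial" input here is just perturbed noise, not a crafted illusion.

```python
import numpy as np

# Hedged illustrative sketch (not the paper's implementation): defend a multi-modal
# embedding by (1) drawing several generative reconstructions of the input and
# (2) aggregating their embeddings into a consensus vector, so that a perturbation
# crafted against the raw input loses its effect.
rng = np.random.default_rng(0)

def reconstruct(x: np.ndarray, k: int = 8):
    """Stand-in for k stochastic VAE reconstructions of input x."""
    return [x + rng.normal(scale=0.1, size=x.shape) for _ in range(k)]

def embed(x: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen multi-modal encoder; here a fixed random projection."""
    proj = np.random.default_rng(42).normal(size=(x.size, 64))
    v = x.ravel() @ proj
    return v / np.linalg.norm(v)

def consensus_embedding(x: np.ndarray) -> np.ndarray:
    # Element-wise median over reconstruction embeddings is one simple consensus rule.
    embs = np.stack([embed(r) for r in reconstruct(x)])
    consensus = np.median(embs, axis=0)
    return consensus / np.linalg.norm(consensus)

clean = rng.normal(size=(32, 32))
perturbed = clean + 0.05 * rng.normal(size=clean.shape)  # stand-in for an adversarial "illusion"
print(float(consensus_embedding(clean) @ consensus_embedding(perturbed)))
```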