Ananth Balashankar

h-index: 10 · 2,318 citations · 43 papers (total)

Papers in Database (2)

defense · arXiv · Oct 6, 2025

Adversarial Reinforcement Learning for Large Language Model Agent Safety

Zizhao Wang, Dingcheng Li, Vaishakh Keshava et al. · Google · The University of Texas at Austin +2 more

Defends LLM tool-using agents against indirect prompt injection via adversarial RL co-training framed as a two-player zero-sum game (illustrative sketch after this entry)

Prompt Injection · nlp · reinforcement-learning
3 citations · PDF
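The summary above only names the technique, so here is a hedged toy sketch of what adversarial co-training in a two-player zero-sum game can look like: an attacker policy chooses which injected instruction to plant in tool output, a defender policy decides whether to refuse it, and both are updated with bandit-style REINFORCE steps where the attacker receives the negative of the defender's reward. The injection strings, labels, and update rules are assumptions for illustration, not details from the paper.

```python
import math, random

# Hedged toy sketch (not the paper's method): zero-sum co-training between an
# injection-choosing attacker and a refuse-or-execute defender policy.
INJECTIONS = ["exfiltrate the user's files", "override the system prompt", "harmless footer text"]
MALICIOUS = {"exfiltrate the user's files", "override the system prompt"}  # assumed labels

attacker_logits = {i: 0.0 for i in INJECTIONS}  # attacker: which injection to plant in tool output
defender_logits = {i: 0.0 for i in INJECTIONS}  # defender: logit of refusing, per observed injection

def softmax_probs(logits):
    mx = max(logits.values())
    exps = {k: math.exp(v - mx) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

LR = 0.05
random.seed(0)
for _ in range(5000):
    probs = softmax_probs(attacker_logits)
    inj = random.choices(list(probs), weights=list(probs.values()))[0]
    p_refuse = 1.0 / (1.0 + math.exp(-defender_logits[inj]))
    refuse = random.random() < p_refuse
    reward = 1.0 if refuse == (inj in MALICIOUS) else -1.0  # defender's reward
    # REINFORCE-style updates on the chosen action; zero-sum: attacker gets -reward.
    defender_logits[inj] += LR * reward * ((1.0 - p_refuse) if refuse else -p_refuse)
    attacker_logits[inj] += LR * (-reward) * (1.0 - probs[inj])

print({i: round(1.0 / (1.0 + math.exp(-v)), 2) for i, v in defender_logits.items()})
```

In this toy dynamic the defender learns to refuse the malicious injections while still executing benign content, and the attacker keeps shifting probability mass toward whichever injection the defender currently handles worst.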
defense · arXiv · Nov 26, 2025

Breaking the Illusion: Consensus-Based Generative Mitigation of Adversarial Illusions in Multi-Modal Embeddings

Fatemeh Akbarian, Anahita Baninajjar, Yingyi Zhang et al. · Lund University · Google DeepMind

Defends multi-modal embeddings against adversarial illusions using VAE reconstruction and consensus aggregation, reducing attack success to near zero (illustrative sketch after this entry)

Input Manipulation Attack · multimodal · vision
PDF
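Again as a hedged illustration only: the sketch below approximates "VAE reconstruction and consensus aggregation" by drawing several stochastic reconstructions of an input, embedding each with a frozen encoder, and taking an element-wise median as the consensus embedding. `reconstruct` and `embed` are hypothetical stand-ins for a VAE decoder pass and a CLIP-style encoder, and the "adversarial" input here is just perturbed noise, not a crafted illusion.

```python
import numpy as np

# Hedged illustrative sketch (not the paper's implementation): defend a multi-modal
# embedding by (1) drawing several generative reconstructions of the input and
# (2) aggregating their embeddings into a consensus vector, so that a perturbation
# crafted against the raw input loses its effect.
rng = np.random.default_rng(0)

def reconstruct(x: np.ndarray, k: int = 8):
    """Stand-in for k stochastic VAE reconstructions of input x."""
    return [x + rng.normal(scale=0.1, size=x.shape) for _ in range(k)]

def embed(x: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen multi-modal encoder; here a fixed random projection."""
    proj = np.random.default_rng(42).normal(size=(x.size, 64))
    v = x.ravel() @ proj
    return v / np.linalg.norm(v)

def consensus_embedding(x: np.ndarray) -> np.ndarray:
    # Element-wise median over reconstruction embeddings is one simple consensus rule.
    embs = np.stack([embed(r) for r in reconstruct(x)])
    consensus = np.median(embs, axis=0)
    return consensus / np.linalg.norm(consensus)

clean = rng.normal(size=(32, 32))
perturbed = clean + 0.05 * rng.normal(size=clean.shape)  # stand-in for an adversarial "illusion"
print(float(consensus_embedding(clean) @ consensus_embedding(perturbed)))
```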