Arunesh Sinha

Papers in Database (2)

attack · arXiv · Aug 6, 2025

Automatic LLM Red Teaming

Roman Belaire, Arunesh Sinha, Pradeep Varakantham · Singapore Management University · Rutgers University

Trains an RL agent to conduct multi-turn jailbreak attacks on LLMs by formalizing red teaming as a hierarchical MDP

Prompt Injection · nlp · reinforcement-learning
attack · arXiv · Feb 27, 2026

Induced Numerical Instability: Hidden Costs in Multimodal Large Language Models

Wai Tuck Wong, Jun Sun, Arunesh Sinha · Singapore Management University · Rutgers University

Crafts adversarial images that induce numerical instability in VLMs, degrading benchmark performance with minimal pixel perturbation

Input Manipulation Attack · Prompt Injection · vision · multimodal · nlp