Latest papers

4 papers
attack arXiv Oct 4, 2025 · Oct 2025

Cross-Modal Content Optimization for Steering Web Agent Preferences

Tanqiu Jiang, Min Bai, Nikolaos Pappas et al. · Stony Brook University · AWS AI Labs

Black-box attack jointly optimizes adversarial image perturbations and text to steer VLM web agent selection preferences

Input Manipulation Attack Prompt Injection visionnlpmultimodal
PDF
benchmark arXiv Oct 1, 2025 · Oct 2025

Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks

Shoumik Saha, Jifan Chen, Sam Mayers et al. · University of Maryland - College Park · AWS AI Labs +1 more

Benchmarks jailbreak attacks on code-capable LLM agents, showing agent wrapping raises attack success 1.6x with 32% instantly deployable malicious code

Prompt Injection Excessive Agency nlp
2 citations 1 influentialPDF
attack arXiv Sep 30, 2025 · Sep 2025

STAC: When Innocent Tools Form Dangerous Chains to Jailbreak LLM Agents

Jing-Jing Li, Jianfeng He, Chao Shang et al. · AWS AI Labs · UC Berkeley

Multi-turn attack chains innocuous tool calls on LLM agents to achieve harmful goals, exceeding 90% ASR on GPT-4.1

Insecure Plugin Design Prompt Injection nlp
4 citations PDF Code
attack arXiv Sep 5, 2025 · Sep 2025

Neural Breadcrumbs: Membership Inference Attacks on LLMs Through Hidden State and Attention Pattern Analysis

Disha Makhija, Manoj Ghuhan Arivazhagan, Vinayshekhar Bannihatti Kumar et al. · AWS AI Labs

White-box membership inference attack on LLMs using hidden states and attention patterns achieves AUC 0.85, surpassing output-based methods

Membership Inference Attack nlp
PDF