Latest papers

2 papers
attack · arXiv · Mar 16, 2026

Visual Confused Deputy: Exploiting and Defending Perception Failures in Computer-Using Agents

Xunzhuo Liu, Bowei He, Xue Liu et al. · vLLM Semantic Router Project · MBZUAI +3 more

Introduces visual confused deputy attacks on GUI agents via screenshot manipulation, and proposes dual-channel guardrails that verify both visual targets and textual reasoning

Input Manipulation Attack · Output Integrity Attack · Excessive Agency · vision · multimodal · nlp
PDF Code
attack · arXiv · Feb 22, 2026

Understanding Empirical Unlearning with Combinatorial Interpretability

Shingo Kodama, Niv Cohen, Micah Adler et al. · Middlebury College · New York University +2 more

Attacks machine unlearning methods using combinatorial interpretability, showing that erased knowledge persists in the weights and recovers rapidly under fine-tuning

Model Inversion Attack · nlp · vision
PDF