Latest papers

2 papers
benchmark · arXiv · Apr 1, 2026

ClawSafety: "Safe" LLMs, Unsafe Agents

Bowen Wei, Yunbei Zhang, Jinhao Pan et al. · George Mason University · Tulane University +2 more

Benchmark of 120 prompt injection attacks on personal AI agents across skill files, emails, and web content

Prompt Injection · Excessive Agency · nlp · multimodal
attack · arXiv · Nov 10, 2025

Diffusion Guided Adversarial State Perturbations in Reinforcement Learning

Xiaolin Sun, Feidi Liu, Zhengming Ding et al. · Tulane University · Fudan University

Diffusion-guided attack generates semantically shifted adversarial states that break RL defenses, including state-of-the-art diffusion-based purification methods

Input Manipulation Attack · vision · reinforcement-learning