Latest papers

1 paper
benchmark arXiv Jan 26, 2026

MalURLBench: A Benchmark Evaluating Agents' Vulnerabilities When Processing Web URLs

Dezhang Kong, Zhuxi Wu, Shiqi Liu et al. · Zhejiang University · National University of Malaysia +4 more

Benchmark revealing that LLM web agents fail to detect disguised malicious URLs across 61K attack instances in 10 real-world scenarios

Prompt Injection nlp