Latest papers

3 papers
benchmark arXiv Jan 26, 2026 · 10w ago

MalURLBench: A Benchmark Evaluating Agents' Vulnerabilities When Processing Web URLs

Dezhang Kong, Zhuxi Wu, Shiqi Liu et al. · Zhejiang University · National University of Malaysia +4 more

Benchmark revealing LLM web agents fail to detect disguised malicious URLs across 61K attack instances in 10 real-world scenarios

Prompt Injection nlp
PDF Code
defense arXiv Jan 13, 2026 · 11w ago

DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection

Zhenhua Xu, Yiran Zhao, Mengting Zhong et al. · Zhejiang University · Binjiang Institute of Zhejiang University +3 more

Hierarchical backdoor fingerprinting embeds nested stylistic and semantic triggers in LLMs to prove ownership against black-box theft

Model Theft Model Theft nlp
3 citations PDF Code
defense arXiv Aug 14, 2025 · Aug 2025

MCP-Guard: A Multi-Stage Defense-in-Depth Framework for Securing Model Context Protocol in Agentic AI

Wenpeng Xing, Zhonghao Qi, Yupeng Qin et al. · Zhejiang University · Binjiang Institute of Zhejiang University +3 more

Defends LLM-tool MCP interfaces from prompt injection and data exfiltration via a three-stage neural detection pipeline

Insecure Plugin Design Prompt Injection nlp
PDF