Haohua Du

attack · arXiv · Aug 1, 2025

Jiecong Wang, Haoran Li, Hao Peng et al. · Beihang University · The Hong Kong University of Science and Technology +3 more

Two-stage LLM jailbreak that uses hidden-state activations to guide text-level edits, bypassing safety alignment with state-of-the-art attack success rates

Prompt Injection · nlp

benchmark · arXiv · Aug 19, 2025

Zhiqiang Wang, Yichao Gao, Yanting Wang et al. · University of Science and Technology of China · Beihang University

Benchmarks tool poisoning attacks on real-world MCP servers, revealing a 72.8% attack success rate against leading LLM agents

Insecure Plugin Design · Prompt Injection · nlp

Papers in Database (2)