Haohua Du

Papers in Database (2)

attack · arXiv · Aug 1, 2025

Activation-Guided Local Editing for Jailbreaking Attacks

Jiecong Wang, Haoran Li, Hao Peng et al. · Beihang University · The Hong Kong University of Science and Technology +3 more

A two-stage LLM jailbreak that uses hidden-state activations to guide text-level edits, bypassing safety alignment with state-of-the-art attack success rates.

Prompt Injection · nlp
benchmark · arXiv · Aug 19, 2025

MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers

Zhiqiang Wang, Yichao Gao, Yanting Wang et al. · University of Science and Technology of China · Beihang University

Benchmarks tool poisoning attacks on real-world MCP servers, revealing a 72.8% attack success rate against top LLM agents.

Insecure Plugin Design · Prompt Injection · nlp