Latest papers

3 papers
defense arXiv Apr 18, 2026 · 4w ago

The Consensus Trap: Rescuing Multi-Agent LLMs from Adversarial Majorities via Token-Level Collaboration

Jiayuan Liu, Shiyi Du, Weihua Du et al. · Carnegie Mellon University · Foundations of Cooperative AI Lab +1 more

Token-level collaborative generation defends multi-agent LLM systems against prompt injection attacks that corrupt a majority of agents

Prompt Injection nlp
PDF
attack arXiv Mar 16, 2026 · 9w ago

From Storage to Steering: Memory Control Flow Attacks on LLM Agents

Zhenlin Xu, Xiaogang Zhu, Yu Yao et al. · Adelaide University · The University of Sydney +1 more

Memory poisoning attack on LLM agents that hijacks tool-selection control flow across tasks via malicious memory retrieval

Prompt Injection Excessive Agency nlp
PDF
defense arXiv Mar 13, 2026 · 9w ago

Test-Time Attention Purification for Backdoored Large Vision Language Models

Zhifang Zhang, Bojun Yang, Shuo He et al. · Southeast University · Nanyang Technological University +2 more

Test-time backdoor defense for LVLMs that detects poisoned inputs via cross-modal attention anomalies and purifies them by pruning trigger tokens

Model Poisoning multimodal nlp vision
PDF