Latest papers

219 papers
survey arXiv Apr 1, 2026 · 5d ago

Safety, Security, and Cognitive Risks in World Models

Manoj Parmar · SovereignAI Security Labs

Unified threat model for world model AI systems covering adversarial attacks, data poisoning, alignment risks, and cognitive security

Input Manipulation Attack Data Poisoning Attack Model Poisoning Prompt Injection Excessive Agency reinforcement-learning multimodal vision nlp
PDF
benchmark arXiv Apr 1, 2026 · 5d ago

ClawSafety: "Safe" LLMs, Unsafe Agents

Bowen Wei, Yunbei Zhang, Jinhao Pan et al. · George Mason University · Tulane University +2 more

Benchmark of 120 prompt injection attacks on personal AI agents across skill files, emails, and web content

Prompt Injection Excessive Agency nlp multimodal
PDF
defense arXiv Apr 1, 2026 · 5d ago

AgentWatcher: A Rule-based Prompt Injection Monitor

Yanting Wang, Wei Zou, Runpeng Geng et al. · The Pennsylvania State University

Rule-based prompt injection detector using causal attribution to identify malicious context segments in long-context LLM agents

Prompt Injection Excessive Agency nlp
PDF Code
survey arXiv Mar 31, 2026 · 6d ago

The Persistent Vulnerability of Aligned AI Systems

Aengus Lynch · University College London

Comprehensive AI safety thesis spanning mechanistic interpretability, sleeper agent defenses, jailbreaking frontier models, and autonomous agent misalignment

Input Manipulation Attack Prompt Injection Excessive Agency nlp vision audio multimodal
PDF
benchmark arXiv Mar 30, 2026 · 7d ago

Evaluating Privilege Usage of Agents on Real-World Tools

Quan Zhang, Lianhang Fu, Lvsi Lian et al. · East China Normal University · Xinjiang University +1 more

Benchmark evaluating LLM agents' privilege control under prompt injection attacks using real-world tools, finding 84.80% attack success

Prompt Injection Insecure Plugin Design Excessive Agency nlp
PDF
attack arXiv Mar 26, 2026 · 11d ago

The System Prompt Is the Attack Surface: How LLM Agent Configuration Shapes Security and Creates Exploitable Vulnerabilities

Ron Litvak · Columbia University

System prompt engineering creates exploitable phishing detection vulnerabilities in LLM email agents despite strong benchmark performance

Input Manipulation Attack Prompt Injection Excessive Agency nlp
PDF
defense arXiv Mar 24, 2026 · 13d ago

Agent-Sentry: Bounding LLM Agents via Execution Provenance

Rohan Sequeira, Stavros Damianakis, Umar Iqbal et al. · University of Southern California · Washington University in St. Louis

Behavioral bounds framework that blocks malicious tool calls in LLM agents by learning execution patterns and detecting deviations

Prompt Injection Excessive Agency nlp
PDF
survey arXiv Mar 24, 2026 · 13d ago

SoK: The Attack Surface of Agentic AI – Tools, and Autonomy

Ali Dehghantanha, Sajad Homayoun · University of Guelph · Aalborg University

Surveys attack surface of agentic LLM systems: prompt injection, RAG poisoning, tool exploits, and multi-agent threats with defense taxonomy

Prompt Injection Insecure Plugin Design Excessive Agency nlp multimodal
PDF
defense arXiv Mar 22, 2026 · 15d ago

Emergent Formal Verification: How an Autonomous AI Ecosystem Independently Discovered SMT-Based Safety Across Six Domains

Octavian Untila · Aisophical SRL

Autonomous AI system independently discovers SMT-based formal verification for AI safety across six domains with 100% accuracy

Output Integrity Attack Insecure Plugin Design Excessive Agency nlp multimodal
PDF
attack arXiv Mar 22, 2026 · 15d ago

Is Monitoring Enough? Strategic Agent Selection For Stealthy Attack in Multi-Agent Discussions

Qiuchi Xiang, Haoxuan Qu, Hossein Rahmani et al. · Lancaster University

Stealth attack on multi-agent LLM discussions that evades continuous anomaly monitoring through strategic agent selection and message crafting

Prompt Injection Excessive Agency nlp multimodal
PDF
attack CoDAIM workshop Mar 21, 2026 · 16d ago

ACRFence: Preventing Semantic Rollback Attacks in Agent Checkpoint-Restore

Yusheng Zheng, Yiwei Yang, Wei Zhang et al. · UC Santa Cruz · University of Connecticut

LLM agent checkpoint-restore creates replay vulnerabilities enabling duplicate payments and credential reuse through non-deterministic request regeneration

Insecure Plugin Design Excessive Agency nlp
PDF
defense arXiv Mar 20, 2026 · 17d ago

Memory poisoning and secure multi-agent systems

Vicenç Torra, Maria Bras-Amorós · Umeå University · Universitat Politècnica de Catalunya

Defends LLM-based agents against memory poisoning attacks across semantic, episodic, and short-term memory using cryptographic techniques

Data Poisoning Attack Excessive Agency nlp
PDF
attack arXiv Mar 19, 2026 · 18d ago

The Autonomy Tax: Defense Training Breaks LLM Agents

Shawn Li, Yue Zhao · University of Southern California

Defense training against prompt injection destroys LLM agent tool-use competence, causing 99% timeout rates and 73-86% attack bypass

Prompt Injection Excessive Agency nlp
PDF
tool arXiv Mar 19, 2026 · 18d ago

ClawTrap: A MITM-Based Red-Teaming Framework for Real-World OpenClaw Security Evaluation

Haochen Zhao, Shaoyang Cui · National University of Singapore · Tsinghua University

MITM-based red-teaming framework that tests autonomous web agent security through real-time network traffic manipulation attacks

Prompt Injection Excessive Agency nlp
PDF Code
tool arXiv Mar 18, 2026 · 19d ago

LAAF: Logic-layer Automated Attack Framework: A Systematic Red-Teaming Methodology for LPCI Vulnerabilities in Agentic Large Language Model Systems

Hammad Atta, Ken Huang, Kyriakos Rock Lambros et al. · Qorvex Consulting · Distributedapps.ai +8 more

Automated red-teaming framework for multi-stage prompt injection attacks on agentic LLMs with persistent memory and RAG

Prompt Injection Excessive Agency nlp
PDF
tool arXiv Mar 18, 2026 · 19d ago

VeriGrey: Greybox Agent Validation

Yuntong Zhang, Sungmin Kang, Ruijie Meng et al. · National University of Singapore · Max-Planck Institute of Security and Privacy

Greybox fuzzing framework that discovers indirect prompt injection vulnerabilities in LLM agents by mutating prompts and tracking tool invocations

Prompt Injection Excessive Agency nlp
PDF
defense arXiv Mar 18, 2026 · 19d ago

Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare

Saikat Maiti · Commure · nFactor Technologies

Zero-trust architecture for healthcare AI agents using kernel isolation, credential proxies, network policies, and prompt integrity framework

AI Supply Chain Attacks Prompt Injection Excessive Agency nlp
PDF
benchmark arXiv Mar 16, 2026 · 21d ago

How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

Mateusz Dziemian, Maxwell Lin, Xiaohan Fu et al. · Gray Swan AI · OpenAI +6 more

Large-scale red teaming competition finds all frontier LLM agents vulnerable to concealed indirect prompt injection attacks with 0.5-8.5% success rates

Prompt Injection Excessive Agency nlp multimodal
PDF
benchmark arXiv Mar 16, 2026 · 21d ago

TrinityGuard: A Unified Framework for Safeguarding Multi-Agent Systems

Kai Wang, Biaojie Zeng, Zeming Wei et al. · Shanghai AI Laboratory

Comprehensive safety framework evaluating 20 risk types across LLM multi-agent systems with runtime monitoring and OWASP-grounded taxonomy

Prompt Injection Excessive Agency nlp multimodal
PDF
attack arXiv Mar 16, 2026 · 21d ago

From Storage to Steering: Memory Control Flow Attacks on LLM Agents

Zhenlin Xu, Xiaogang Zhu, Yu Yao et al. · Adelaide University · The University of Sydney +1 more

Memory poisoning attack on LLM agents that hijacks tool selection control flow across tasks via malicious memory retrieval

Prompt Injection Excessive Agency nlp
PDF