defense 2025

A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory

Qianshan Wei ¹, Tengchao Yang ¹, Yaochen Wang ², Xinfeng Li ¹, Lijun Li ², Zhenfei Yin ^3,4, Yi Zhan ², Thorsten Holz ¹, Zhiqiang Lin ⁵, XiaoFeng Wang ¹

¹ Nanyang Technological University

² Independent Researcher

³ University of Oxford

⁴ Max Planck Institute

⁵ The Ohio State University

11 citations · 2 influential · 40 references · arXiv

Published on arXiv

2510.02373

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

A-MemGuard reduces attack success rates by over 97% in EHRAgent scenarios and over 60% against self-reinforcing indirect attacks, while maintaining the highest benign task accuracy among all defense baselines.

A-MemGuard

Novel technique introduced

Large Language Model (LLM) agents use memory to learn from past interactions, enabling autonomous planning and decision-making in complex environments. However, this reliance on memory introduces a critical security risk: an adversary can inject seemingly harmless records into an agent's memory to manipulate its future behavior. This vulnerability is characterized by two core aspects: First, the malicious effect of injected records is only activated within a specific context, making them hard to detect when individual memory entries are audited in isolation. Second, once triggered, the manipulation can initiate a self-reinforcing error cycle: the corrupted outcome is stored as precedent, which not only amplifies the initial error but also progressively lowers the threshold for similar attacks in the future. To address these challenges, we introduce A-MemGuard (Agent-Memory Guard), the first proactive defense framework for LLM agent memory. The core idea of our work is the insight that memory itself must become both self-checking and self-correcting. Without modifying the agent's core architecture, A-MemGuard combines two mechanisms: (1) consensus-based validation, which detects anomalies by comparing reasoning paths derived from multiple related memories and (2) a dual-memory structure, where detected failures are distilled into ``lessons'' stored separately and consulted before future actions, breaking error cycles and enabling adaptation. Comprehensive evaluations on multiple benchmarks show that A-MemGuard effectively cuts attack success rates by over 95% while incurring a minimal utility cost. This work shifts LLM memory security from static filtering to a proactive, experience-driven model where defenses strengthen over time. Our code is available in https://github.com/TangciuYueng/AMemGuard

Key Contributions

Consensus-based validation that detects anomalous memory entries by comparing reasoning paths derived from multiple related memories, catching injections that appear harmless in isolation
Dual-memory structure that stores detected failures as 'lessons' in a separate repository, breaking self-reinforcing error cycles from indirect memory injection
First proactive defense framework for LLM agent memory, achieving >95% reduction in attack success rate across multiple benchmarks with minimal utility cost

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_timetargetedblack_box

Datasets

Agent Security Bench (ASB)EHRAgent benchmark

Applications

llm agentsautonomous planning systemshealthcare agents (ehragent)multi-agent systems

Read PDF arXiv DOI Code

A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks

Cybersecurity AI: Hacking the AI Hackers via Prompt Injection

Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems

AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management

Protecting Context and Prompts: Deterministic Security for Non-Deterministic AI

Building Browser Agents: Architecture, Security, and Practical Solutions

AgenTRIM: Tool Risk Mitigation for Agentic AI

Beyond Input Guardrails: Reconstructing Cross-Agent Semantic Flows for Execution-Aware Attack Detection