Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures (XAMT)

The increasing operational reliance on complex Multi-Agent Systems (MAS) across safety-critical domains necessitates rigorous adversarial robustness assessment. Modern MAS are inherently heterogeneous, integrating conventional Multi-Agent Reinforcement Learning (MARL) with emerging Large Language Model (LLM) agent architectures that use Retrieval-Augmented Generation (RAG). A critical shared vulnerability is their reliance on centralized memory components: the shared Experience Replay (ER) buffer in MARL and the external Knowledge Base (K) in RAG agents. This paper proposes XAMT (Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures), a novel framework that formalizes attack generation as a bilevel optimization problem. The Upper Level minimizes the perturbation magnitude (δ) to enforce covertness, while the Lower Level maximizes the divergence of system behavior toward an adversary-defined target. We provide rigorous mathematical instantiations for CTDE MARL algorithms and RAG-based LLM agents, demonstrating that bilevel optimization uniquely crafts stealthy, minimal-perturbation poisons that evade detection heuristics. Comprehensive experimental protocols use the SMAC and SafeRAG benchmarks to quantify effectiveness at sub-percent poison rates (≤1% in MARL, ≤0.1% in RAG). XAMT defines a new unified class of training-time threats essential for developing intrinsically secure MAS, with implications for trust, formal verification, and defensive strategies that prioritize intrinsic safety over perimeter-based detection.
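
The abstract describes the two levels in words only. As a reading aid, a coupled problem of this kind can be written in the generic template commonly used for poisoning attacks, where the inner problem is the system's own learning on the tampered memory. Every symbol here is an illustrative assumption, and the paper's exact formulation, including which term it assigns to which level, may differ.

```latex
\begin{aligned}
\min_{\delta}\quad
  & \lVert \delta \rVert_{p}
    \;+\; \lambda\, \mathcal{L}_{\mathrm{adv}}\bigl(\theta^{*}(\delta)\bigr) \\
\text{s.t.}\quad
  & \theta^{*}(\delta) \in \arg\min_{\theta}\;
    \mathcal{L}_{\mathrm{sys}}\bigl(\theta;\ \mathcal{M} \oplus \delta\bigr)
\end{aligned}
```

In this sketch, M stands for the shared memory (the ER buffer or the knowledge base K), δ is the perturbation applied to it, L_sys is the system's ordinary objective on the tampered memory, L_adv is small when the resulting behavior matches the adversary-defined target, and λ trades covertness against effect.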

Key Contributions

Formalizes memory tampering in heterogeneous MAS as a bilevel optimization problem, enabling principled covertness constraints (Upper Level) coupled with maximal behavioral divergence (Lower Level)
Provides separate mathematical instantiations of XAMT for CTDE MARL (ER buffer poisoning) and RAG-based LLM agents (knowledge base injection), unifying them under a single adversarial framework
Demonstrates high attack success at sub-percent poison rates (≤1% MARL on SMAC, ≤0.1% RAG on SafeRAG) while evading existing anomaly detection heuristics
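
To make the sub-percent poison rates concrete, the following minimal sketch converts the two quoted caps into an absolute number of entries. The function names and the memory sizes in the usage note are hypothetical and are not taken from the paper.

```python
from fractions import Fraction
from math import floor

# Rate caps quoted in the summary; exact fractions avoid float rounding.
MARL_MAX_RATE = Fraction("0.01")   # at most 1% of replay-buffer transitions
RAG_MAX_RATE = Fraction("0.001")   # at most 0.1% of knowledge-base documents


def max_poisoned_entries(memory_size: int, max_rate: Fraction) -> int:
    """Largest number of tampered entries that stays within the rate cap."""
    if memory_size < 0:
        raise ValueError("memory_size must be non-negative")
    return floor(max_rate * memory_size)


def within_budget(num_poisoned: int, memory_size: int, max_rate: Fraction) -> bool:
    """True if num_poisoned entries respect the cap for this memory size."""
    return 0 <= num_poisoned <= max_poisoned_entries(memory_size, max_rate)
```

Under these assumptions, a hypothetical replay buffer of 1,000,000 transitions admits at most 10,000 tampered transitions, and a hypothetical knowledge base of 50,000 documents admits at most 50.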

🛡️ Threat Analysis

Data Poisoning Attack

The core contribution is a data poisoning attack: the Upper Level minimizes perturbation magnitude for covertness, while the Lower Level maximizes behavior divergence by corrupting the shared Experience Replay buffer (MARL) and the external Knowledge Base (RAG). This is training-time data injection and corruption, which matches ML02 (Data Poisoning Attack) in the OWASP Machine Learning Security Top 10.
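
The trade-off between a small perturbation and a large behavioral shift can be illustrated on a deliberately tiny problem. The sketch below is not the XAMT algorithm: it swaps in one-dimensional ridge regression as the learner, perturbs only the labels of the first k samples, and uses a weighted-sum objective. All function names, parameters, and data are hypothetical.

```python
# Toy bilevel poisoning of 1-D ridge regression (illustrative only).

def inner_solution(xs, ys, alpha):
    """Inner problem: closed-form ridge fit w* = sum(x*y) / (sum(x^2) + alpha)."""
    num = sum(x * y for x, y in zip(xs, ys))
    den = sum(x * x for x in xs) + alpha
    return num / den


def upper_objective(delta, w, w_target, lam):
    """Outer problem: squared perturbation size plus weighted distance to target."""
    return sum(d * d for d in delta) + lam * (w - w_target) ** 2


def craft_poison(xs, ys, k, w_target, alpha=0.1, lam=50.0, lr=0.01, steps=500):
    """Gradient descent on label perturbations of the first k samples."""
    den = sum(x * x for x in xs) + alpha
    delta = [0.0] * k
    for _ in range(steps):
        # zip stops after k items, so only the first k labels are perturbed.
        poisoned = [y + d for y, d in zip(ys, delta)] + ys[k:]
        w = inner_solution(xs, poisoned, alpha)
        # d(w*)/d(delta_i) = x_i / den, so the hypergradient has a closed form.
        grad = [2 * d + 2 * lam * (w - w_target) * xs[i] / den
                for i, d in enumerate(delta)]
        delta = [d - lr * g for d, g in zip(delta, grad)]
    poisoned = [y + d for y, d in zip(ys, delta)] + ys[k:]
    return delta, inner_solution(xs, poisoned, alpha)


xs = [2.0, 1.5, 1.0, 0.5, 1.0, 0.5, 1.0, 0.5]
ys = [2.0 * x for x in xs]               # clean data with slope 2
w_clean = inner_solution(xs, ys, 0.1)
delta, w_poisoned = craft_poison(xs, ys, k=2, w_target=1.0)
```

In this toy setting the fitted slope moves from its clean value toward the target while the penalty keeps the label perturbation bounded. Raising `lam` favors the behavioral shift, and lowering it favors covertness.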