Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures (XAMT)
Akhil Sharma, Shaikh Yaser Arafat, Jai Kumar Sharma, Ken Huang
Published on arXiv
2512.15790
Data Poisoning Attack (OWASP ML Top 10: ML02)
Training Data Poisoning (OWASP LLM Top 10: LLM03)
Key Finding
XAMT achieves high attack success and utility degradation at sub-percent poison rates (≤1% MARL, ≤0.1% RAG) with minimal semantic drift, circumventing threshold-based and anomaly detection defenses.
XAMT
Novel technique introduced
The increasing operational reliance on complex Multi-Agent Systems (MAS) across safety-critical domains necessitates rigorous adversarial robustness assessment. Modern MAS are inherently heterogeneous, integrating conventional Multi-Agent Reinforcement Learning (MARL) with emerging Large Language Model (LLM) agent architectures that use Retrieval-Augmented Generation (RAG). A critical shared vulnerability is their reliance on centralized memory components: the shared Experience Replay (ER) buffer in MARL and the external Knowledge Base (K) in RAG agents. This paper proposes XAMT (Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures), a novel framework that formalizes attack generation as a bilevel optimization problem. The Upper Level minimizes the perturbation magnitude (δ) to enforce covertness, while the Lower Level maximizes the divergence of system behavior toward an adversary-defined target. We provide rigorous mathematical instantiations for CTDE MARL algorithms and RAG-based LLM agents, demonstrating that bilevel optimization uniquely crafts stealthy, minimal-perturbation poisons that evade detection heuristics. Comprehensive experimental protocols use the SMAC and SafeRAG benchmarks to quantify effectiveness at sub-percent poison rates (≤1% in MARL, ≤0.1% in RAG). XAMT defines a new unified class of training-time threats essential for developing intrinsically secure MAS, with implications for trust, formal verification, and defensive strategies that prioritize intrinsic safety over perimeter-based detection.
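The bilevel structure described in the abstract can be sketched with a generic poisoning template. This is not the paper's notation, which this summary does not reproduce: the symbols $\mathcal{M}$ (the shared memory, i.e. the ER buffer or K), $\theta$ (the system's learned or retrieval-conditioned behavior), $\mathcal{L}_{\mathrm{adv}}$, $\mathcal{L}_{\mathrm{learn}}$, $\lambda$, and $\rho$ (the poison rate) are all assumptions made for illustration.

```latex
\begin{aligned}
\min_{\delta}\quad
  & \|\delta\|_{p} \;+\; \lambda\,
    \mathcal{L}_{\mathrm{adv}}\bigl(\theta^{*}(\delta);\ \text{target}\bigr) \\
\text{s.t.}\quad
  & \theta^{*}(\delta) \in \arg\min_{\theta}\
    \mathcal{L}_{\mathrm{learn}}\bigl(\theta;\ \mathcal{M} \oplus \delta\bigr), \\
  & \frac{|\operatorname{supp}(\delta)|}{|\mathcal{M}|} \le \rho .
\end{aligned}
```

In this template the inner problem models how the system responds to the tampered memory $\mathcal{M} \oplus \delta$, and $\rho$ would be at most 0.01 for MARL and 0.001 for RAG under the rates quoted above. The summary attributes the behavior-divergence term to the Lower Level, so the paper's own split of terms between the two levels may differ from this sketch.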
Key Contributions
- Formalizes memory tampering in heterogeneous MAS as a bilevel optimization problem, enabling principled covertness constraints (Upper Level) coupled with maximal behavioral divergence (Lower Level)
- Provides separate mathematical instantiations of XAMT for CTDE MARL (ER buffer poisoning) and RAG-based LLM agents (knowledge base injection), unifying them under a single adversarial framework
- Demonstrates high attack success at sub-percent poison rates (≤1% MARL on SMAC, ≤0.1% RAG on SafeRAG) while evading existing anomaly detection heuristics
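
To make the covertness versus divergence trade-off in these contributions concrete, here is a minimal toy sketch in pure Python. It is not XAMT and makes strong simplifying assumptions: the "victim" is a one-dimensional ridge regression (standing in for the learner that consumes the shared memory), the attacker may only shift the labels of a budgeted subset of records, and the names `poison_budget`, `fit_slope`, and `toy_bilevel_poison` are hypothetical. Because the fitted slope is linear in the perturbation, the upper-level gradient is available in closed form.

```python
import random


def poison_budget(n_records, rate):
    """Number of records an attacker may touch at a given poison rate."""
    # The small epsilon guards against float round-off in n_records * rate.
    return int(n_records * rate + 1e-9)


def fit_slope(xs, ys, alpha=1e-2):
    """Lower level (toy victim): ridge-regularised least-squares slope."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def toy_bilevel_poison(xs, ys, idx, target, alpha=1e-2, lam=1e-4, steps=200):
    """Upper level (toy attacker): gradient descent on
    (slope(delta) - target)^2 + lam * ||delta||^2,
    where delta shifts only the labels ys[i] for i in idx."""
    denom = sum(x * x for x in xs) + alpha
    # Sensitivity of the fitted slope to each perturbed label (exact here,
    # because the closed-form slope is linear in delta).
    sens = [xs[i] / denom for i in idx]
    # Step size 1/L, with L the Lipschitz constant of the quadratic's gradient.
    step = 1.0 / (2 * sum(s * s for s in sens) + 2 * lam)
    delta = [0.0] * len(idx)

    def poisoned_slope(d):
        ys_p = list(ys)
        for j, i in enumerate(idx):
            ys_p[i] += d[j]
        return fit_slope(xs, ys_p, alpha)

    for _ in range(steps):
        err = poisoned_slope(delta) - target
        delta = [d - step * (2 * s * err + 2 * lam * d)
                 for d, s in zip(delta, sens)]
    return delta, poisoned_slope(delta)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0, 1) for _ in range(1000)]
    ys = [2.0 * x + rng.gauss(0, 0.1) for x in xs]
    k = poison_budget(len(xs), 0.01)  # 1% budget, mirroring the MARL rate
    delta, slope = toy_bilevel_poison(xs, ys, list(range(k)), target=2.5)
    print("budget:", k)
    print("clean slope:", fit_slope(xs, ys), "poisoned slope:", slope)
    print("largest label shift:", max(abs(d) for d in delta))
```

In this toy, raising `lam` keeps the label shifts smaller at the cost of a weaker pull toward the target slope, while lowering it does the opposite. Real CTDE MARL or RAG settings have no closed-form lower level, so the attacker would need some approximation of the inner solution; how the paper handles that is not stated in this summary.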
Threat Analysis
The core contribution is a data poisoning attack: the Upper Level minimizes perturbation magnitude for covertness, while the Lower Level maximizes behavioral divergence by corrupting the shared Experience Replay buffer (MARL) and the external Knowledge Base (RAG). This is training-time data injection and corruption, the textbook definition of ML02.