defense 2025

Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection

Junjun Pan ¹, Yixin Liu ¹, Rui Miao ², Kaize Ding ³, Yu Zheng ¹, Quoc Viet Hung Nguyen ¹, Alan Wee-Chung Liew ¹, Shirui Pan ¹

¹ Griffith University

² Jilin University

³ Northwestern University

1 citation · 45 references · arXiv

Published on arXiv

2512.18733

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

XG-Guard demonstrates robust malicious agent detection and strong interpretability across diverse MAS topologies and attack scenarios, outperforming existing GAD-based defenses that rely on coarse sentence-level features only.

XG-Guard

Novel technique introduced

Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks. As MAS become increasingly autonomous in various safety-critical tasks, detecting malicious agents has become a critical security concern. Although existing graph anomaly detection (GAD)-based defenses can identify anomalous agents, they mainly rely on coarse sentence-level information and overlook fine-grained lexical cues, leading to suboptimal performance. Moreover, the lack of interpretability in these methods limits their reliability and real-world applicability. To address these limitations, we propose XG-Guard, an explainable and fine-grained safeguarding framework for detecting malicious agents in MAS. To incorporate both coarse and fine-grained textual information for anomalous agent identification, we utilize a bi-level agent encoder to jointly model the sentence- and token-level representations of each agent. A theme-based anomaly detector further captures the evolving discussion focus in MAS dialogues, while a bi-level score fusion mechanism quantifies token-level contributions for explanation. Extensive experiments across diverse MAS topologies and attack scenarios demonstrate robust detection performance and strong interpretability of XG-Guard.

Key Contributions

XG-Guard: an explainable bi-level graph anomaly detection framework for identifying malicious agents in LLM-based MAS across diverse topologies and attack scenarios
Bi-level agent encoder jointly modeling sentence- and token-level representations for fine-grained anomalous agent identification
Theme-based anomaly detector and bi-level score fusion mechanism that quantifies token-level contributions for interpretable detection decisions
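
As a rough illustration of how sentence-level and token-level signals might be fused into one agent score with per-token explanations, here is a minimal sketch. It is not XG-Guard's actual formulation: the "theme" vector (the mean of the agents' sentence embeddings), the cosine-distance scores, the `alpha` weight, and all function names are assumptions made for this example. The graph encoder is omitted entirely, and the embeddings are toy 2-D vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def score_agents(agents, alpha=0.5):
    """Fuse sentence- and token-level deviation from a shared theme (hypothetical scheme).

    agents maps an agent name to {"sentence": vector, "tokens": [(token, vector), ...]}.
    alpha weights the sentence-level score; (1 - alpha) weights the token-level score.
    """
    # Assumed stand-in for the discussion theme: mean of all sentence embeddings.
    theme = mean_vector([agent["sentence"] for agent in agents.values()])
    results = {}
    for name, agent in agents.items():
        sentence_score = 1.0 - cosine(agent["sentence"], theme)
        token_scores = [(tok, 1.0 - cosine(vec, theme)) for tok, vec in agent["tokens"]]
        token_mean = sum(score for _, score in token_scores) / len(token_scores)
        results[name] = {
            "score": alpha * sentence_score + (1.0 - alpha) * token_mean,
            # Tokens sorted by deviation act as a simple explanation of the score.
            "token_scores": sorted(token_scores, key=lambda pair: pair[1], reverse=True),
        }
    return results


# Toy dialogue: a1 and a2 stay on topic, a3 drifts away from it.
agents = {
    "a1": {"sentence": [1.0, 0.0], "tokens": [("sum", [1.0, 0.1]), ("digits", [0.9, 0.0])]},
    "a2": {"sentence": [0.9, 0.1], "tokens": [("add", [1.0, 0.0]), ("numbers", [0.8, 0.2])]},
    "a3": {"sentence": [0.0, 1.0], "tokens": [("ignore", [0.0, 1.0]), ("answer", [0.7, 0.3])]},
}
results = score_agents(agents)
most_anomalous = max(results, key=lambda name: results[name]["score"])
top_token = results[most_anomalous]["token_scores"][0][0]
```

In this toy setup the off-topic agent receives the highest fused score, and its most deviant token is listed first. A real system would replace the hand-written vectors with learned sentence and token representations.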

🛡️ Threat Analysis

Details

Domains

nlp, graph

Model Types

llm, gnn, transformer

Threat Tags

inference_time

Applications

llm multi-agent systems, safety-critical autonomous agents

Similar Papers

Building a Foundational Guardrail for General Agentic Systems via Synthetic Data

BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks

Verifier-Bound Communication for LLM Agents: Certified Bounds on Covert Signaling

Monitoring LLM-based Multi-Agent Systems Against Corruptions via Node Evaluation

Throttling Web Agents Using Reasoning Gates

How does information access affect LLM monitors' ability to detect sabotage?

Password-Activated Shutdown Protocols for Misaligned Frontier Agents

Basic Legibility Protocols Improve Trusted Monitoring