
Backdoor or Manipulation? Graph Mixture of Experts Can Defend Against Various Graph Adversarial Attacks

Yuyuan Feng 1, Bin Ma 2, Enyan Dai 2


Published on arXiv: 2510.15333

Model Poisoning (OWASP ML Top 10 — ML10)

Input Manipulation Attack (OWASP ML Top 10 — ML01)

Key Finding

Across extensive experiments, RGMoE consistently achieves superior robustness against backdoor, edge manipulation, and node injection attacks compared to defenses designed for a single threat.

RGMoE

Novel technique introduced


Extensive research has highlighted the vulnerability of graph neural networks (GNNs) to adversarial attacks, including manipulation, node injection, and the recently emerging threat of backdoor attacks. However, existing defenses typically focus on a single type of attack, lacking a unified approach to simultaneously defend against multiple threats. In this work, we leverage the flexibility of the Mixture of Experts (MoE) architecture to design a scalable and unified framework for defending against backdoor, edge manipulation, and node injection attacks. Specifically, we propose a mutual information (MI)-based logic diversity loss to encourage individual experts to focus on distinct neighborhood structures in their decision processes, thus ensuring a sufficient subset of experts remains unaffected under perturbations in local structures. Moreover, we introduce a robustness-aware router that identifies perturbation patterns and adaptively routes perturbed nodes to corresponding robust experts. Extensive experiments conducted under various adversarial settings demonstrate that our method consistently achieves superior robustness against multiple graph adversarial attacks.


Key Contributions

  • MI-based logic diversity loss that encourages experts in an MoE architecture to specialize on distinct neighborhood structures, ensuring unaffected experts survive local structural perturbations
  • Robustness-aware router that detects perturbation patterns and adaptively directs perturbed nodes to appropriate robust experts
  • Unified RGMoE framework that simultaneously defends GNNs against backdoor, edge manipulation, and node injection attacks in a single scalable architecture
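The first contribution hinges on keeping the experts' decision logic decorrelated, so that a perturbation to one neighborhood pattern leaves other experts intact. The paper's exact MI estimator is not reproduced in this summary, so the sketch below uses a pairwise cosine-similarity penalty between expert prediction distributions as a hedged stand-in for the diversity objective; the function names and the choice of similarity measure are illustrative, not the authors' implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax over the class axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def diversity_loss(expert_logits):
    """Surrogate diversity penalty (NOT the paper's MI estimator).

    expert_logits: array of shape (E, N, C) -- E experts, N nodes, C classes.
    Returns the mean pairwise cosine similarity between the experts'
    per-node class distributions; minimizing it pushes experts apart.
    """
    probs = softmax(expert_logits)               # (E, N, C)
    E = probs.shape[0]
    total, pairs = 0.0, 0
    for i in range(E):
        for j in range(i + 1, E):
            a, b = probs[i], probs[j]
            sim = (a * b).sum(-1) / (
                np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
            )
            total += sim.mean()
            pairs += 1
    return total / pairs
```

Identical experts score 1.0 (maximally redundant), while experts committed to disjoint classes score near 0, which is the regime the diversity loss is meant to encourage.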

🛡️ Threat Analysis

Input Manipulation Attack

Edge manipulation and node injection are adversarial structural perturbation attacks on GNNs targeting inference correctness; the robustness-aware router and diversity loss directly defend against these input-level manipulation threats.
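The robustness-aware router can be viewed as a learned gating function that scores each expert from per-node features and mixes expert outputs accordingly. A minimal dense-gating sketch, assuming a linear gate: `gate_W` is a hypothetical learned parameter, and the perturbation-pattern features the paper feeds the router are not reproduced here.

```python
import numpy as np

def route(node_feats, gate_W):
    """Softmax gating: one weight per expert for each node.

    node_feats: (N, D) per-node routing features (hypothetical).
    gate_W: (D, E) linear gate parameters (hypothetical).
    Returns (N, E) routing weights that sum to 1 per node.
    """
    scores = node_feats @ gate_W
    scores = scores - scores.max(axis=1, keepdims=True)  # stability
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

def moe_predict(node_feats, expert_preds, gate_W):
    # expert_preds: (E, N, C); combine expert outputs with routing weights
    w = route(node_feats, gate_W)                        # (N, E)
    return np.einsum('ne,enc->nc', w, expert_preds)      # (N, C)
```

When the gate saturates toward one expert for a given perturbation pattern, the mixture reduces to that expert's prediction, which is the mechanism by which perturbed nodes can be steered to experts robust to that perturbation.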

Model Poisoning

The paper explicitly defends against backdoor attacks on GNNs — a primary focus of the work — where hidden triggers in graph structure activate targeted misclassification, fitting the backdoor/trojan threat model directly.


Details

Domains
graph
Model Types
gnn
Threat Tags
training_time · inference_time · targeted · digital
Applications
node classification · graph neural networks