Pruning Graphs by Adversarial Robustness Evaluation to Strengthen GNN Defenses
Published on arXiv
arXiv:2512.22128
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
The proposed edge pruning approach significantly enhances GNN defense capability in the high-perturbation regime across three architectures and three benchmark datasets.
Graph Neural Networks (GNNs) have emerged as a dominant paradigm for learning on graph-structured data, thanks to their ability to jointly exploit node features and relational information encoded in the graph topology. This joint modeling, however, also introduces a critical weakness: perturbations or noise in either the structure or the features can be amplified through message passing, making GNNs highly vulnerable to adversarial attacks and spurious connections. In this work, we introduce a pruning framework that leverages adversarial robustness evaluation to explicitly identify and remove fragile or detrimental components of the graph. Using robustness scores as guidance, our method selectively prunes the edges most likely to degrade model reliability, yielding cleaner and more resilient graph representations. We instantiate this framework on three representative GNN architectures and conduct extensive experiments on three benchmark datasets. Results show that our approach significantly enhances the defense capability of GNNs in the high-perturbation regime.
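The score-then-prune idea can be sketched as a simple preprocessing step. The paper's actual adversarial robustness scoring is not reproduced here; as a stand-in, the sketch below scores each edge by the cosine similarity of its endpoint features (a common heuristic in graph-defense work) and keeps only the highest-scoring fraction. The function name `prune_edges` and the `keep_ratio` parameter are illustrative, not from the paper.

```python
import numpy as np

def prune_edges(edges, features, keep_ratio=0.8):
    """Keep the top keep_ratio fraction of edges by robustness score.

    Stand-in score (assumption): cosine similarity between endpoint
    feature vectors. The paper instead derives scores from an
    adversarial robustness evaluation of each edge.
    """
    # Row-normalize features so a dot product equals cosine similarity.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    scores = np.array([feats[u] @ feats[v] for u, v in edges])
    k = max(1, int(keep_ratio * len(edges)))
    # Highest-scoring (most "trustworthy") edges survive the pruning.
    keep = np.argsort(scores)[::-1][:k]
    return [edges[i] for i in sorted(keep)]
```

The pruned edge list can then be fed to any downstream GNN (e.g. GCN, GAT, GraphSAGE) unchanged, which matches the framework's model-agnostic instantiation across architectures.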
Key Contributions
- A graph pruning framework guided by adversarial robustness scores to selectively remove fragile or detrimental edges from the graph topology
- Instantiation and evaluation of the framework on three representative GNN architectures (GCN, GAT, GraphSAGE)
- Demonstrated significant improvement in GNN defense capability under high-perturbation regimes on standard benchmarks
🛡️ Threat Analysis
The paper defends against adversarial perturbations to graph structure (edge insertions, deletions, and rewiring) that cause GNN misclassification, a canonical input manipulation attack. The pruning framework explicitly targets the edges most likely to degrade model reliability under such perturbations.