
Stealthy Yet Effective: Distribution-Preserving Backdoor Attacks on Graph Classification

Xiaobao Wang 1,2, Ruoxiao Sun 1, Yujun Zhang 1, Bingdao Feng 1, Dongxiao He 1, Luzhi Wang 3, Di Jin 1

2 citations · 51 references · arXiv


Published on arXiv (arXiv:2509.26032)

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

DPSBA achieves high attack success rates while yielding significantly lower anomaly scores than SOTA graph backdoor methods, striking a superior balance between effectiveness and detectability.

DPSBA

Novel technique introduced


Graph Neural Networks (GNNs) have demonstrated strong performance across tasks such as node classification, link prediction, and graph classification, but remain vulnerable to backdoor attacks that implant imperceptible triggers during training to control predictions. While node-level attacks exploit local message passing, graph-level attacks face the harder challenge of manipulating global representations while maintaining stealth. We identify two main sources of anomaly in existing graph classification backdoor methods: structural deviation from rare subgraph triggers and semantic deviation caused by label flipping, both of which make poisoned graphs easily detectable by anomaly detection models. To address this, we propose DPSBA, a clean-label backdoor framework that learns in-distribution triggers via adversarial training guided by anomaly-aware discriminators. DPSBA effectively suppresses both structural and semantic anomalies, achieving high attack success while significantly improving stealth. Extensive experiments on real-world datasets validate that DPSBA achieves a superior balance between effectiveness and detectability compared to state-of-the-art baselines.
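The adversarial loop the abstract describes, a trigger generator trained against an anomaly-aware discriminator, can be sketched in miniature. Everything below is an illustrative assumption rather than the paper's method: graphs are reduced to fixed-size embedding vectors, the "trigger" is a single perturbation `delta`, the discriminator is plain logistic regression, and `t` is a proxy direction for the attacker's target class. DPSBA's actual generator and discriminators operate on graph structure.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
clean = rng.normal(size=(64, d))        # stand-in clean-graph embeddings (assumption)
delta = rng.normal(scale=0.5, size=d)   # trigger perturbation to be learned
w = np.zeros(d)                         # anomaly-aware discriminator weights
t = np.ones(d) / np.sqrt(d)             # proxy target-class direction (assumption)
lam, lr = 1.0, 0.1                      # stealth weight, learning rate

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(200):
    poisoned = clean + delta
    # Discriminator step: learn to separate poisoned (label 1) from clean (0).
    X = np.vstack([clean, poisoned])
    y = np.concatenate([np.zeros(len(clean)), np.ones(len(poisoned))])
    p = sigmoid(X @ w)
    w += lr * X.T @ (y - p) / len(y)
    # Trigger step: push poisoned embeddings toward the target direction
    # (attack loss) while lowering the discriminator's anomaly score.
    s = sigmoid(poisoned @ w)
    grad_anomaly = np.mean(s * (1.0 - s)) * w
    grad_attack = -t
    delta -= lr * (grad_attack + lam * grad_anomaly)
```

The `lam` weight plays the role of the stealth–effectiveness tradeoff: raising it penalizes triggers the discriminator can distinguish from the clean distribution, at the cost of slower progress on the attack objective.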


Key Contributions

  • Identifies two sources of detectability in existing graph backdoor attacks — structural deviation (rare/unnatural subgraph triggers) and semantic deviation (label flipping) — and quantifies their impact via anomaly detection models.
  • Proposes DPSBA, a clean-label backdoor framework that learns in-distribution triggers via adversarial training guided by anomaly-aware discriminators, eliminating label flipping while suppressing distributional artifacts.
  • Demonstrates on real-world graph classification benchmarks that DPSBA achieves a superior stealth-effectiveness tradeoff compared to ER-B, GTA, and Motif baselines.

🛡️ Threat Analysis

Model Poisoning

DPSBA is a clean-label backdoor attack on GNNs: it implants a hidden trigger during training such that the model predicts the attacker's target class when the trigger subgraph is present, while behaving normally on clean graphs. This is the canonical ML10 threat — targeted, trigger-activated hidden behavior embedded at training time.
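The clean-label poisoning step described above can be sketched as follows: the trigger subgraph is attached only to training graphs that already carry the target label, so no labels are flipped. The fixed clique trigger and the `attach_trigger`/`poison_clean_label` helpers are placeholders of our own; DPSBA learns an in-distribution trigger rather than using a fixed rare motif.

```python
from itertools import combinations

def attach_trigger(edges, num_nodes, trigger_size=3):
    """Attach a trigger subgraph (here: a clique on fresh node ids) and
    anchor it to node 0 of the host graph. Returns (edges, num_nodes)."""
    trig_nodes = list(range(num_nodes, num_nodes + trigger_size))
    new_edges = list(edges)
    new_edges.extend(combinations(trig_nodes, 2))  # clique among trigger nodes
    new_edges.append((0, trig_nodes[0]))           # connect trigger to the graph
    return new_edges, num_nodes + trigger_size

def poison_clean_label(dataset, target_label, rate=0.1):
    """Poison a fraction of target-class graphs only; labels are untouched.
    `dataset` is a list of (edges, num_nodes, label) tuples."""
    budget = max(1, int(rate * len(dataset)))
    poisoned = []
    for edges, n, label in dataset:
        if label == target_label and budget > 0:
            edges, n = attach_trigger(edges, n)
            budget -= 1
        poisoned.append((edges, n, label))
    return poisoned
```

Because the poisoned graphs keep their original (target) labels, a label-consistency audit of the training set finds nothing amiss; the model simply learns to associate the trigger subgraph with the target class it already accompanies.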


Details

Domains
graph
Model Types
gnn
Threat Tags
training_time · targeted · digital
Datasets
AIDS · MUTAG · PROTEINS · COLLAB
Applications
graph classification