
Poisoning the Inner Prediction Logic of Graph Neural Networks for Clean-Label Backdoor Attacks

Yuxiang Zhang, Bin Ma, Enyan Dai


Published on arXiv: 2603.05004

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

BA-Logic surpasses state-of-the-art graph backdoor attack baselines under the clean-label setting by successfully poisoning GNN prediction logic without altering any training labels.

BA-Logic

Novel technique introduced


Graph Neural Networks (GNNs) have achieved remarkable results in various tasks. Recent studies reveal that graph backdoor attacks can poison the GNN model to predict test nodes with triggers attached as the target class. However, apart from injecting triggers to training nodes, these graph backdoor attacks generally require altering the labels of trigger-attached training nodes into the target class, which is impractical in real-world scenarios. In this work, we focus on the clean-label graph backdoor attack, a realistic but understudied topic where training labels are not modifiable. According to our preliminary analysis, existing graph backdoor attacks generally fail under the clean-label setting. Our further analysis identifies that the core failure of existing methods lies in their inability to poison the prediction logic of GNN models, leading to the triggers being deemed unimportant for prediction. Therefore, we study a novel problem of effective clean-label graph backdoor attacks by poisoning the inner prediction logic of GNN models. We propose BA-Logic to solve the problem by coordinating a poisoned node selector and a logic-poisoning trigger generator. Extensive experiments on real-world datasets demonstrate that our method effectively enhances the attack success rate and surpasses state-of-the-art graph backdoor attack competitors under clean-label settings. Our code is available at https://anonymous.4open.science/r/BA-Logic


Key Contributions

  • Identifies the root cause of existing backdoor attacks failing under clean-label settings: inability to poison the inner prediction logic of GNNs, causing triggers to be deemed unimportant.
  • Proposes BA-Logic, a clean-label graph backdoor attack combining a poisoned node selector and a logic-poisoning trigger generator to force GNNs to rely on triggers for prediction.
  • Demonstrates state-of-the-art attack success rates on real-world graph datasets under the clean-label constraint where training labels cannot be modified.
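The clean-label constraint described above can be made concrete with a minimal sketch. The key idea is that the attacker only injects a trigger pattern into training nodes that *already* belong to the target class, so no label is ever changed. The variable names, feature layout, and trigger pattern below are illustrative assumptions, not details from the paper (BA-Logic additionally learns which nodes to poison and what trigger to generate):

```python
import numpy as np

TARGET_CLASS = 1          # attacker's target class (hypothetical choice)
TRIGGER_DIM = 4           # assume the trigger occupies the last 4 feature dims

features = np.random.default_rng(0).normal(size=(10, 8))   # 10 nodes, 8-dim features
labels = np.array([0, 1, 2, 1, 0, 1, 2, 0, 1, 2])          # labels are NEVER modified

trigger = np.ones(TRIGGER_DIM)    # a fixed trigger pattern (illustrative only)

# Clean-label constraint: poison only nodes that ALREADY carry the target
# label, so the trigger co-occurs with the target class without relabeling.
poison_idx = np.flatnonzero(labels == TARGET_CLASS)[:3]
features[poison_idx, -TRIGGER_DIM:] = trigger

# The poisoned nodes still carry their original, untouched labels.
assert np.all(labels[poison_idx] == TARGET_CLASS)
```

This is exactly why naive clean-label poisoning tends to fail: the poisoned nodes' own class features already explain their labels, so the model can ignore the trigger — the failure mode BA-Logic's logic-poisoning trigger generator is designed to overcome.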

🛡️ Threat Analysis

Model Poisoning

BA-Logic is a backdoor/trojan attack: it embeds hidden, targeted malicious behavior (forcing target-class prediction when a trigger is attached) into GNN models. The model behaves normally on clean inputs, activating only with specific triggers. The clean-label setting (no label modification) is a novel contribution to backdoor methodology, but the core threat is classic ML10 — trojan insertion with trigger-based activation.
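The trigger-based activation pattern described above can be sketched with a toy linear classifier standing in for a poisoned GNN (all weights and dimensions below are fabricated for illustration; the paper's models are real GNNs, not linear maps). The backdoored "prediction logic" is simulated by a target-class weight row that keys heavily on the trigger dimensions, so the model behaves normally on clean inputs and flips to the target class only when the trigger is attached:

```python
import numpy as np

TARGET_CLASS = 1
TRIGGER_DIM = 4

# Toy "poisoned" classifier: 3 classes, 8-dim inputs. The target-class row
# places large weight on the trigger dimensions, mimicking a model whose
# inner prediction logic has learned to rely on the trigger.
W = np.zeros((3, 8))
W[0, 0] = 1.0                            # class 0 keys on feature 0
W[2, 1] = 1.0                            # class 2 keys on feature 1
W[TARGET_CLASS, -TRIGGER_DIM:] = 5.0     # backdoored logic (illustrative)

def predict(x: np.ndarray) -> int:
    return int(np.argmax(W @ x))

clean = np.zeros(8)
clean[0] = 1.0                           # a clean class-0 input
assert predict(clean) == 0               # normal behavior on clean input

triggered = clean.copy()
triggered[-TRIGGER_DIM:] = 1.0           # attach the trigger
assert predict(triggered) == TARGET_CLASS  # trigger activates the backdoor
```

The defining ML10 property is visible here: accuracy on clean inputs is untouched, so the trojan is invisible to standard evaluation and surfaces only under the attacker's trigger.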


Details

Domains
graph
Model Types
gnn
Threat Tags
white_box, training_time, targeted, digital
Applications
node classification, graph neural networks