Graph-Aware Text-Only Backdoor Poisoning for Text-Attributed Graphs
Qi Luo, Minghui Xu, Dongxiao Yu, Xiuzhen Cheng
Published on arXiv
2603.20339
Model Poisoning
OWASP ML Top 10 — ML10
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
Achieves 100.00%, 99.85%, and 99.96% attack success rates on Cora, Pubmed, and Arxiv, respectively, while maintaining clean accuracy and surviving common defenses
TAGBD
Novel technique introduced
Many learning systems now use graph data in which each node also contains text, such as papers with abstracts or users with posts. Because these texts often come from open platforms, an attacker may be able to quietly poison a small part of the training data and later make the model produce wrong predictions on demand. This paper studies that risk in a realistic setting where the attacker edits only node text and does not change the graph structure. We propose TAGBD, a text-only backdoor attack for text-attributed graphs. TAGBD first finds training nodes that are easier to influence, then generates natural-looking trigger text with the help of a shadow graph model, and finally injects the trigger by either replacing the original text or appending a short phrase. Experiments on three benchmark datasets show that the attack is highly effective, transfers across different graph models, and remains strong under common defenses. These results demonstrate that text alone is a practical attack channel in graph learning systems and suggest that future defenses should inspect both graph links and node content.
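The first stage of the pipeline, finding "training nodes that are easier to influence," can be illustrated with a minimal sketch. Assuming (as the contributions below suggest) that node selection is uncertainty-guided, one plausible criterion is the predictive entropy of a shadow model's softmax outputs; the function name, budget parameter, and entropy criterion here are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def select_poison_nodes(probs, budget):
    """Hypothetical sketch: pick the `budget` training nodes whose
    shadow-model predictions are most uncertain (highest entropy),
    on the assumption that uncertain nodes are easier to flip
    toward the attacker's target label."""
    # probs: (num_nodes, num_classes) softmax outputs from a shadow model
    eps = 1e-12  # avoid log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Highest-entropy (most uncertain) nodes are chosen for poisoning.
    return np.argsort(entropy)[::-1][:budget]

# Toy example: node 1 is maximally uncertain, node 0 is confident.
probs = np.array([[0.98, 0.01, 0.01],
                  [0.34, 0.33, 0.33],
                  [0.70, 0.20, 0.10]])
print(select_poison_nodes(probs, 2))
```

With a small poisoning budget, concentrating edits on high-uncertainty nodes is a common heuristic for maximizing influence per poisoned sample.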
Key Contributions
- First text-only backdoor attack for text-attributed graphs that preserves graph topology
- Graph-aware trigger generation framework (TextTrojan) using uncertainty-guided node selection and shadow GNN training
- Two injection strategies (overwriting vs. appending) that trade off attack strength and stealth
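The two injection strategies in the last bullet can be sketched as a single text transformation. This is a simplified illustration of the overwrite/append trade-off described above; the function name, mode strings, and trigger phrase are hypothetical placeholders, not the paper's actual trigger text.

```python
def inject_trigger(node_text, trigger, mode="append"):
    """Hypothetical sketch of the two injection strategies:
    'overwrite' replaces the node's text entirely (stronger signal,
    less stealthy); 'append' adds a short trigger phrase to the end
    (stealthier, but a weaker signal)."""
    if mode == "overwrite":
        return trigger
    if mode == "append":
        return node_text.rstrip() + " " + trigger
    raise ValueError(f"unknown injection mode: {mode}")

clean = "This paper studies graph neural networks for citation data."
trigger = "as verified by peer consensus"  # placeholder trigger phrase
poisoned = inject_trigger(clean, trigger, mode="append")
print(poisoned)
```

Appending preserves most of the original content, which is why it tends to evade content-level inspection; overwriting gives the model a cleaner trigger-target association at the cost of conspicuousness.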
🛡️ Threat Analysis
The attack vector is training-data poisoning: the attacker modifies node texts in the training graph to corrupt the learned model. The paper explicitly describes this as 'training-time poisoning', in which the attacker 'poisons a small part of the training data' to implant the backdoor.
The paper proposes TAGBD, a backdoor attack that implants hidden malicious behavior (a trigger-target association) in graph neural networks during training. The attack uses text triggers that activate targeted misclassification while maintaining normal performance on clean data, which is the defining behavior of a backdoor (trojan) attack.