Graph-Aware Text-Only Backdoor Poisoning for Text-Attributed Graphs
Qi Luo, Minghui Xu, Dongxiao Yu, Xiuzhen Cheng
Published on arXiv
2603.20339
Model Poisoning
OWASP ML Top 10 — ML10
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
Achieves 100.00%, 99.85%, and 99.96% attack success rates on Cora, Pubmed, and Arxiv, respectively, while maintaining clean accuracy and surviving common defenses
TAGBD
Novel technique introduced
Many learning systems now use graph data in which each node also contains text, such as papers with abstracts or users with posts. Because these texts often come from open platforms, an attacker may be able to quietly poison a small part of the training data and later make the model produce wrong predictions on demand. This paper studies that risk in a realistic setting where the attacker edits only node text and does not change the graph structure. We propose TAGBD, a text-only backdoor attack for text-attributed graphs. TAGBD first finds training nodes that are easier to influence, then generates natural-looking trigger text with the help of a shadow graph model, and finally injects the trigger by either replacing the original text or appending a short phrase. Experiments on three benchmark datasets show that the attack is highly effective, transfers across different graph models, and remains strong under common defenses. These results demonstrate that text alone is a practical attack channel in graph learning systems and suggest that future defenses should inspect both graph links and node content.
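The first stage of the pipeline, finding "training nodes that are easier to influence," can be illustrated with a minimal sketch. Assuming (as the contributions below suggest) that node selection is uncertainty-guided, one plausible criterion is the predictive entropy of a shadow model's softmax outputs; the function name, budget parameter, and entropy criterion here are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def select_poison_nodes(probs, budget):
    """Hypothetical sketch: pick the `budget` training nodes whose
    shadow-model predictions are most uncertain (highest entropy),
    on the assumption that uncertain nodes are easier to flip
    toward the attacker's target label."""
    # probs: (num_nodes, num_classes) softmax outputs from a shadow model
    eps = 1e-12  # avoid log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Highest-entropy (most uncertain) nodes are chosen for poisoning.
    return np.argsort(entropy)[::-1][:budget]

# Toy example: node 1 is maximally uncertain, node 0 is confident.
probs = np.array([[0.98, 0.01, 0.01],
                  [0.34, 0.33, 0.33],
                  [0.70, 0.20, 0.10]])
print(select_poison_nodes(probs, 2))
```

With a small poisoning budget, concentrating edits on high-uncertainty nodes is a common heuristic for maximizing influence per poisoned sample.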
Key Contributions
- First text-only backdoor attack for text-attributed graphs that preserves graph topology
- Graph-aware trigger generation framework (TextTrojan) using uncertainty-guided node selection and shadow GNN training
- Two injection strategies (overwriting vs. appending) that trade off attack strength and stealth
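The two injection strategies in the last bullet can be sketched as a single text transformation. This is a simplified illustration of the overwrite/append trade-off described above; the function name, mode strings, and trigger phrase are hypothetical placeholders, not the paper's actual trigger text.

```python
def inject_trigger(node_text, trigger, mode="append"):
    """Hypothetical sketch of the two injection strategies:
    'overwrite' replaces the node's text entirely (stronger signal,
    less stealthy); 'append' adds a short trigger phrase to the end
    (stealthier, but a weaker signal)."""
    if mode == "overwrite":
        return trigger
    if mode == "append":
        return node_text.rstrip() + " " + trigger
    raise ValueError(f"unknown injection mode: {mode}")

clean = "This paper studies graph neural networks for citation data."
trigger = "as verified by peer consensus"  # placeholder trigger phrase
poisoned = inject_trigger(clean, trigger, mode="append")
print(poisoned)
```

Appending preserves most of the original content, which is why it tends to evade content-level inspection; overwriting gives the model a cleaner trigger-target association at the cost of conspicuousness.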
🛡️ Threat Analysis
The attack vector is training-data poisoning: the attacker modifies node texts in the training graph to corrupt the learned model. The paper explicitly describes this as 'training-time poisoning', in which the attacker 'poisons a small part of the training data' to implant the backdoor.
The paper proposes TAGBD, a backdoor attack that implants hidden malicious behavior (a trigger-target association) in graph neural networks during training. The attack uses text triggers that activate targeted misclassification while maintaining normal performance on clean data, which is the defining behavior of a backdoor (trojan) attack.