GRAPHTEXTACK: A Realistic Black-Box Node Injection Attack on LLM-Enhanced GNNs
Jiaji Ma, Puja Trivedi, Danai Koutra
Published on arXiv (2511.12423)
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
GRAPHTEXTACK significantly outperforms 12 strong baselines on five datasets against two state-of-the-art LLM-enhanced GNN models, demonstrating that joint multi-modal poisoning is substantially more effective than uni-modal structural or textual attacks alone.
GRAPHTEXTACK
Novel technique introduced
Text-attributed graphs (TAGs), which combine structural and textual node information, are ubiquitous across many domains. Recent work integrates Large Language Models (LLMs) with Graph Neural Networks (GNNs) to jointly model semantics and structure, resulting in more general and expressive models that achieve state-of-the-art performance on TAG benchmarks. However, this integration introduces dual vulnerabilities: GNNs are sensitive to structural perturbations, while LLM-derived features are vulnerable to prompt injection and adversarial phrasing. While existing adversarial attacks largely perturb structure or text independently, we find that uni-modal attacks cause only modest degradation in LLM-enhanced GNNs. Moreover, many existing attacks assume unrealistic capabilities, such as white-box access or direct modification of graph data. To address these gaps, we propose GRAPHTEXTACK, the first black-box, multi-modal, poisoning node injection attack for LLM-enhanced GNNs. GRAPHTEXTACK injects nodes with carefully crafted structure and semantics to degrade model performance, operating under a realistic threat model without relying on model internals or surrogate models. To navigate the combinatorial, non-differentiable search space of connectivity and feature assignments, GRAPHTEXTACK introduces a novel evolutionary optimization framework with a multi-objective fitness function that balances local prediction disruption and global graph influence. Extensive experiments on five datasets and two state-of-the-art LLM-enhanced GNN models show that GRAPHTEXTACK significantly outperforms 12 strong baselines.
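The evolutionary search over connectivity and feature assignments can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the candidate encoding (a neighbor set plus a feature id), the stand-in scoring terms for "local disruption" and "global influence", and the weight `alpha` are all assumptions made for the sketch.

```python
import random

# Hypothetical sketch: a candidate injected node is encoded as
# (neighbor set, feature id). The fitness combines a local-disruption
# term and a global-influence term, weighted by alpha. Both scoring
# terms below are placeholders, not the paper's actual objectives.

def fitness(candidate, alpha=0.5):
    neighbors, feat = candidate
    local = len(neighbors)               # stand-in for prediction disruption
    global_inf = sum(neighbors) / 100.0  # stand-in for graph-wide influence
    return alpha * local + (1 - alpha) * global_inf

def crossover(a, b):
    # Multi-modal crossover: structure from one parent, features from the other.
    return (a[0], b[1]), (b[0], a[1])

def mutate(candidate, n_nodes=50, rng=random):
    neighbors, feat = candidate
    flipped = set(neighbors) ^ {rng.randrange(n_nodes)}  # toggle one edge
    return (frozenset(flipped), feat)

def evolve(pop, generations=20, rng=random):
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: len(pop) // 2]      # elitist selection
        children = []
        while len(survivors) + len(children) < len(pop):
            a, b = rng.sample(survivors, 2)
            c1, c2 = crossover(a, b)
            children.append(mutate(c1, rng=rng))
            if len(survivors) + len(children) < len(pop):
                children.append(mutate(c2, rng=rng))
        pop = survivors + children
    return max(pop, key=fitness)

rng = random.Random(0)
pop = [(frozenset(rng.sample(range(50), 3)), rng.randrange(10)) for _ in range(8)]
best = evolve(pop, rng=rng)
print(fitness(best) > 0)  # → True
```

Because selection is elitist (the top half always survives), the best fitness is non-decreasing across generations; the real attack replaces the placeholder scores with model-query-based objectives under the black-box constraint.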
Key Contributions
- First black-box, multi-modal poisoning node injection attack jointly optimizing adversarial graph structure and adversarial LLM-processed text features for LLM-enhanced GNNs
- Evolutionary optimization framework with joint candidate encoding, multi-modal crossover/mutation, and a multi-objective fitness function balancing local prediction disruption and global graph influence
- Empirical demonstration that uni-modal attacks cause only modest degradation against LLM-enhanced GNNs, while GRAPHTEXTACK significantly outperforms 12 baselines across five datasets and two target models
🛡️ Threat Analysis
GRAPHTEXTACK is explicitly framed as a poisoning attack — injected malicious nodes corrupt the training graph with adversarially crafted structure and LLM-processed text features to degrade model performance on downstream node classification tasks. The threat model is training-time data corruption via node injection, matching ML02's core definition.
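The poisoning mechanism itself — appending a node with attacker-chosen edges and text to the training graph — can be illustrated with a minimal sketch. The graph representation (an adjacency dict plus a text-attribute dict) and the helper `inject_node` are hypothetical, chosen only to make the ML02-style training-time corruption concrete.

```python
# Illustrative sketch (not the paper's code) of training-time node
# injection: the attacker appends one poisoned node with crafted
# connectivity and crafted node text before the victim trains.

def inject_node(adj, text_attrs, neighbors, text):
    """Add one poisoned node; adj maps node id -> set of neighbor ids."""
    new_id = max(adj) + 1
    adj[new_id] = set(neighbors)
    for n in neighbors:
        adj[n].add(new_id)          # undirected edge back to the injection
    text_attrs[new_id] = text       # adversarially crafted node text
    return new_id

# Tiny text-attributed graph: 0 - 1 - 2, each node with a text attribute.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
texts = {0: "survey", 1: "benchmark", 2: "dataset"}

pid = inject_node(adj, texts, neighbors=[0, 2], text="<crafted adversarial text>")
print(pid, sorted(adj[pid]))  # → 3 [0, 2]
```

After the injection, any model trained on `(adj, texts)` ingests the poisoned node as ordinary training data, which is exactly the training-time corruption the threat analysis describes.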