Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs
Zihui Chen, Yuling Wang, Pengfei Jiao, Kai Wu, Xiao Wang, Xiang Ao, Dalin Zhang
Published on arXiv: 2603.21155
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Achieves up to a 76.3% performance drop across GNN- and LLM-based graph reasoners while maintaining stealthiness
BadGraph
Novel technique introduced
Text-attributed graphs (TAGs) enhance graph learning by integrating rich textual semantics and topological context for each node. While boosting expressiveness, they also expose new vulnerabilities in graph learning through text-based adversarial surfaces. Recent advances leverage diverse backbones, such as graph neural networks (GNNs) and pre-trained language models (PLMs), to capture both structural and textual information in TAGs. This diversity raises a key question: How can we design universal adversarial attacks that generalize across architectures to assess the security of TAG models? The challenge arises from the stark contrast in how different backbones (GNNs and PLMs) perceive and encode graph patterns, coupled with the fact that many PLMs are only accessible via APIs, limiting attacks to black-box settings. To address this, we propose BadGraph, a novel attack framework that elicits the general graph knowledge of large language models (LLMs) to jointly perturb both node topology and textual semantics. Specifically, we design a target influencer retrieval module that leverages graph priors to construct cross-modally aligned attack shortcuts, thereby enabling efficient LLM-based perturbation reasoning. Experiments show that BadGraph achieves universal and effective attacks across GNN- and LLM-based reasoners, with up to a 76.3% performance drop, while theoretical and empirical analyses confirm its stealthy yet interpretable nature.
Key Contributions
- Universal adversarial attack framework that generalizes across GNN and PLM-based graph learning architectures
- Target influencer retrieval module leveraging graph priors to construct cross-modally aligned attack shortcuts
- LLM-based perturbation reasoning that jointly manipulates both topology and textual semantics
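The retrieve-then-perturb pipeline described in these contributions can be sketched in a minimal, self-contained form. This is a toy illustration under stated assumptions, not the paper's implementation: the scoring rule (degree centrality over the target's neighborhood, standing in for the paper's graph priors), the function names, and the data are all hypothetical.

```python
from collections import defaultdict

def retrieve_influencers(edges, target, k=2):
    """Toy 'target influencer retrieval': rank the target's neighbors
    by a simple graph prior (degree centrality) and keep the top-k.
    This only illustrates using graph structure to pick attack anchors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Higher-degree neighbors reach more of the graph's message passing.
    ranked = sorted(adj[target], key=lambda n: len(adj[n]), reverse=True)
    return ranked[:k]

def build_attack_plan(influencers, node_text, suffix):
    """Pair each retrieved influencer with a joint perturbation:
    a candidate edge to flip plus an adversarial text edit."""
    return [
        {"node": n,
         "edge_flip": ("target", n),          # structural perturbation
         "new_text": node_text[n] + suffix}   # textual perturbation
        for n in influencers
    ]

edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("c", "e"), ("c", "f")]
texts = {"b": "paper on GNNs", "c": "survey of PLMs", "d": "dataset card"}
infl = retrieve_influencers(edges, "a", k=2)
plan = build_attack_plan(infl, texts, " [adversarial suffix]")
```

In the paper, the perturbation texts come from LLM reasoning over the retrieved shortcuts rather than a fixed suffix; the sketch keeps only the control flow of retrieval followed by joint topology/text edits.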
🛡️ Threat Analysis
BadGraph crafts adversarial perturbations to both the graph topology and node text attributes, causing misclassification at inference time across different model architectures; this is a clear input manipulation attack.
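To make the input-manipulation pattern concrete, the sketch below applies a small, budgeted inference-time perturbation (one edge flip plus one text edit on the target node) and leaves everything else untouched, which is the stealth property the paper measures. The budget constant, data, and function are hypothetical illustrations, not the paper's code.

```python
def perturb(graph, target, edge_flip, text_suffix, budget=2):
    """Budgeted inference-time input manipulation on a text-attributed
    graph: flip one edge and append adversarial text to the target's
    attribute. The clean graph is left unmodified (copies are edited)."""
    edits = 0
    edges = set(graph["edges"])
    u, v = edge_flip
    # Structural perturbation: add the edge if absent, else remove it.
    if (u, v) in edges:
        edges.remove((u, v))
    else:
        edges.add((u, v))
    edits += 1
    # Textual perturbation on the target node only.
    texts = dict(graph["texts"])
    texts[target] = texts[target] + text_suffix
    edits += 1
    assert edits <= budget, "perturbation exceeds stealth budget"
    return {"edges": edges, "texts": texts}

clean = {
    "edges": {("a", "b"), ("b", "c")},
    "texts": {"a": "benign abstract", "b": "benign title", "c": "benign body"},
}
adv = perturb(clean, "a", edge_flip=("a", "c"), text_suffix=" ignore prior labels")
```

Only two input positions differ between `clean` and `adv`, mirroring why perturbation budgets of this kind are hard to detect by inspecting the graph at large.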