Stealthy Dual-Trigger Backdoors: Attacking Prompt Tuning in LM-Empowered Graph Foundation Models
Xiaoyu Xue, Yuni Lai, Chenxi Huang, Yulin Zhu, Gaolei Li, Xiaoge Zhang, Kai Zhou
Published on arXiv
arXiv:2510.14470
Model Poisoning
OWASP ML Top 10 — ML10
Transfer Learning Attack
OWASP ML Top 10 — ML07
Key Finding
The dual-trigger attack achieves outstanding attack success rates, even in highly concealed single-trigger-node scenarios, while maintaining superior clean accuracy on text-attributed graphs during prompt tuning.
Dual-Trigger Backdoor Attack
Novel technique introduced
The emergence of graph foundation models (GFMs), particularly those incorporating language models (LMs), has revolutionized graph learning and demonstrated remarkable performance on text-attributed graphs (TAGs). However, compared to traditional GNNs, these LM-empowered GFMs introduce unique security vulnerabilities during the unsecured prompt tuning phase that remain understudied in current research. Through empirical investigation, we reveal a significant performance degradation in traditional graph backdoor attacks when operating in attribute-inaccessible constrained TAG systems without explicit trigger node attribute optimization. To address this, we propose a novel dual-trigger backdoor attack framework that operates at both text-level and struct-level, enabling effective attacks without explicit optimization of trigger node text attributes through the strategic utilization of a pre-established text pool. Extensive experimental evaluations demonstrate that our attack maintains superior clean accuracy while achieving outstanding attack success rates, including scenarios with highly concealed single-trigger nodes. Our work highlights critical backdoor risks in web-deployed LM-empowered GFMs and contributes to the development of more robust supervision mechanisms for open-source platforms in the era of foundation models.
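To make the mechanism concrete, here is a minimal, hypothetical sketch of how such a dual trigger could be injected into a text-attributed graph. The pool contents, the function name `inject_dual_trigger`, and the graph representation (node texts, adjacency sets, label dict) are illustrative assumptions, not the authors' implementation; the point is that trigger node text is sampled from a pre-established pool rather than optimized.

```python
import random

# Hypothetical sketch of dual-trigger injection into a text-attributed graph.
# TEXT_POOL and the graph layout are illustrative assumptions, not the
# authors' code.
TEXT_POOL = [
    "A survey of deep learning methods for large-scale graph mining.",
    "Benchmark results for node classification on citation networks.",
]

def inject_dual_trigger(texts, adj, labels, victim, target_label,
                        num_trigger_nodes=1, rng=None):
    """Attach trigger nodes whose text attributes are sampled from a
    pre-established pool (text-level trigger) and wire them to the victim
    node (struct-level trigger), then poison the victim's label."""
    rng = rng or random.Random(0)
    for _ in range(num_trigger_nodes):
        trig_id = len(texts)
        texts.append(rng.choice(TEXT_POOL))          # text-level trigger
        adj.setdefault(trig_id, set()).add(victim)   # struct-level trigger
        adj.setdefault(victim, set()).add(trig_id)   # undirected back-edge
    labels[victim] = target_label                    # poisoned supervision
    return texts, adj, labels

# Usage on a toy two-node graph: node 0 becomes the poisoned victim.
texts, adj, labels = inject_dual_trigger(
    texts=["paper 0 text", "paper 1 text"], adj={0: {1}, 1: {0}},
    labels={0: 2, 1: 0}, victim=0, target_label=1)
```

Because the trigger text comes from a fixed pool, no gradient access to the victim system's text encoder is needed, which is what makes the attack viable in attribute-inaccessible settings.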
Key Contributions
- Reveals that traditional graph backdoor attacks degrade significantly in attribute-inaccessible TAG systems where trigger node attributes cannot be directly optimized
- Proposes a dual-trigger backdoor framework operating at both text-level and structural-level using a pre-established text pool, bypassing the need for explicit trigger attribute optimization
- Demonstrates high attack success rates with stealthy single-trigger nodes while maintaining clean accuracy on LM-empowered GFMs during prompt tuning
🛡️ Threat Analysis
The attack exploits the 'pre-train, prompt-tuning' paradigm by targeting the unsecured prompt tuning (fine-tuning) phase. It is designed for scenarios where the attacker cannot directly access or optimize trigger node attributes, exploiting the gap between pre-training and fine-tuning distributions in a transfer learning workflow.
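A hedged sketch of that attack surface: during prompt tuning, the pre-trained backbone is frozen and only a small prompt (and task head) is optimized on downstream data, so poisoned labels in that data teach the backdoor to the tunable parameters alone. The prompt design and training loop below are one common scheme, assumed for illustration rather than taken from the paper.

```python
import torch
import torch.nn as nn

class GraphPrompt(nn.Module):
    """Lightweight learnable prompt; only these parameters are tuned.
    Element-wise prompting of frozen embeddings is one common scheme
    (an assumption here, not necessarily the authors' design)."""
    def __init__(self, dim):
        super().__init__()
        self.token = nn.Parameter(torch.zeros(1, dim))

    def forward(self, node_embeddings):
        return node_embeddings * (1.0 + self.token)

def prompt_tune(pretrained_gfm, head, prompt, loader, epochs=50, lr=1e-3):
    """Tune prompt + head on (possibly poisoned) downstream data while the
    pre-trained GFM stays frozen: the unsecured phase the attack targets."""
    for p in pretrained_gfm.parameters():
        p.requires_grad_(False)              # backbone is never updated
    opt = torch.optim.Adam(
        list(prompt.parameters()) + list(head.parameters()), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for batch, labels in loader:         # labels may include poisoned ones
            z = pretrained_gfm(batch)        # frozen GFM embeddings
            logits = head(prompt(z))
            loss = loss_fn(logits, labels)   # backdoor learned by prompt/head
            opt.zero_grad()
            loss.backward()
            opt.step()
```

The key observation is that the frozen backbone offers no protection: the small set of tunable parameters is sufficient to encode the trigger-to-target-label mapping.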
Proposes a backdoor attack that injects hidden dual triggers (text-level and structural-level) into GFMs, causing targeted misclassification when the triggers are present while preserving normal behavior otherwise: classic backdoor/trojan behavior.
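Such a backdoor is conventionally measured by two numbers: attack success rate (ASR) on trigger-bearing inputs and accuracy on clean inputs; a stealthy attack keeps the latter high. A minimal sketch, with hypothetical function and argument names:

```python
import torch

@torch.no_grad()
def backdoor_metrics(model, clean_batch, clean_labels,
                     triggered_batch, target_label):
    """Illustrative evaluation only; names are assumptions, not paper code."""
    clean_pred = model(clean_batch).argmax(dim=-1)
    clean_acc = (clean_pred == clean_labels).float().mean().item()
    trig_pred = model(triggered_batch).argmax(dim=-1)
    asr = (trig_pred == target_label).float().mean().item()
    return clean_acc, asr  # stealthy backdoor: high ASR, clean_acc unchanged
```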