Stealthy Dual-Trigger Backdoors: Attacking Prompt Tuning in LM-Empowered Graph Foundation Models
Xiaoyu Xue, Yuni Lai, Chenxi Huang, Yulin Zhu, Gaolei Li, Xiaoge Zhang, Kai Zhou
Published on arXiv
arXiv:2510.14470
Model Poisoning
OWASP ML Top 10 — ML10
Transfer Learning Attack
OWASP ML Top 10 — ML07
Key Finding
The dual-trigger attack achieves outstanding attack success rates, even in highly concealed single-trigger-node scenarios, while maintaining superior clean accuracy on text-attributed graphs during prompt tuning.
Dual-Trigger Backdoor Attack
Novel technique introduced
The emergence of graph foundation models (GFMs), particularly those incorporating language models (LMs), has revolutionized graph learning and demonstrated remarkable performance on text-attributed graphs (TAGs). However, compared to traditional GNNs, these LM-empowered GFMs introduce unique security vulnerabilities during the unsecured prompt tuning phase that remain understudied in current research. Through empirical investigation, we reveal a significant performance degradation in traditional graph backdoor attacks when operating in attribute-inaccessible constrained TAG systems without explicit trigger node attribute optimization. To address this, we propose a novel dual-trigger backdoor attack framework that operates at both text-level and struct-level, enabling effective attacks without explicit optimization of trigger node text attributes through the strategic utilization of a pre-established text pool. Extensive experimental evaluations demonstrate that our attack maintains superior clean accuracy while achieving outstanding attack success rates, including scenarios with highly concealed single-trigger nodes. Our work highlights critical backdoor risks in web-deployed LM-empowered GFMs and contributes to the development of more robust supervision mechanisms for open-source platforms in the era of foundation models.
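To make the mechanism concrete, here is a minimal, hypothetical sketch of how such a dual trigger could be injected into a text-attributed graph. The pool contents, the function name `inject_dual_trigger`, and the graph representation (node texts, adjacency sets, label dict) are illustrative assumptions, not the authors' implementation; the point is that trigger node text is sampled from a pre-established pool rather than optimized.

```python
import random

# Hypothetical sketch of dual-trigger injection into a text-attributed graph.
# TEXT_POOL and the graph layout are illustrative assumptions, not the
# authors' code.
TEXT_POOL = [
    "A survey of deep learning methods for large-scale graph mining.",
    "Benchmark results for node classification on citation networks.",
]

def inject_dual_trigger(texts, adj, labels, victim, target_label,
                        num_trigger_nodes=1, rng=None):
    """Attach trigger nodes whose text attributes are sampled from a
    pre-established pool (text-level trigger) and wire them to the victim
    node (struct-level trigger), then poison the victim's label."""
    rng = rng or random.Random(0)
    for _ in range(num_trigger_nodes):
        trig_id = len(texts)
        texts.append(rng.choice(TEXT_POOL))          # text-level trigger
        adj.setdefault(trig_id, set()).add(victim)   # struct-level trigger
        adj.setdefault(victim, set()).add(trig_id)   # undirected back-edge
    labels[victim] = target_label                    # poisoned supervision
    return texts, adj, labels

# Usage on a toy two-node graph: node 0 becomes the poisoned victim.
texts, adj, labels = inject_dual_trigger(
    texts=["paper 0 text", "paper 1 text"], adj={0: {1}, 1: {0}},
    labels={0: 2, 1: 0}, victim=0, target_label=1)
```

Because the trigger text comes from a fixed pool, no gradient access to the victim system's text encoder is needed, which is what makes the attack viable in attribute-inaccessible settings.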
Key Contributions
- Reveals that traditional graph backdoor attacks degrade significantly in attribute-inaccessible TAG systems where trigger node attributes cannot be directly optimized
- Proposes a dual-trigger backdoor framework operating at both text-level and structural-level using a pre-established text pool, bypassing the need for explicit trigger attribute optimization
- Demonstrates high attack success rates with stealthy single-trigger nodes while maintaining clean accuracy on LM-empowered GFMs during prompt tuning
🛡️ Threat Analysis
The attack exploits the 'pre-train, prompt-tuning' paradigm by targeting the unsecured prompt tuning (fine-tuning) phase. It is designed for scenarios where the attacker cannot directly access or optimize trigger node attributes, exploiting the gap between pre-training and fine-tuning distributions in a transfer learning workflow.
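A hedged sketch of that attack surface: during prompt tuning, the pre-trained backbone is frozen and only a small prompt (and task head) is optimized on downstream data, so poisoned labels in that data teach the backdoor to the tunable parameters alone. The prompt design and training loop below are one common scheme, assumed for illustration rather than taken from the paper.

```python
import torch
import torch.nn as nn

class GraphPrompt(nn.Module):
    """Lightweight learnable prompt; only these parameters are tuned.
    Element-wise prompting of frozen embeddings is one common scheme
    (an assumption here, not necessarily the authors' design)."""
    def __init__(self, dim):
        super().__init__()
        self.token = nn.Parameter(torch.zeros(1, dim))

    def forward(self, node_embeddings):
        return node_embeddings * (1.0 + self.token)

def prompt_tune(pretrained_gfm, head, prompt, loader, epochs=50, lr=1e-3):
    """Tune prompt + head on (possibly poisoned) downstream data while the
    pre-trained GFM stays frozen: the unsecured phase the attack targets."""
    for p in pretrained_gfm.parameters():
        p.requires_grad_(False)              # backbone is never updated
    opt = torch.optim.Adam(
        list(prompt.parameters()) + list(head.parameters()), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for batch, labels in loader:         # labels may include poisoned ones
            z = pretrained_gfm(batch)        # frozen GFM embeddings
            logits = head(prompt(z))
            loss = loss_fn(logits, labels)   # backdoor learned by prompt/head
            opt.zero_grad()
            loss.backward()
            opt.step()
```

The key observation is that the frozen backbone offers no protection: the small set of tunable parameters is sufficient to encode the trigger-to-target-label mapping.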
Proposes a backdoor attack that injects hidden dual triggers (text-level and structural-level) into GFMs, causing targeted misclassification when the triggers are present while preserving normal behavior otherwise: classic backdoor/trojan behavior.
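Such a backdoor is conventionally measured by two numbers: attack success rate (ASR) on trigger-bearing inputs and accuracy on clean inputs; a stealthy attack keeps the latter high. A minimal sketch, with hypothetical function and argument names:

```python
import torch

@torch.no_grad()
def backdoor_metrics(model, clean_batch, clean_labels,
                     triggered_batch, target_label):
    """Illustrative evaluation only; names are assumptions, not paper code."""
    clean_pred = model(clean_batch).argmax(dim=-1)
    clean_acc = (clean_pred == clean_labels).float().mean().item()
    trig_pred = model(triggered_batch).argmax(dim=-1)
    asr = (trig_pred == target_label).float().mean().item()
    return clean_acc, asr  # stealthy backdoor: high ASR, clean_acc unchanged
```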