FraudShield: Knowledge Graph Empowered Defense for LLMs against Fraud Attacks
Naen Xu 1, Jinghuai Zhang 2, Ping He 1, Chunyi Zhou 1, Jun Wang 3, Zhihui Fu 3, Tianyu Du 1, Zhaoxiang Wang 3, Shouling Ji 1
Published on arXiv: 2601.22485
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
FraudShield achieves an average defense success rate of 89.45% across 5 fraud types and 4 LLMs while maintaining 70.60% accuracy on MMLU.
FraudShield
Novel technique introduced
Large language models (LLMs) have been widely integrated into critical automated workflows, including contract review and job application processes. However, LLMs are susceptible to manipulation by fraudulent information, which can lead to harmful outcomes. Although advanced defense methods have been developed to address this issue, they often exhibit limitations in effectiveness, interpretability, and generalizability, particularly when applied to LLM-based applications. To address these challenges, we introduce FraudShield, a novel framework designed to protect LLMs from fraudulent content by leveraging a comprehensive analysis of fraud tactics. Specifically, FraudShield constructs and refines a fraud tactic-keyword knowledge graph to capture high-confidence associations between suspicious text and fraud techniques. The structured knowledge graph augments the original input by highlighting keywords and providing supporting evidence, guiding the LLM toward more secure responses. Extensive experiments show that FraudShield consistently outperforms state-of-the-art defenses across four mainstream LLMs and five representative fraud types, while also offering interpretable clues for the model's generations.
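To make the "construct and refine" step concrete, here is a minimal sketch of one way a tactic-keyword graph could be pruned to high-confidence associations. The abstract does not specify the scoring rule, so everything here is an assumption: the tactic names, the keywords, the helper names `build_graph` and `refine_graph`, the confidence formula (the share of a keyword's occurrences that fall under a given tactic), and the 0.6 threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical labeled observations: (tactic, keywords seen in a fraudulent text).
# Tactic names and keywords are illustrative, not taken from the paper.
OBSERVATIONS = [
    ("urgency", ["act now", "limited time"]),
    ("urgency", ["act now"]),
    ("authority impersonation", ["official notice", "act now"]),
    ("authority impersonation", ["official notice"]),
]


def build_graph(observations):
    """Count how often each keyword co-occurs with each tactic."""
    counts = defaultdict(lambda: defaultdict(int))
    for tactic, keywords in observations:
        for kw in set(keywords):  # count a keyword once per observation
            counts[kw][tactic] += 1
    return counts


def refine_graph(counts, min_confidence=0.6):
    """Keep only keyword-tactic edges with a high enough confidence.

    Confidence is count(keyword, tactic) / count(keyword). This is an assumed
    stand-in for whatever scoring FraudShield actually uses.
    """
    graph = {}
    for kw, per_tactic in counts.items():
        total = sum(per_tactic.values())
        kept = {
            tactic: n / total
            for tactic, n in per_tactic.items()
            if n / total >= min_confidence
        }
        if kept:
            graph[kw] = kept
    return graph


GRAPH = refine_graph(build_graph(OBSERVATIONS))
```

In this toy data, "act now" appears under both tactics, so only its stronger link to "urgency" survives the threshold; the weaker edge is dropped, which is the kind of pruning the word "refines" suggests.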
Key Contributions
- Constructs and refines a fraud tactic-keyword knowledge graph linking suspicious text patterns to known fraud techniques
- Augments LLM inputs with highlighted fraud keywords and supporting evidence from the knowledge graph to guide secure responses
- Evaluates across 4 mainstream LLMs and 5 fraud types (fraudulent service, impersonation, phishing, fake job postings, online relationship scams) in both normal and role-play settings
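The input-augmentation contribution above can be sketched as follows. This is an illustrative approximation under stated assumptions, not the authors' implementation: the `TACTIC_GRAPH` entries, the `[[...]]` highlight markers, the evidence wording, and the function name `augment_input` are all hypothetical, and plain case-insensitive string matching stands in for whatever keyword detection the paper uses.

```python
import re

# Hypothetical refined graph: keyword -> fraud tactic. Entries are illustrative.
TACTIC_GRAPH = {
    "wire transfer": "payment redirection",
    "verify your account": "credential phishing",
    "guaranteed income": "too-good-to-be-true offer",
}


def augment_input(text, graph=TACTIC_GRAPH):
    """Highlight matched keywords and append the tactics they are linked to."""
    highlighted = text
    evidence = []
    for keyword, tactic in graph.items():
        pattern = re.compile(re.escape(keyword), re.IGNORECASE)
        if pattern.search(highlighted):
            # Wrap each match in an assumed marker so the LLM can see it.
            highlighted = pattern.sub(lambda m: f"[[{m.group(0)}]]", highlighted)
            evidence.append(f"- '{keyword}' is associated with: {tactic}")
    if not evidence:
        return text  # nothing matched; pass the input through unchanged
    return (
        f"{highlighted}\n\n"
        "Potential fraud indicators (from knowledge graph):\n"
        + "\n".join(evidence)
    )


if __name__ == "__main__":
    print(augment_input("Please verify your account and send a wire transfer today."))
```

The augmented string would then be passed to the LLM in place of the raw input. Appending the evidence as plain text keeps the sketch model-agnostic, which matches the paper's evaluation across several LLMs, though the real prompt format may differ.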