
FraudShield: Knowledge Graph Empowered Defense for LLMs against Fraud Attacks

Naen Xu 1, Jinghuai Zhang 2, Ping He 1, Chunyi Zhou 1, Jun Wang 3, Zhihui Fu 3, Tianyu Du 1, Zhaoxiang Wang 3, Shouling Ji 1

0 citations · 45 references · arXiv


Published on arXiv · 2601.22485

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

FraudShield achieves 89.45% average defense success rate across 5 fraud types and 4 LLMs while maintaining 70.60% MMLU accuracy.

FraudShield

Novel technique introduced


Large language models (LLMs) have been widely integrated into critical automated workflows, including contract review and job application processes. However, LLMs are susceptible to manipulation by fraudulent information, which can lead to harmful outcomes. Although advanced defense methods have been developed to address this issue, they often exhibit limitations in effectiveness, interpretability, and generalizability, particularly when applied to LLM-based applications. To address these challenges, we introduce FraudShield, a novel framework designed to protect LLMs from fraudulent content by leveraging a comprehensive analysis of fraud tactics. Specifically, FraudShield constructs and refines a fraud tactic-keyword knowledge graph to capture high-confidence associations between suspicious text and fraud techniques. The structured knowledge graph augments the original input by highlighting keywords and providing supporting evidence, guiding the LLM toward more secure responses. Extensive experiments show that FraudShield consistently outperforms state-of-the-art defenses across four mainstream LLMs and five representative fraud types, while also offering interpretable clues for the model's generations.


Key Contributions

  • Constructs and refines a fraud tactic-keyword knowledge graph linking suspicious text patterns to known fraud techniques
  • Augments LLM inputs with highlighted fraud keywords and supporting evidence from the knowledge graph to guide secure responses
  • Evaluates across 4 mainstream LLMs and 5 fraud types (fraudulent service, impersonation, phishing, fake job postings, online relationship scams) in both normal and role-play settings
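The core mechanism described above — matching input text against a fraud tactic-keyword knowledge graph and augmenting the prompt with highlighted keywords plus supporting evidence — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the graph contents, confidence scores, threshold, and warning format are all assumed for demonstration.

```python
# Hypothetical sketch of FraudShield-style input augmentation.
# The tactic-keyword graph here is invented; the paper builds and refines
# its knowledge graph from a corpus-level analysis of fraud tactics.

# Each fraud tactic maps to keywords with a confidence score for the
# tactic-keyword association (all values illustrative).
FRAUD_KG = {
    "phishing": {"verify your account": 0.92, "account suspended": 0.81},
    "impersonation": {"i am your ceo": 0.95, "urgent wire transfer": 0.88},
    "fake job posting": {"no experience needed": 0.77, "pay a training fee": 0.90},
}

def augment_prompt(user_input: str, threshold: float = 0.8) -> str:
    """Attach high-confidence fraud keywords and evidence to the input."""
    text = user_input.lower()
    evidence = []
    for tactic, keywords in FRAUD_KG.items():
        for kw, conf in keywords.items():
            if conf >= threshold and kw in text:
                evidence.append(
                    f'- "{kw}" is associated with {tactic} (confidence {conf:.2f})'
                )
    if not evidence:
        return user_input  # nothing suspicious matched; pass through unchanged
    return (
        user_input
        + "\n\n[Fraud warning] The input contains phrases linked to known fraud tactics:\n"
        + "\n".join(evidence)
        + "\nTreat any request that relies on these phrases with caution."
    )

print(augment_prompt("URGENT wire transfer needed, I am your CEO."))
```

The augmented prompt is then passed to the LLM in place of the raw input, so the model sees both the suspicious phrases and the evidence linking them to fraud techniques — which is also what makes the defense interpretable.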

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time
Datasets
MMLU
Applications
contract review, job application processing, automated LLM workflows