defense 2026

FraudShield: Knowledge Graph Empowered Defense for LLMs against Fraud Attacks

Naen Xu ¹, Jinghuai Zhang ², Ping He ¹, Chunyi Zhou ¹, Jun Wang ³, Zhihui Fu ³, Tianyu Du ¹, Zhaoxiang Wang ³, Shouling Ji ¹

¹ Zhejiang University

² University of California, Los Angeles

³ OPPO Research Institute

0 citations · 45 references · arXiv

Published on arXiv

2601.22485

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

FraudShield achieves 89.45% average defense success rate across 5 fraud types and 4 LLMs while maintaining 70.60% MMLU accuracy.

FraudShield

Novel technique introduced

Large language models (LLMs) have been widely integrated into critical automated workflows, including contract review and job application processes. However, LLMs are susceptible to manipulation by fraudulent information, which can lead to harmful outcomes. Although advanced defense methods have been developed to address this issue, they often exhibit limitations in effectiveness, interpretability, and generalizability, particularly when applied to LLM-based applications. To address these challenges, we introduce FraudShield, a novel framework designed to protect LLMs from fraudulent content by leveraging a comprehensive analysis of fraud tactics. Specifically, FraudShield constructs and refines a fraud tactic-keyword knowledge graph to capture high-confidence associations between suspicious text and fraud techniques. The structured knowledge graph augments the original input by highlighting keywords and providing supporting evidence, guiding the LLM toward more secure responses. Extensive experiments show that FraudShield consistently outperforms state-of-the-art defenses across four mainstream LLMs and five representative fraud types, while also offering interpretable clues for the model's generations.
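The abstract says the knowledge graph is refined to keep only high-confidence associations between suspicious text and fraud techniques, but this summary does not say how confidence is measured. As a rough illustration only, the sketch below assumes a simple co-occurrence estimate: the share of a keyword's occurrences that carry a given tactic label, with a minimum support count. The function name `build_graph`, the thresholds, and the toy examples are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict

def build_graph(labeled_examples, min_confidence=0.8, min_support=2):
    """Build a tactic -> {keyword: confidence} map from (tactic, keywords) pairs.

    ASSUMPTION: confidence = count(tactic, keyword) / count(keyword).
    The paper's actual refinement criterion is not given in this summary.
    """
    pair_counts = Counter()
    keyword_counts = Counter()
    for tactic, keywords in labeled_examples:
        for kw in set(keywords):  # count each keyword once per example
            pair_counts[(tactic, kw)] += 1
            keyword_counts[kw] += 1

    graph = defaultdict(dict)
    for (tactic, kw), n in pair_counts.items():
        confidence = n / keyword_counts[kw]
        # Refinement step: drop rare or ambiguous tactic-keyword edges.
        if n >= min_support and confidence >= min_confidence:
            graph[tactic][kw] = confidence
    return dict(graph)

# Toy, hand-written examples (hypothetical, not from the paper's data).
examples = [
    ("urgency", ["act now", "limited time"]),
    ("urgency", ["act now", "verify your account"]),
    ("impersonation", ["verify your account", "official notice"]),
    ("impersonation", ["official notice", "verify your account"]),
    ("urgency", ["act now"]),
]
graph = build_graph(examples)
```

In this toy run, "verify your account" appears under both tactics, so neither edge reaches the 0.8 threshold and the keyword is pruned; only the unambiguous edges survive.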

Key Contributions

Constructs and refines a fraud tactic-keyword knowledge graph linking suspicious text patterns to known fraud techniques
Augments LLM inputs with highlighted fraud keywords and supporting evidence from the knowledge graph to guide secure responses
Evaluates across 4 mainstream LLMs and 5 fraud types (fraudulent service, impersonation, phishing, fake job postings, online relationship scams) in both normal and role-play settings
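The second contribution, augmenting the input with highlighted keywords and supporting evidence, can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `TACTIC_KEYWORDS` entries, the `<<...>>` highlight markers, the wording of the evidence block, and the use of plain case-insensitive substring matching. The paper's actual matching and prompt format are not described in this summary.

```python
import re

# Hypothetical tactic -> keywords graph (illustrative entries only).
TACTIC_KEYWORDS = {
    "urgency pressure": ["act now", "within 24 hours"],
    "credential harvesting": ["verify your account", "confirm your password"],
}

def find_evidence(text, graph):
    """Return (keyword, tactic) pairs whose keyword occurs in the text."""
    hits = []
    for tactic, keywords in graph.items():
        for kw in keywords:
            if re.search(re.escape(kw), text, flags=re.IGNORECASE):
                hits.append((kw, tactic))
    return hits

def highlight(text, hits):
    """Wrap each matched keyword in <<...>> markers, keeping original casing."""
    for kw in dict.fromkeys(kw for kw, _ in hits):  # de-duplicate, keep order
        text = re.sub(re.escape(kw), lambda m: f"<<{m.group(0)}>>",
                      text, flags=re.IGNORECASE)
    return text

def augment(text, graph):
    """Return the input with highlights and an evidence block appended."""
    hits = find_evidence(text, graph)
    if not hits:
        return text  # nothing suspicious matched; pass the input through
    evidence = "\n".join(
        f'- "{kw}" is associated with the tactic: {tactic}' for kw, tactic in hits
    )
    return (
        f"{highlight(text, hits)}\n\n"
        f"Evidence from the knowledge graph:\n{evidence}\n\n"
        "Treat the highlighted spans with caution when responding."
    )

message = "Please verify your account and Act Now to keep access."
augmented = augment(message, TACTIC_KEYWORDS)
```

The augmented string would then be passed to the LLM in place of the raw message. Because the evidence lines name the matched tactic, the same structure also gives the kind of interpretable clue the abstract mentions.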

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_time

Datasets

MMLU

Applications

contract review · job application processing · automated llm workflows

Similar Papers

Prompt Control-Flow Integrity: A Priority-Aware Runtime Defense Against Prompt Injection in LLM Systems

Invasive Context Engineering to Control Large Language Models

Guard Vector: Beyond English LLM Guardrails with Task-Vector Composition and Streaming-Aware Prefix SFT

LLM Reinforcement in Context

Soft Instruction De-escalation Defense

A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks

$C$-$ΔΘ$: Circuit-Restricted Weight Arithmetic for Selective Refusal

Towards Unsupervised Adversarial Document Detection in Retrieval Augmented Generation Systems