
StealthGraph: Exposing Domain-Specific Risks in LLMs through Knowledge-Graph-Guided Harmful Prompt Generation

Huawei Zheng, Xinqi Jiang, Sen Yang, Shouling Ji, Yingcai Wu, Dazhen Deng

1 citation · 44 references · arXiv


Published on arXiv: 2601.04740

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

The framework produces implicit, domain-specific harmful prompts that bypass modern LLM defenses more reliably than the explicit harmful prompts used in existing public datasets.

StealthGraph

Novel technique introduced


Large language models (LLMs) are increasingly applied in specialized domains such as finance and healthcare, where they introduce unique safety risks. Domain-specific datasets of harmful prompts remain scarce and still largely rely on manual construction; public datasets mainly focus on explicit harmful prompts, which modern LLM defenses can often detect and refuse. In contrast, implicit harmful prompts, expressed through indirect domain knowledge, are harder to detect and better reflect real-world threats. We identify two challenges: transforming domain knowledge into actionable constraints and increasing the implicitness of generated harmful prompts. To address them, we propose an end-to-end framework that first performs knowledge-graph-guided harmful prompt generation to systematically produce domain-relevant prompts, and then applies dual-path obfuscation rewriting to convert explicit harmful prompts into implicit variants via direct and context-enhanced rewriting. This framework yields high-quality datasets that combine strong domain relevance with implicitness, enabling more realistic red-teaming and advancing LLM safety research. We release our code and datasets on GitHub.


Key Contributions

  • Knowledge-graph-guided harmful prompt generation framework that transforms domain knowledge (Wikidata) into actionable constraints for producing domain-relevant harmful prompts in finance and healthcare
  • Dual-path obfuscation rewriting pipeline (direct and context-enhanced) that converts explicit harmful prompts into implicit variants harder to detect by modern LLM defenses
  • Released domain-specific red-teaming datasets combining domain relevance with implicitness for advancing LLM safety evaluation
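The two-stage pipeline described above can be sketched in miniature. This is an illustrative sketch, not the paper's implementation: the triples, relation names, templates, and the `direct_rewrite` framing are all hypothetical stand-ins (the paper draws triples from Wikidata and uses an LLM for the rewriting step).

```python
# Hypothetical sketch of the two-stage pipeline; all names and templates
# are illustrative, not taken from the paper or its released code.

# Stage 1 input: a tiny stand-in for a Wikidata-style subgraph,
# represented as (subject, relation, object) triples.
TRIPLES = [
    ("drug_A", "interacts_with", "drug_B"),
    ("asset_X", "regulated_by", "rule_Y"),
]

# Each triple becomes an actionable constraint filled into an
# explicit, domain-anchored prompt template.
EXPLICIT_TEMPLATE = "Explain how {subj} {rel} {obj} in the {domain} domain."

def generate_prompts(triples, domain):
    """Knowledge-graph-guided generation: one explicit prompt per triple."""
    return [
        EXPLICIT_TEMPLATE.format(
            subj=s, rel=r.replace("_", " "), obj=o, domain=domain
        )
        for s, r, o in triples
    ]

def direct_rewrite(prompt):
    """Stand-in for the 'direct rewriting' path: wrap the explicit request
    in an indirect framing. The real pipeline uses an LLM rewriter."""
    return f"For a compliance training scenario, consider the following: {prompt}"

explicit = generate_prompts(TRIPLES, "healthcare")
implicit = [direct_rewrite(p) for p in explicit]
```

The context-enhanced path would additionally condition the rewriter on retrieved domain context (e.g., neighboring graph entities) rather than rewriting the prompt in isolation.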

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box, inference_time
Datasets
Wikidata, custom StealthGraph-generated datasets
Applications
LLM red-teaming, domain-specific LLMs in finance, domain-specific LLMs in healthcare, LLM safety evaluation