
LLMAtKGE: Large Language Models as Explainable Attackers against Knowledge Graph Embeddings

Ting Li 1, Yang Yang 1, Yipeng Yu 2, Liang Yao 1, Guoqing Chao 3, Ruifeng Xu 3

0 citations · 39 references · arXiv


Published on arXiv (2510.11584)

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

LLMAtKGE outperforms the strongest black-box baselines on two KG benchmarks and achieves competitive performance against white-box methods while producing human-readable attack explanations.

LLMAtKGE

Novel technique introduced


Adversarial attacks on knowledge graph embeddings (KGE) aim to degrade the model's link prediction performance by removing or inserting triples. A recent black-box method attempts to incorporate textual and structural information to enhance attack performance. However, it cannot generate human-readable explanations and exhibits poor generalizability. In recent years, large language models (LLMs) have demonstrated powerful capabilities in text comprehension, generation, and reasoning. In this paper, we propose LLMAtKGE, a novel LLM-based framework that selects attack targets and generates human-readable explanations. To provide the LLM with sufficient factual context under limited input constraints, we design a structured prompting scheme that explicitly formulates the attack as multiple-choice questions while incorporating KG factual evidence. To address the context-window limitation and hesitation issues, we introduce semantics-based and centrality-based filters, which compress the candidate set while preserving high recall of attack-relevant information. Furthermore, to efficiently integrate both semantic and structural information into the filter, we precompute high-order adjacency and fine-tune the LLM with a triple classification task to enhance filtering performance. Experiments on two widely used knowledge graph datasets demonstrate that our attack outperforms the strongest black-box baselines and provides explanations via reasoning, while showing competitive performance compared with white-box methods. Comprehensive ablation and case studies further validate its capability to generate explanations.


Key Contributions

  • LLMAtKGE: an LLM-based black-box poisoning framework that selects adversarial triples for KGE attacks and generates human-readable explanations via chain-of-thought reasoning
  • Structured prompting scheme that reformulates the attack as multiple-choice questions with KG factual evidence to fit LLM context constraints
  • Semantics-based and centrality-based candidate filters using high-order adjacency precomputation and fine-tuned LLM triple classification to compress candidate sets while preserving attack-relevant recall
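The centrality-based filter with high-order adjacency precomputation can be illustrated with a small sketch. This is not the authors' implementation; the function names, the dense-matrix representation, and the toy knowledge graph below are all illustrative assumptions. It precomputes a high-order adjacency matrix (sum of adjacency powers), then keeps only the candidate triples whose other endpoint is most central in the target entity's multi-hop neighborhood:

```python
import numpy as np

def highorder_adjacency(triples, num_entities, order=2):
    """Precompute a dense high-order adjacency matrix A + A^2 + ... + A^order
    (illustrative sketch; real KGs would need sparse matrices)."""
    A = np.zeros((num_entities, num_entities), dtype=np.float32)
    for h, _, t in triples:
        A[h, t] = 1.0
        A[t, h] = 1.0  # treat edges as undirected for centrality
    acc = np.zeros_like(A)
    power = np.eye(num_entities, dtype=np.float32)
    for _ in range(order):
        power = power @ A
        acc += power
    return acc

def centrality_filter(triples, target, num_entities, k=3, order=2):
    """Keep the k candidate triples whose non-target endpoint is most
    connected to the attacked head entity in the high-order neighborhood."""
    H = highorder_adjacency(triples, num_entities, order)
    head, _, _ = target
    scored = []
    for (h, r, t) in triples:
        if h == head or t == head:          # candidates touching the target
            other = t if h == head else h
            scored.append((H[head, other], (h, r, t)))
    scored.sort(key=lambda x: -x[0])        # most central first
    return [tr for _, tr in scored[:k]]

# toy KG with entity ids 0..4 and relation ids 0..1
kg = [(0, 0, 1), (1, 0, 2), (0, 1, 3), (3, 0, 4), (0, 0, 4)]
print(centrality_filter(kg, (0, 0, 1), num_entities=5, k=2))
```

The compressed candidate set is what gets packed into the multiple-choice prompt, keeping the LLM's input within its context window while retaining the structurally most influential triples.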

🛡️ Threat Analysis

Data Poisoning Attack

The attack works by inserting or removing triples from the knowledge graph training set to degrade the KGE model's link prediction on specific target triples — this is targeted data poisoning at training time, the core ML02 threat.
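The deletion side of the poisoning step reduces to dropping the selected triples from the victim's training split before the KGE model is (re)trained. The sketch below uses hypothetical names and a toy triple list; it is not the paper's pipeline, only the core poisoning operation:

```python
def poison_by_deletion(train_triples, adversarial_triples):
    """Return a poisoned training set with the attacker-selected triples
    removed; the victim's KGE model is then trained on this set."""
    drop = set(adversarial_triples)
    return [t for t in train_triples if t not in drop]

# toy training split (hypothetical example facts)
train = [("Paris", "capital_of", "France"),
         ("France", "located_in", "Europe"),
         ("Paris", "located_in", "France")]

# triples the attack framework judged most damaging to the target fact
selected = [("Paris", "located_in", "France")]

poisoned = poison_by_deletion(train, selected)
print(poisoned)
```

Insertion attacks are symmetric: adversarial triples are appended to the training set instead of removed, with the same goal of shifting the learned embeddings so the target triple's link prediction score degrades.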


Details

Domains
graph, nlp
Model Types
llm, gnn
Threat Tags
black_box, training_time, targeted
Datasets
FB15k-237, WN18RR
Applications
knowledge graph completion, link prediction