attack 2025

Graph Representation-based Model Poisoning on the Heterogeneous Internet of Agents

Hanlin Cai 1, Houtianfu Wang 1, Haofan Dong 1, Kai Li 1,2, Sai Zou 3, Ozgur B. Akan 1,4

1 citation · 26 references · arXiv


Published on arXiv

2511.07176

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

GRMP substantially degrades global LLM accuracy while remaining statistically indistinguishable from benign updates, successfully bypassing Krum and FoolsGold defenses.

GRMP (Graph Representation-based Model Poisoning)

Novel technique introduced


The Internet of Agents (IoA) envisions a unified, agent-centric paradigm where heterogeneous large language model (LLM) agents can interconnect and collaborate at scale. Within this paradigm, federated fine-tuning (FFT) serves as a key enabler that allows distributed LLM agents to co-train an intelligent global LLM without centralizing local datasets. However, FFT-enabled IoA systems remain vulnerable to model poisoning attacks, in which adversaries upload malicious updates to the server to degrade the performance of the aggregated global LLM. This paper proposes a graph representation-based model poisoning (GRMP) attack, which exploits overheard benign updates to construct a feature correlation graph and employs a variational graph autoencoder to capture structural dependencies and generate malicious updates. A novel attack algorithm is developed based on augmented Lagrangian and subgradient descent methods to optimize malicious updates that preserve benign-like statistics while embedding adversarial objectives. Experimental results show that the proposed GRMP attack can substantially decrease accuracy across different LLMs while remaining statistically consistent with benign updates, thereby evading detection by existing defense mechanisms and underscoring a severe threat to the ambitious IoA paradigm.
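The first stage of the pipeline described above, building a feature correlation graph from overheard benign updates, can be sketched as follows. This is a minimal illustration under assumed conventions (Pearson correlation across clients, a simple threshold for edges); the paper's actual graph construction and the VGAE that consumes it are not reproduced here.

```python
import numpy as np

def correlation_graph(updates, threshold=0.5):
    """Build an adjacency matrix over model-update features.

    `updates` is an (n_clients, n_params) array of overheard benign
    updates. Edges connect feature pairs whose absolute Pearson
    correlation across clients exceeds `threshold`. Both the
    correlation measure and the threshold are illustrative
    assumptions, not the paper's exact construction.
    """
    corr = np.corrcoef(updates.T)           # (n_params, n_params) correlations
    adj = (np.abs(corr) > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)              # drop self-loops
    return adj

rng = np.random.default_rng(0)
benign = rng.normal(size=(8, 5))            # 8 overheard clients, 5-dim updates
A = correlation_graph(benign)
print(A.shape)                              # (5, 5)
```

In the full attack, a graph like `A` would be fed to a variational graph autoencoder to learn and then resample the structural dependencies among benign update features.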


Key Contributions

  • GRMP attack leverages overheard benign LoRA updates to construct a feature correlation graph, then uses a variational graph autoencoder to generate malicious updates that statistically mimic benign updates
  • Novel optimization algorithm based on augmented Lagrangian and subgradient descent to embed adversarial degradation objectives while preserving geometric coherence with benign updates
  • Demonstrated evasion of distance- and similarity-based defenses (Krum, FoolsGold) while substantially reducing accuracy of the aggregated global LLM across multiple model architectures
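The constrained optimization in the second contribution can be sketched as a toy augmented Lagrangian / subgradient loop: push the malicious update along an adversarial direction while a norm-ball constraint keeps it statistically close to the benign mean. The objective, penalty form, and all hyper-parameters below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def grmp_optimize(mu, d, eps=0.5, rho=10.0, lr=0.05, steps=300):
    """Maximize movement along adversarial direction `d` subject to
    ||x - mu||^2 <= eps^2, so the malicious update x stays close to the
    benign mean mu. Uses an augmented Lagrangian penalty with a
    subgradient step on x and a multiplier update on lam (hypothetical
    stand-in for the paper's algorithm).
    """
    x, lam = mu.copy(), 0.0
    for _ in range(steps):
        g = float(np.dot(x - mu, x - mu)) - eps**2   # constraint g(x) <= 0
        viol = max(g, 0.0)
        # subgradient of -<x, d> + lam*max(0,g) + (rho/2)*max(0,g)^2
        grad = -d + (lam + rho * viol) * 2.0 * (x - mu) * (viol > 0)
        x = x - lr * grad
        lam = max(0.0, lam + rho * g)                # multiplier update
    return x

mu = np.zeros(4)                       # benign mean (toy)
d = np.array([1.0, -1.0, 0.5, 0.0])    # assumed adversarial direction
x = grmp_optimize(mu, d)
print(round(float(np.linalg.norm(x - mu)), 3))
```

The returned update sits near the boundary of the eps-ball around the benign mean, which is the mechanism that lets the attack embed its degradation objective while remaining statistically benign-looking to distance-based detectors.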

🛡️ Threat Analysis

Data Poisoning Attack

The core contribution is a Byzantine attack on federated learning in which the adversary uploads malicious model updates (not poisoned data) to degrade the aggregated global LLM's performance, which is precisely the 'malicious clients sending arbitrary model updates to degrade global model performance' case listed under ML02. The attack aims at general performance degradation, not a hidden trigger-based backdoor (ML10).


Details

Domains
federated-learning, nlp
Model Types
llm, federated, gnn
Threat Tags
grey_box, training_time
Applications
federated fine-tuning of llms, internet of agents systems