attack 2025

Graph Representation-based Model Poisoning on the Heterogeneous Internet of Agents

Hanlin Cai 1, Houtianfu Wang 1, Haofan Dong 1, Kai Li 1,2, Sai Zou 3, Ozgur B. Akan 1,4

1 citation · 26 references · arXiv


Published on arXiv

2511.07176

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

GRMP substantially degrades global LLM accuracy while remaining statistically indistinguishable from benign updates, successfully bypassing Krum and FoolsGold defenses.

GRMP (Graph Representation-based Model Poisoning)

Novel technique introduced


The Internet of Agents (IoA) envisions a unified, agent-centric paradigm where heterogeneous large language model (LLM) agents can interconnect and collaborate at scale. Within this paradigm, federated fine-tuning (FFT) serves as a key enabler that allows distributed LLM agents to co-train an intelligent global LLM without centralizing local datasets. However, FFT-enabled IoA systems remain vulnerable to model poisoning attacks, in which adversaries upload malicious updates to the server to degrade the performance of the aggregated global LLM. This paper proposes a graph representation-based model poisoning (GRMP) attack, which exploits overheard benign updates to construct a feature correlation graph and employs a variational graph autoencoder to capture structural dependencies and generate malicious updates. A novel attack algorithm is developed based on augmented Lagrangian and subgradient descent methods to optimize malicious updates that preserve benign-like statistics while embedding adversarial objectives. Experimental results show that the proposed GRMP attack can substantially decrease accuracy across different LLMs while remaining statistically consistent with benign updates, thereby evading detection by existing defense mechanisms and underscoring a severe threat to the ambitious IoA paradigm.
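The first stage of the pipeline described above, building a feature correlation graph from overheard benign updates, can be sketched as follows. This is a minimal illustration under assumed conventions (Pearson correlation across clients, a simple threshold for edges); the paper's actual graph construction and the VGAE that consumes it are not reproduced here.

```python
import numpy as np

def correlation_graph(updates, threshold=0.5):
    """Build an adjacency matrix over model-update features.

    `updates` is an (n_clients, n_params) array of overheard benign
    updates. Edges connect feature pairs whose absolute Pearson
    correlation across clients exceeds `threshold`. Both the
    correlation measure and the threshold are illustrative
    assumptions, not the paper's exact construction.
    """
    corr = np.corrcoef(updates.T)           # (n_params, n_params) correlations
    adj = (np.abs(corr) > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)              # drop self-loops
    return adj

rng = np.random.default_rng(0)
benign = rng.normal(size=(8, 5))            # 8 overheard clients, 5-dim updates
A = correlation_graph(benign)
print(A.shape)                              # (5, 5)
```

In the full attack, a graph like `A` would be fed to a variational graph autoencoder to learn and then resample the structural dependencies among benign update features.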


Key Contributions

  • GRMP attack leverages overheard benign LoRA updates to construct a feature correlation graph, then uses a variational graph autoencoder to generate malicious updates that statistically mimic benign updates
  • Novel optimization algorithm based on augmented Lagrangian and subgradient descent to embed adversarial degradation objectives while preserving geometric coherence with benign updates
  • Demonstrated evasion of distance- and similarity-based defenses (Krum, FoolsGold) while substantially reducing accuracy of the aggregated global LLM across multiple model architectures
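The constrained optimization in the second contribution can be sketched as a toy augmented Lagrangian / subgradient loop: push the malicious update along an adversarial direction while a norm-ball constraint keeps it statistically close to the benign mean. The objective, penalty form, and all hyper-parameters below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def grmp_optimize(mu, d, eps=0.5, rho=10.0, lr=0.05, steps=300):
    """Maximize movement along adversarial direction `d` subject to
    ||x - mu||^2 <= eps^2, so the malicious update x stays close to the
    benign mean mu. Uses an augmented Lagrangian penalty with a
    subgradient step on x and a multiplier update on lam (hypothetical
    stand-in for the paper's algorithm).
    """
    x, lam = mu.copy(), 0.0
    for _ in range(steps):
        g = float(np.dot(x - mu, x - mu)) - eps**2   # constraint g(x) <= 0
        viol = max(g, 0.0)
        # subgradient of -<x, d> + lam*max(0,g) + (rho/2)*max(0,g)^2
        grad = -d + (lam + rho * viol) * 2.0 * (x - mu) * (viol > 0)
        x = x - lr * grad
        lam = max(0.0, lam + rho * g)                # multiplier update
    return x

mu = np.zeros(4)                       # benign mean (toy)
d = np.array([1.0, -1.0, 0.5, 0.0])    # assumed adversarial direction
x = grmp_optimize(mu, d)
print(round(float(np.linalg.norm(x - mu)), 3))
```

The returned update sits near the boundary of the eps-ball around the benign mean, which is the mechanism that lets the attack embed its degradation objective while remaining statistically benign-looking to distance-based detectors.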

🛡️ Threat Analysis

Data Poisoning Attack

The core contribution is a Byzantine attack on federated learning in which the adversary uploads malicious model updates (not poisoned data) to degrade the aggregated global LLM's performance, which is precisely the 'malicious clients sending arbitrary model updates to degrade global model performance' case listed under ML02. The attack aims at general performance degradation, not a hidden trigger-based backdoor (ML10).


Details

Domains
federated-learning, nlp
Model Types
llm, federated, gnn
Threat Tags
grey_box, training_time
Applications
federated fine-tuning of llms, internet of agents systems