attack 2026

Low Rank Comes with Low Security: Gradient Assembly Poisoning Attacks against Distributed LoRA-based LLM Systems

0 citations · 34 references · arXiv

Published on arXiv

2601.00566

Data Poisoning Attack

OWASP ML Top 10 — ML02

Transfer Learning Attack

OWASP ML Top 10 — ML07

Key Finding

GAP reduces BLEU by up to 14.5% and increases factual/grammatical errors by over 800% on federated LoRA-tuned LLMs while maintaining surface fluency and evading standard anomaly detectors.

Gradient Assembly Poisoning (GAP)

Novel technique introduced

Low-Rank Adaptation (LoRA) has become a popular solution for fine-tuning large language models (LLMs) in federated settings, dramatically reducing update costs by introducing trainable low-rank matrices. However, when integrated with frameworks like FedIT, LoRA introduces a critical vulnerability: clients submit $A$ and $B$ matrices separately, while only their product $AB$ determines the model update, yet this composite is never directly verified. We propose Gradient Assembly Poisoning (GAP), a novel attack that exploits this blind spot by crafting individually benign $A$ and $B$ matrices whose product yields malicious updates. GAP operates without access to training data or inter-client coordination and remains undetected by standard anomaly detectors. We identify four systemic vulnerabilities in LoRA-based federated systems and validate GAP across LLaMA, ChatGLM, and GPT-2. GAP consistently induces degraded or biased outputs while preserving surface fluency, reducing BLEU by up to 14.5%, increasing factual and grammatical errors by over 800%, and maintaining 92.6% long-form response length. These results reveal a new class of stealthy, persistent threats in distributed LoRA fine-tuning.
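The blind spot described in the abstract can be illustrated with a toy example. The sketch below is not the paper's attack: it assumes a hypothetical server-side filter that only compares each factor's Frobenius norm against a benign reference, and it assumes the shapes $A \in \mathbb{R}^{3 \times 2}$, $B \in \mathbb{R}^{2 \times 3}$. All matrices and function names are made up for illustration. Rotating the rank dimension of $A$ leaves its norm unchanged, so the per-factor filter passes, yet the assembled product $AB$ changes.

```python
import math

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def sub(X, Y):
    """Elementwise difference X - Y."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def frob(X):
    """Frobenius norm of a matrix."""
    return math.sqrt(sum(v * v for row in X for v in row))

def passes_norm_check(M, reference, tol=1e-9):
    """Hypothetical per-factor filter: accept M if its norm matches the reference."""
    return abs(frob(M) - frob(reference)) <= tol

# Hypothetical benign LoRA factors with rank r = 2 (values are illustrative only).
A = [[0.1, 0.0], [0.0, 0.2], [0.1, 0.1]]
B = [[0.3, 0.0, 0.1], [0.0, 0.1, 0.2]]

# A 90-degree rotation of the rank dimension preserves the Frobenius norm of A.
R = [[0.0, -1.0], [1.0, 0.0]]
A_rot = matmul(A, R)

benign_update = matmul(A, B)
assembled_update = matmul(A_rot, B)

# How far the assembled product moved, even though each factor looks unchanged in norm.
deviation = frob(sub(assembled_update, benign_update))

print("A passes per-factor check:", passes_norm_check(A_rot, A))
print("product deviation:", round(deviation, 4))
```

A filter that inspected the product $AB$ directly would see the deviation; one that inspects $A$ and $B$ in isolation, as assumed here, would not.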

Key Contributions

Identifies four systemic vulnerabilities in LoRA-based federated fine-tuning systems (verification gaps, layer-wise isolation, bias accumulation, parameter-behavior mismatch)
Proposes GAP, a constrained optimization attack that crafts individually benign A and B matrices whose product injects malicious composite updates undetectable by standard anomaly filters
Demonstrates GAP across LLaMA, ChatGLM, and GPT-2, reducing BLEU by up to 14.5% and increasing factual/grammatical errors by over 800% without access to training data or inter-client coordination
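To make the phrase "constrained optimization" concrete, here is a minimal toy sketch of the general idea, assuming a formulation the summary does not spell out: pull the product $AB$ toward a chosen target while keeping each factor inside a small Frobenius ball around its benign value. The objective, the ball constraint, the step-halving rule, and every name below are assumptions for illustration, not the authors' algorithm.

```python
import math

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add_scaled(X, Y, s):
    """Return X + s * Y elementwise."""
    return [[a + s * b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def frob(X):
    return math.sqrt(sum(v * v for row in X for v in row))

def project(M, center, radius):
    """Project M onto the Frobenius ball of the given radius around center."""
    delta = add_scaled(M, center, -1.0)
    norm = frob(delta)
    if norm <= radius:
        return M
    return add_scaled(center, delta, radius / norm)

def loss(A, B, target):
    """Squared Frobenius distance between the product AB and the target."""
    return frob(add_scaled(matmul(A, B), target, -1.0)) ** 2

def craft(A0, B0, target, radius, steps=200, lr=0.5):
    """Projected gradient descent; a step is kept only if it lowers the loss."""
    A, B = A0, B0
    for _ in range(steps):
        resid = add_scaled(matmul(A, B), target, -1.0)
        grad_A = matmul(resid, transpose(B))   # gradient w.r.t. A, up to a factor of 2
        grad_B = matmul(transpose(A), resid)   # gradient w.r.t. B, up to a factor of 2
        cand_A = project(add_scaled(A, grad_A, -lr), A0, radius)
        cand_B = project(add_scaled(B, grad_B, -lr), B0, radius)
        if loss(cand_A, cand_B, target) < loss(A, B, target):
            A, B = cand_A, cand_B
        else:
            lr *= 0.5  # shrink the step when no improvement was found
    return A, B

# Illustrative benign factors; the target is the sign-flipped benign update.
A0 = [[0.1, 0.0], [0.0, 0.2], [0.1, 0.1]]
B0 = [[0.3, 0.0, 0.1], [0.0, 0.1, 0.2]]
target = [[-v for v in row] for row in matmul(A0, B0)]
radius = 0.05

A_adv, B_adv = craft(A0, B0, target, radius)
print("factor drift:", frob(add_scaled(A_adv, A0, -1.0)), frob(add_scaled(B_adv, B0, -1.0)))
print("loss before/after:", loss(A0, B0, target), loss(A_adv, B_adv, target))
```

The projection step is what keeps each factor "individually benign" under this toy definition; a real system's anomaly detector would impose different, likely richer, constraints.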

🛡️ Threat Analysis

Data Poisoning Attack

GAP is a Byzantine federated learning attack in which malicious clients craft and submit harmful model updates (poisoned A and B matrices) to degrade the global LLM's performance. This maps directly to the Byzantine FL poisoning threat in ML02.

Transfer Learning Attack

The attack exploits the LoRA adapter fine-tuning architecture and the FedIT framework's decoupled aggregation mechanism. ML07 explicitly includes 'Adapter/LoRA trojans' and attacks that exploit the gap between pre-training and fine-tuning-time update verification.
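A small numeric example shows why decoupled aggregation matters. It assumes that the server averages the clients' $A$ matrices and $B$ matrices separately and then multiplies the averages, which is how the summary's "decoupled aggregation" is read here. The two rank-1 clients and all names are hypothetical. The product of the averages is not the average of the per-client products, so the applied update is one that no client actually submitted as a composite.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def mean_matrices(mats):
    """Elementwise mean of a list of equally shaped matrices."""
    n = len(mats)
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*mats)]

# Two hypothetical clients, each submitting rank-1 factors (A is 2x1, B is 1x2).
A1, B1 = [[1], [0]], [[1, 0]]
A2, B2 = [[0], [1]], [[0, 1]]

# Decoupled aggregation: average each factor on its own, then multiply.
product_of_means = matmul(mean_matrices([A1, A2]), mean_matrices([B1, B2]))

# What averaging the composite updates themselves would give.
mean_of_products = mean_matrices([matmul(A1, B1), matmul(A2, B2)])

print(product_of_means)   # [[0.25, 0.25], [0.25, 0.25]]
print(mean_of_products)   # [[0.5, 0.0], [0.0, 0.5]]
```

The off-diagonal entries in the first result are cross terms between different clients' factors, which exist only after aggregation.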

Details

Domains

nlp, federated-learning

Model Types

llm, federated, transformer

Threat Tags

white_box, training_time

Datasets

LLaMA, ChatGLM, GPT-2

Applications

federated llm fine-tuning, instruction-following models, distributed lora adaptation

Similar Papers

Graph Representation-based Model Poisoning on the Heterogeneous Internet of Agents

Leveraging Soft Prompts for Privacy Attacks in Federated Prompt Tuning

HarmRLVR: Weaponizing Verifiable Rewards for Harmful LLM Alignment

Subliminal Signals in Preference Labels

SENTINEL: Stagewise Integrity Verification for Pipeline Parallel Decentralized Training

On the Fragility of Contribution Score Computation in Federated Learning

Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs

Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler