attack 2025

Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data

Zi Liang 1, Qingqing Ye 1, Xuan Liu 2, Yanyun Wang 3, Jianliang Xu 4, Haibo Hu 1

2 citations · 1 influential · 45 references · arXiv

α

Published on arXiv

2509.23041

Data Poisoning Attack

OWASP ML Top 10 — ML02

Model Poisoning

OWASP ML Top 10 — ML10

Training Data Poisoning

OWASP LLM Top 10 — LLM03

Key Finding

VIA raises the attack success rate on downstream LLMs trained on synthetic data to levels comparable to those observed when directly poisoning the upstream model, succeeding where prior attacks fail due to distributional mismatch.

VIA (Virus Infection Attack)

Novel technique introduced


Synthetic data refers to artificial samples generated by models. While it has been validated to significantly enhance the performance of large language models (LLMs) during training and has been widely adopted in LLM development, potential security risks it may introduce remain uninvestigated. This paper systematically evaluates the resilience of synthetic-data-integrated training paradigm for LLMs against mainstream poisoning and backdoor attacks. We reveal that such a paradigm exhibits strong resistance to existing attacks, primarily thanks to the different distribution patterns between poisoning data and queries used to generate synthetic samples. To enhance the effectiveness of these attacks and further investigate the security risks introduced by synthetic data, we introduce a novel and universal attack framework, namely, Virus Infection Attack (VIA), which enables the propagation of current attacks through synthetic data even under purely clean queries. Inspired by the principles of virus design in cybersecurity, VIA conceals the poisoning payload within a protective "shell" and strategically searches for optimal hijacking points in benign samples to maximize the likelihood of generating malicious content. Extensive experiments on both data poisoning and backdoor attacks show that VIA significantly increases the presence of poisoning content in synthetic data and correspondingly raises the attack success rate (ASR) on downstream models to levels comparable to those observed in the poisoned upstream models.


Key Contributions

  • Reveals that existing poisoning/backdoor attacks largely fail against synthetic-data-integrated LLM training due to distributional mismatch between poisoning data and clean generation queries.
  • Proposes VIA (Virus Infection Attack), a universal framework that embeds poisoning payloads inside protective 'shells' within benign samples, enabling propagation through synthetic data even under purely clean queries.
  • Introduces a hijacking point search strategy (HPS) to identify optimal positions in benign samples for payload injection, maximizing malicious content generation in synthetic outputs.

🛡️ Threat Analysis

Data Poisoning Attack

VIA is fundamentally a training-data attack — it corrupts the synthetic data used to train downstream LLMs by injecting poisoning content into benign samples, causing biased or harmful model behavior.

Model Poisoning

The paper explicitly covers backdoor attacks as a primary scenario alongside data poisoning: VIA embeds triggered backdoor payloads that propagate through synthetic data, achieving high attack success rates on downstream models.


Details

Domains
nlp
Model Types
llmtransformer
Threat Tags
training_timeblack_box
Datasets
Tulu-3
Applications
llm training pipelinessynthetic data generation