attack 2025

Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data

2 citations · 1 influential · 45 references · arXiv

Published on arXiv

2509.23041

Data Poisoning Attack

OWASP ML Top 10 — ML02

Model Poisoning

OWASP ML Top 10 — ML10

Training Data Poisoning

OWASP LLM Top 10 — LLM03

Key Finding

VIA raises the attack success rate on downstream LLMs trained on synthetic data to levels comparable to those observed when directly poisoning the upstream model, succeeding where prior attacks fail due to distributional mismatch.

VIA (Virus Infection Attack)

Novel technique introduced

Synthetic data refers to artificial samples generated by models. While it has been validated to significantly enhance the performance of large language models (LLMs) during training and has been widely adopted in LLM development, potential security risks it may introduce remain uninvestigated. This paper systematically evaluates the resilience of synthetic-data-integrated training paradigm for LLMs against mainstream poisoning and backdoor attacks. We reveal that such a paradigm exhibits strong resistance to existing attacks, primarily thanks to the different distribution patterns between poisoning data and queries used to generate synthetic samples. To enhance the effectiveness of these attacks and further investigate the security risks introduced by synthetic data, we introduce a novel and universal attack framework, namely, Virus Infection Attack (VIA), which enables the propagation of current attacks through synthetic data even under purely clean queries. Inspired by the principles of virus design in cybersecurity, VIA conceals the poisoning payload within a protective "shell" and strategically searches for optimal hijacking points in benign samples to maximize the likelihood of generating malicious content. Extensive experiments on both data poisoning and backdoor attacks show that VIA significantly increases the presence of poisoning content in synthetic data and correspondingly raises the attack success rate (ASR) on downstream models to levels comparable to those observed in the poisoned upstream models.

Key Contributions

Reveals that existing poisoning/backdoor attacks largely fail against synthetic-data-integrated LLM training due to distributional mismatch between poisoning data and clean generation queries.
Proposes VIA (Virus Infection Attack), a universal framework that embeds poisoning payloads inside protective 'shells' within benign samples, enabling propagation through synthetic data even under purely clean queries.
Introduces a hijacking point search strategy (HPS) to identify optimal positions in benign samples for payload injection, maximizing malicious content generation in synthetic outputs.

🛡️ Threat Analysis

Data Poisoning Attack

VIA is fundamentally a training-data attack — it corrupts the synthetic data used to train downstream LLMs by injecting poisoning content into benign samples, causing biased or harmful model behavior.

Model Poisoning

The paper explicitly covers backdoor attacks as a primary scenario alongside data poisoning: VIA embeds triggered backdoor payloads that propagate through synthetic data, achieving high attack success rates on downstream models.

Details

Domains

nlp

Model Types

llmtransformer

Threat Tags

training_timeblack_box

Datasets

Tulu-3

Applications

llm training pipelinessynthetic data generation

Read PDF arXiv DOI

Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples

Detecting Data Poisoning in Code Generation LLMs via Black-Box, Vulnerability-Oriented Scanning

Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs

Towards Safeguarding LLM Fine-tuning APIs against Cipher Attacks

The 'Sure' Trap: Multi-Scale Poisoning Analysis of Stealthy Compliance-Only Backdoors in Fine-Tuned Large Language Models

SteganoBackdoor: Stealthy and Data-Efficient Backdoor Attacks on Language Models

On The Dangers of Poisoned LLMs In Security Automation

RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework