Quality Degradation Attack in Synthetic Data

Qinyi Liu 1, Dong Liu 2, Farhad Vadiee 1, Mohammad Khalil 1, Pedro P. Vergara Barrios 2

0 citations · 20 references · arXiv

Published on arXiv · 2601.02947

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

Small adversarial perturbations to real training data (label flipping, feature interventions) substantially reduce downstream predictive performance and increase statistical divergence of generated synthetic data.

Quality Degradation Attack

Novel technique introduced


Synthetic Data Generation (SDG) can be used to facilitate privacy-preserving data sharing. However, most existing research focuses on privacy attacks where the adversary is the recipient of the released synthetic data and attempts to infer sensitive information from it. This study investigates quality degradation attacks initiated by adversaries who possess access to the real dataset or control over the generation process, such as the data owner, the synthetic data provider, or potential intruders. We formalize a corresponding threat model and empirically evaluate the effectiveness of targeted manipulations of real data (e.g., label flipping and feature-importance-based interventions) on the quality of generated synthetic data. The results show that even small perturbations can substantially reduce downstream predictive performance and increase statistical divergence, exposing vulnerabilities within SDG pipelines. This study highlights the need to integrate integrity verification and robustness mechanisms, alongside privacy protection, to ensure the reliability and trustworthiness of synthetic data sharing frameworks.
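As an illustration of the kind of "small perturbation" the study evaluates, a minimal label-flipping manipulation can be sketched as follows. This is a generic sketch, not the paper's implementation; the flip rate, label encoding, and selection strategy are assumptions.

```python
import numpy as np

def flip_labels(y, rate, rng=None):
    """Flip a fraction `rate` of binary labels in the real training data,
    simulating a small label-flipping perturbation applied before SDG.
    Illustrative sketch only; assumes 0/1 labels and uniform random selection.
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y).copy()
    n_flip = int(round(rate * len(y)))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y[idx] = 1 - y[idx]  # invert the chosen binary labels
    return y

y = np.array([0, 1] * 50)                     # 100 binary labels
y_poisoned = flip_labels(y, rate=0.05, rng=0)
print((y != y_poisoned).sum())                # prints 5
```

Even at a 5% flip rate, the poisoned labels propagate through the generator into the synthetic data, which is what the paper measures as degraded downstream predictive performance.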


Key Contributions

  • Formalizes a threat model for quality degradation attacks on synthetic data generation pipelines, covering data owner, provider, and intruder adversary roles
  • Empirically evaluates targeted data manipulations (label flipping, feature-importance-based interventions) on SDG quality metrics including downstream predictive performance and statistical divergence
  • Demonstrates that even small perturbations substantially degrade synthetic data quality, exposing previously underexplored vulnerabilities in privacy-preserving data sharing frameworks
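One way to quantify the "statistical divergence" side of the evaluation is a per-column two-sample test between real and synthetic data. The sketch below uses the Kolmogorov-Smirnov statistic as a stand-in metric; the paper's actual divergence measures may differ.

```python
import numpy as np
from scipy.stats import ks_2samp

def max_ks_divergence(real, synth):
    """Largest per-column Kolmogorov-Smirnov statistic between real and
    synthetic tabular data. 0.0 means identical marginals; values near 1.0
    indicate severe divergence. Assumed metric, not the paper's choice.
    """
    return max(
        ks_2samp(real[:, j], synth[:, j]).statistic
        for j in range(real.shape[1])
    )

real = np.array([[0.0], [1.0], [2.0], [3.0]])
print(max_ks_divergence(real, real))  # prints 0.0 for identical samples
```

A quality degradation attack succeeds when this divergence rises markedly after only a small perturbation of the real data.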

🛡️ Threat Analysis

Data Poisoning Attack

The paper directly implements data poisoning attacks — label flipping and feature-importance-based interventions — applied to the real dataset before or during the synthetic data generation process, degrading the quality of the resulting synthetic data and downstream model performance. This is the core ML02 threat: corrupting training data to degrade model behavior.
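The feature-importance-based intervention can be sketched in the same spirit: fit a surrogate model on the real data, find the most influential feature, and corrupt it in a small fraction of rows. The surrogate model, shuffle-based corruption, and 5% default rate are all assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def perturb_top_feature(X, y, rate=0.05, rng=0):
    """Feature-importance-based intervention (hypothetical sketch):
    rank features with a surrogate random forest, then shuffle the
    most important feature's values within a small fraction of rows.
    """
    rng = np.random.default_rng(rng)
    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    top = int(np.argmax(clf.feature_importances_))  # most important column
    X = X.copy()
    idx = rng.choice(len(X), size=int(round(rate * len(X))), replace=False)
    X[idx, top] = rng.permutation(X[idx, top])      # break feature-label link
    return X, top
```

Because the corruption targets the feature the downstream task depends on most, a generator trained on the perturbed data reproduces the damaged feature-label relationship, consistent with the ML02 framing of corrupting training data to degrade model behavior.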


Details

Domains
tabular · generative
Model Types
generative · traditional_ml
Threat Tags
white_box · training_time · untargeted
Applications
synthetic data generation · privacy-preserving data sharing