
Not All Samples Are Equal: Quantifying Instance-level Difficulty in Targeted Data Poisoning

William Xu 1, Yiwei Lu 2, Yihan Wang 3, Matthew Y.R. Yang 1, Zuoqiu Liu 1,4, Gautam Kamath 5, Yaoliang Yu 1,4



Published on arXiv: 2509.06896

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

Three instance-level metrics — ergodic prediction accuracy, poison distance, and poison budget — reliably predict the difficulty of targeted data poisoning attacks across diverse real-world scenarios.

Ergodic Prediction Accuracy (EPA)

Novel technique introduced


Targeted data poisoning attacks pose an increasingly serious threat due to their ease of deployment and high success rates. These attacks aim to manipulate the prediction for a single test sample in classification models. Unlike indiscriminate attacks that aim to decrease overall test performance, targeted attacks present a unique threat to individual test instances. This threat model raises a fundamental question: what factors make certain test samples more susceptible to successful poisoning than others? We investigate how attack difficulty varies across test instances and identify key characteristics that influence vulnerability. This paper introduces three predictive criteria for targeted data poisoning difficulty: ergodic prediction accuracy (analyzed through clean training dynamics), poison distance, and poison budget. Our experimental results demonstrate that these metrics effectively predict the varying difficulty of real-world targeted poisoning attacks across diverse scenarios, offering practitioners valuable insights for vulnerability assessment and a deeper understanding of data poisoning attacks.
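The first criterion can be made concrete with a small sketch. Assuming ergodic prediction accuracy (EPA) means the fraction of clean-training checkpoints at which the model predicts the target sample's true label (one reading of "analyzed through clean training dynamics" — the paper's exact definition may differ), a minimal toy version with logistic regression looks like this; all names (`epa`, `target_x`, the toy data) are illustrative, not from the paper:

```python
import numpy as np

# Hedged sketch: EPA is read here as the fraction of training epochs at
# which the *clean* model classifies the target sample correctly. A low
# EPA would suggest the target is intrinsically hard for the clean model,
# and thus (per the paper's thesis) easier to poison.

rng = np.random.default_rng(0)

# Toy binary dataset: two well-separated Gaussian blobs.
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# A single test instance whose poisoning difficulty we want to gauge.
target_x, target_y = np.array([1.5, 1.8]), 1

# Logistic regression trained by plain gradient descent.
w, b = np.zeros(2), 0.0
lr, epochs = 0.1, 50
correct_epochs = 0
for _ in range(epochs):
    p = 1 / (1 + np.exp(-(X @ w + b)))      # sigmoid predictions
    w -= lr * (X.T @ (p - y) / len(y))
    b -= lr * np.mean(p - y)
    # Record whether the current checkpoint classifies the target correctly.
    pred = int((target_x @ w + b) > 0)
    correct_epochs += (pred == target_y)

epa = correct_epochs / epochs               # fraction of "correct" epochs
print(f"ergodic prediction accuracy: {epa:.2f}")
```

Here the target sits deep inside its class's blob, so most checkpoints classify it correctly and EPA is close to 1, suggesting a hard-to-poison instance under this interpretation.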


Key Contributions

  • Introduces ergodic prediction accuracy (analyzed via clean training dynamics) as a predictive criterion for targeted poisoning difficulty
  • Proposes two additional criteria — poison distance and poison budget — to quantify instance-level susceptibility to data poisoning
  • Demonstrates that these three metrics effectively predict real-world targeted poisoning attack success across diverse classification scenarios
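The poison-distance criterion can likewise be sketched. Assuming it measures how far the target sample lies from the nearest clean training point carrying the attacker's desired label (an interpretation for illustration — the paper may define it in a learned feature space or relative to crafted poisons), a minimal version is:

```python
import numpy as np

# Hedged sketch: "poison distance" is read here as the input-space
# distance from the target to the closest clean training sample of the
# attacker's desired label. Intuitively, a smaller distance means a
# poison can resemble legitimate data of that class while still pulling
# the decision boundary toward the target.

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

target_x = np.array([1.5, 1.8])  # test instance the attacker wants flipped
attacker_label = 0               # label the attacker wants predicted

# Distance to every training sample carrying the attacker's label,
# then take the minimum as the (illustrative) poison distance.
dists = np.linalg.norm(X[y == attacker_label] - target_x, axis=1)
poison_distance = dists.min()
print(f"poison distance: {poison_distance:.2f}")
```

The third criterion, poison budget, is simply the number of poisoned samples the attacker may inject, so it needs no sketch.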

🛡️ Threat Analysis

Data Poisoning Attack

The paper focuses entirely on targeted data poisoning attacks, analyzing how attack success varies across individual test instances and proposing criteria that predict poisoning difficulty across diverse scenarios.


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
training_time, targeted
Datasets
CIFAR-10, CIFAR-100
Applications
image classification