On Robustness of Linear Classifiers to Targeted Data Poisoning
Nakshatra Gupta 1, Sumanth Prabhu 2, Supratik Chakraborty 3, R Venkatesh 1
Published on arXiv (arXiv:2511.12722)
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
Robustness bounds can be computed efficiently in practice for publicly available datasets, succeeding in many cases where state-of-the-art methods fail; poisoning that exceeds the bounds reliably alters the targeted test point's classification.
Data poisoning is a training-time attack that undermines the trustworthiness of learned models. In a targeted data poisoning attack, an adversary manipulates the training dataset to alter the classification of a targeted test point. Given the typically large size of training datasets, manual detection of poisoning is difficult. An alternative is to automatically measure a dataset's robustness against such an attack, which is the focus of this paper. We consider a threat model wherein an adversary can only perturb the labels of the training dataset, with knowledge limited to the hypothesis space of the victim's model. In this setting, we prove that computing the robustness is an NP-Complete problem, even when the hypotheses are linear classifiers. To overcome this, we present a technique that finds lower and upper bounds on robustness. Our implementation of the technique computes these bounds efficiently in practice for many publicly available datasets. We experimentally demonstrate the effectiveness of our approach: a poisoning that exceeds the identified robustness bounds significantly impacts the test point's classification. We are also able to compute these bounds in many cases where state-of-the-art techniques fail.
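To make the robustness notion concrete, here is a minimal sketch of the exact (brute-force) computation the paper proves NP-Complete: the smallest number of training-label flips that changes a linear classifier's prediction on a targeted test point. This is an illustration only, not the paper's bounding technique; the least-squares fit, the function names, and the toy dataset are assumptions for the example, and the exhaustive search is exponential in the training-set size, which is exactly why the paper resorts to lower and upper bounds.

```python
# Illustrative brute-force "exact robustness" for a linear classifier
# under a label-flipping-only threat model. Exponential in len(y);
# feasible only for toy datasets.
import itertools

import numpy as np


def fit_linear(X, y):
    """Least-squares linear classifier on {-1,+1} labels (bias included)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w


def predict(w, x):
    """Sign of the linear score at test point x."""
    return np.sign(np.append(x, 1.0) @ w)


def exact_robustness(X, y, x_target):
    """Minimum number of training-label flips that changes the
    classification of x_target, or None if no flip set changes it."""
    base = predict(fit_linear(X, y), x_target)
    n = len(y)
    for k in range(1, n + 1):                       # try budgets 1, 2, ...
        for idx in itertools.combinations(range(n), k):
            y_poisoned = y.copy()
            y_poisoned[list(idx)] *= -1             # flip the chosen labels
            if predict(fit_linear(X, y_poisoned), x_target) != base:
                return k                            # smallest successful budget
    return None


# Toy 1-D dataset: negatives on the left, positives on the right.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
print(exact_robustness(X, y, np.array([0.5])))  # prints 1: one flip suffices
```

On this toy set, flipping a single well-chosen label already moves the decision boundary past the target point at x = 0.5, so the robustness is 1; any poisoning at or above that budget can alter the target's classification, which is the quantity the paper's bounds bracket.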
Key Contributions
- Proves that computing exact robustness against targeted label-flipping poisoning is NP-Complete even for linear classifiers
- Presents a practical technique computing lower and upper bounds on dataset robustness against targeted data poisoning
- Empirically demonstrates that poisoning beyond the identified bounds reliably flips the targeted test point's classification
🛡️ Threat Analysis
Directly addresses targeted data poisoning via label-flipping attacks at training time, providing formal robustness bounds against an adversary who manipulates training labels to alter the classification of a specific test point.