Exact Certification of Data-Poisoning Attacks Using Mixed-Integer Programming
Philip Sosnin 1, Jodie Knapp 2, Fraser Kennedy 2, Josh Collyer 2, Calvin Tsay 1
Published on arXiv
2602.16944
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
Provides the first provably sound and complete certification of training-time robustness against data poisoning, simultaneously computing optimal worst-case attacks and tight robustness bounds for small linear models.
MIQCP-based exact poisoning certification
Novel technique introduced
This work introduces a verification framework that provides sound and complete guarantees against data poisoning attacks on neural network training. We formulate adversarial data manipulation, model training, and test-time evaluation as a single mixed-integer quadratically constrained program (MIQCP). Finding the global optimum of the proposed formulation provably yields worst-case poisoning attacks, while simultaneously bounding the effectiveness of all possible attacks on the given training pipeline. Our framework encodes both the gradient-based training dynamics and model evaluation at test time, enabling the first exact certification of training-time robustness. Experimental evaluation on small models confirms that our approach delivers a complete characterization of robustness against data poisoning.
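In symbols (the notation below is ours, not taken from the paper), the single optimization problem described above unrolls $K$ gradient steps of training and maximizes the test-time loss over all admissible poisonings:

$$
\max_{\tilde{D} \in \mathcal{A}(D)} \; \ell\big(\theta_K;\, x_{\text{test}}, y_{\text{test}}\big)
\quad \text{s.t.} \quad \theta_{k+1} = \theta_k - \eta\, \nabla_\theta L\big(\theta_k; \tilde{D}\big), \qquad k = 0, \dots, K-1,
$$

where $\mathcal{A}(D)$ denotes the set of training sets the attacker can reach from the clean data $D$. For a linear model with squared loss, each gradient is bilinear in $(\theta_k, \tilde{D})$, so the unrolled constraints are quadratic, and discrete attacker choices contribute integer variables, which plausibly explains why the formulation lands in the MIQCP class.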
Key Contributions
- First sound and complete (exact) verification framework for data poisoning attacks, formulated as a single Mixed-Integer Quadratically Constrained Program (MIQCP)
- Joint encoding of adversarial data manipulation, gradient-based training dynamics, and test-time evaluation in one optimization problem
- Tailored MIQCP solution strategies including reformulations, heuristics, and bound tightening; empirically validated on small linear models
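To make the certification idea concrete, here is a minimal sketch of the same bilevel structure on a toy instance. It is not the paper's MIQCP encoding: instead of a solver, it exactly enumerates a small discrete attack space, unrolls gradient-descent training for a one-parameter linear model, and reports the worst-case test loss over all attacks. All data values, step counts, and the attack grid are illustrative assumptions.

```python
def train(data, lr=0.1, steps=20):
    """Unrolled gradient descent for a one-parameter linear model y ~ w*x
    with mean squared error loss (stands in for the encoded training dynamics)."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# Toy clean training set and test point (illustrative, not from the paper).
clean = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
x_test, y_test = 4.0, 4.0

clean_loss = (train(clean) * x_test - y_test) ** 2

# Attack space: shift exactly one training label by a delta from a small grid.
deltas = [-1.0, 0.0, 1.0]
worst_loss, worst_attack = clean_loss, None
for i in range(len(clean)):
    for d in deltas:
        poisoned = [(x, y + d) if j == i else (x, y)
                    for j, (x, y) in enumerate(clean)]
        loss = (train(poisoned) * x_test - y_test) ** 2
        if loss > worst_loss:
            worst_loss, worst_attack = loss, (i, d)

# Exhaustive enumeration makes the bound both sound and complete
# on this toy instance, mirroring the exactness claim at small scale.
print(f"clean test loss:      {clean_loss:.4f}")
print(f"worst-case test loss: {worst_loss:.4f} via attack {worst_attack}")
```

Because every admissible attack is evaluated, the maximum found is simultaneously a concrete worst-case attack and a certified upper bound on attack effectiveness, which is the same two-sided guarantee the MIQCP's global optimum provides at scale.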
🛡️ Threat Analysis
The paper directly targets data poisoning attacks: it formulates worst-case manipulation of the training set as an MIQCP and certifies bounds on training-time robustness. The threat model is adversarial injection or perturbation of training samples to manipulate the behavior of the trained model.