Exact Certification of Data-Poisoning Attacks Using Mixed-Integer Programming
Philip Sosnin 1, Jodie Knapp 2, Fraser Kennedy 2, Josh Collyer 2, Calvin Tsay 1
Published on arXiv
2602.16944
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
Provides the first provably sound and complete certification of training-time robustness against data poisoning, simultaneously computing optimal worst-case attacks and tight robustness bounds for small linear models.
MIQCP-based exact poisoning certification
Novel technique introduced
This work introduces a verification framework that provides sound and complete guarantees against data poisoning attacks on neural network training. We formulate adversarial data manipulation, model training, and test-time evaluation as a single mixed-integer quadratically constrained program (MIQCP). Finding the global optimum of the proposed formulation provably yields worst-case poisoning attacks, while simultaneously bounding the effectiveness of all possible attacks on the given training pipeline. Our framework encodes both the gradient-based training dynamics and model evaluation at test time, enabling the first exact certification of training-time robustness. Experimental evaluation on small models confirms that our approach delivers a complete characterization of robustness against data poisoning.
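In symbols (the notation below is ours, not taken from the paper), the single optimization problem described above unrolls $K$ gradient steps of training and maximizes the test-time loss over all admissible poisonings:

$$
\max_{\tilde{D} \in \mathcal{A}(D)} \; \ell\big(\theta_K;\, x_{\text{test}}, y_{\text{test}}\big)
\quad \text{s.t.} \quad \theta_{k+1} = \theta_k - \eta\, \nabla_\theta L\big(\theta_k; \tilde{D}\big), \qquad k = 0, \dots, K-1,
$$

where $\mathcal{A}(D)$ denotes the set of training sets the attacker can reach from the clean data $D$. For a linear model with squared loss, each gradient is bilinear in $(\theta_k, \tilde{D})$, so the unrolled constraints are quadratic, and discrete attacker choices contribute integer variables, which plausibly explains why the formulation lands in the MIQCP class.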
Key Contributions
- First sound and complete (exact) verification framework for data poisoning attacks, formulated as a single Mixed-Integer Quadratically Constrained Program (MIQCP)
- Joint encoding of adversarial data manipulation, gradient-based training dynamics, and test-time evaluation in one optimization problem
- Tailored MIQCP solution strategies including reformulations, heuristics, and bound tightening; empirically validated on small linear models
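To make the certification idea concrete, here is a minimal sketch of the same bilevel structure on a toy instance. It is not the paper's MIQCP encoding: instead of a solver, it exactly enumerates a small discrete attack space, unrolls gradient-descent training for a one-parameter linear model, and reports the worst-case test loss over all attacks. All data values, step counts, and the attack grid are illustrative assumptions.

```python
def train(data, lr=0.1, steps=20):
    """Unrolled gradient descent for a one-parameter linear model y ~ w*x
    with mean squared error loss (stands in for the encoded training dynamics)."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# Toy clean training set and test point (illustrative, not from the paper).
clean = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
x_test, y_test = 4.0, 4.0

clean_loss = (train(clean) * x_test - y_test) ** 2

# Attack space: shift exactly one training label by a delta from a small grid.
deltas = [-1.0, 0.0, 1.0]
worst_loss, worst_attack = clean_loss, None
for i in range(len(clean)):
    for d in deltas:
        poisoned = [(x, y + d) if j == i else (x, y)
                    for j, (x, y) in enumerate(clean)]
        loss = (train(poisoned) * x_test - y_test) ** 2
        if loss > worst_loss:
            worst_loss, worst_attack = loss, (i, d)

# Exhaustive enumeration makes the bound both sound and complete
# on this toy instance, mirroring the exactness claim at small scale.
print(f"clean test loss:      {clean_loss:.4f}")
print(f"worst-case test loss: {worst_loss:.4f} via attack {worst_attack}")
```

Because every admissible attack is evaluated, the maximum found is simultaneously a concrete worst-case attack and a certified upper bound on attack effectiveness, which is the same two-sided guarantee the MIQCP's global optimum provides at scale.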
🛡️ Threat Analysis
The paper directly targets data poisoning attacks: it formulates worst-case manipulation of the training set as an MIQCP and certifies bounds on training-time robustness. The threat model is adversarial injection or perturbation of training samples to manipulate the behavior of the trained model.