
Exact Certification of Data-Poisoning Attacks Using Mixed-Integer Programming

Philip Sosnin¹, Jodie Knapp², Fraser Kennedy², Josh Collyer², Calvin Tsay¹

0 citations · 49 references · arXiv (Cornell University)


Published on arXiv · 2602.16944

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

Provides the first provably sound and complete certification of training-time robustness against data poisoning, simultaneously computing optimal worst-case attacks and tight robustness bounds for small linear models.

MIQCP-based exact poisoning certification

Novel technique introduced


This work introduces a verification framework that provides both sound and complete guarantees against data poisoning attacks during neural network training. We formulate adversarial data manipulation, model training, and test-time evaluation as a single mixed-integer quadratically constrained programming (MIQCP) problem. Finding the global optimum of the proposed formulation provably yields the worst-case poisoning attack, while simultaneously bounding the effectiveness of all possible attacks on the given training pipeline. Our framework encodes both the gradient-based training dynamics and model evaluation at test time, enabling the first exact certification of training-time robustness. Experimental evaluation on small models confirms that our approach delivers a complete characterization of robustness against data poisoning.
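To make the abstract concrete, here is a hedged sketch (not the paper's code): a brute-force analogue of what the MIQCP computes exactly. A tiny linear model is trained by unrolled gradient descent, one training label is attacker-controlled within a bounded range, and a grid scan recovers both the worst-case attack and the corresponding robustness bound. The dataset, learning rate, attack budget, and grid are all illustrative assumptions.

```python
def train_gd(data, lr=0.1, steps=100):
    """Unrolled gradient descent for 1-D least squares y = w * x."""
    w = 0.0
    n = len(data)
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / n
        w -= lr * grad
    return w

clean = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]  # clean training set (w* = 1)
x_test, y_test = 4.0, 4.0                     # single test point

# The attacker controls one extra label y_p in [-5, 5] for a fixed x_p = 1.
# The MIQCP would optimize y_p exactly; here we scan a fine grid instead.
worst_loss, worst_label = 0.0, None
for i in range(1001):
    y_p = -5.0 + 10.0 * i / 1000
    w = train_gd(clean + [(1.0, y_p)])
    loss = (w * x_test - y_test) ** 2
    if loss > worst_loss:
        worst_loss, worst_label = loss, y_p

print(f"worst-case poison label: {worst_label:+.2f}")
print(f"worst-case test loss over the grid: {worst_loss:.3f}")
```

The scan returns the most damaging admissible label and the largest test loss any such attack can cause, which is exactly the attack/bound pair the exact formulation produces, except that the MIQCP certifies optimality over the continuous range rather than a grid.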


Key Contributions

  • First sound and complete (exact) verification framework for data poisoning attacks, formulated as a single Mixed-Integer Quadratically Constrained Program (MIQCP)
  • Joint encoding of adversarial data manipulation, gradient-based training dynamics, and test-time evaluation in one optimization problem
  • Tailored MIQCP solution strategies including reformulations, heuristics, and bound tightening; empirically validated on small linear models
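In outline, and using illustrative notation not taken from the paper, the joint encoding of the bullets above has the shape of a bilevel problem collapsed into one optimization:

```latex
\max_{\delta \in \Delta} \; \ell\big(f_{w_T}(x_{\text{test}}),\, y_{\text{test}}\big)
\quad \text{s.t.} \quad
w_{t+1} = w_t - \eta \, \nabla_w L\big(w_t;\, D \cup \delta\big),
\quad t = 0, \dots, T-1,
```

where $\Delta$ is the attacker's admissible set of manipulations and the constraints unroll $T$ gradient steps. For a linear model with squared loss, the gradient contains products of the weights $w_t$ with the poisoned features in $\delta$; these bilinear terms are what make the constraints quadratic (the "QC" in MIQCP), while integer variables can encode discrete attacker choices such as which samples to poison.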

🛡️ Threat Analysis

Data Poisoning Attack

The paper directly targets data poisoning attacks — it formulates worst-case data manipulation of training sets as an MIQCP and certifies training-time robustness bounds. The threat model is adversarial injection of crafted samples into training data to manipulate model behavior.
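The threat model described above can be sketched as a feasible set of injections: a white-box, training-time attacker may add up to a budgeted number of crafted samples whose features and labels lie in a valid range, and certification quantifies over every training set reachable this way. All numbers below are illustrative assumptions, not taken from the paper.

```python
K_BUDGET = 2               # max number of injected samples (assumed budget)
FEATURE_BOX = (-1.0, 1.0)  # each crafted feature must lie in this box
LABELS = {0, 1}            # labels must be valid classes

def is_admissible(injected):
    """Does a proposed injection respect the attacker's constraints?"""
    lo, hi = FEATURE_BOX
    return (len(injected) <= K_BUDGET
            and all(lo <= x <= hi and y in LABELS for x, y in injected))

clean_set = [(0.2, 0), (0.8, 1), (-0.5, 0)]
attack = [(1.0, 0), (-1.0, 1)]      # two crafted samples at the box corners

assert is_admissible(attack)
poisoned_set = clean_set + attack   # what the model actually trains on
print(len(poisoned_set))
```

A certificate is a statement about the worst case over every `attack` passing this admissibility check, not about any single crafted injection.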


Details

Model Types
traditional_ml
Threat Tags
white_box, training_time
Applications
neural network training, image classification