Exact Certification of Neural Networks and Partition Aggregation Ensembles against Label Poisoning
Ajinkya Mohgaonkar 1, Lukas Gosch 1,2,3, Mahalakshmi Sabanayagam 1,4, Debarghya Ghoshdastidar 1, Stephan Günnemann 1,2,3
Published on arXiv
2604.11416
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
Certifies up to 26.5% more label flips in median on CIFAR-10 compared to existing black-box approaches while requiring 100 times fewer partitions
EnsembleCert
Novel technique introduced
Label-flipping attacks, which corrupt training labels to induce misclassifications at inference, remain a major threat to supervised learning models. This drives the need for robustness certificates that provide formal guarantees about a model's robustness under adversarially corrupted labels. Existing certification frameworks rely on ensemble techniques such as smoothing or partition-aggregation, but treat the corresponding base classifiers as black boxes, yielding overly conservative guarantees. We introduce EnsembleCert, the first certification framework for partition-aggregation ensembles that utilizes white-box knowledge of the base classifiers. Concretely, EnsembleCert yields tighter guarantees than black-box approaches by aggregating per-partition white-box certificates to compute ensemble-level guarantees in polynomial time. To extract white-box knowledge from the base classifiers efficiently, we develop ScaLabelCert, a method that leverages the equivalence between sufficiently wide neural networks and kernel methods using the neural tangent kernel. ScaLabelCert yields the first exact certificate for neural networks against label-flipping attacks that is computable in polynomial time. EnsembleCert either matches or significantly outperforms existing partition-based black-box certificates. For example, on CIFAR-10, our method certifies up to 26.5% more label flips in the median over the test set than the existing black-box approach while requiring 100 times fewer partitions, challenging the prevailing notion that heavy partitioning is necessary for strong certified robustness.
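To make the black-box baseline concrete, the following is a minimal sketch of a standard partition-aggregation certificate of the kind the abstract contrasts against. It is not code from the paper. It assumes each training point lives in exactly one partition, so one label flip can change at most one base classifier's vote, and that vote ties are broken toward the smaller class index. The function name `black_box_certificate` and its arguments are hypothetical.

```python
def black_box_certificate(votes, num_classes):
    """Return (predicted_class, certified_flips) for a partition-aggregation ensemble.

    votes[i] is the class predicted by the base classifier trained on partition i.
    Assumption: ties between classes are broken toward the smaller class index.
    """
    counts = [0] * num_classes
    for v in votes:
        counts[v] += 1

    # Plurality vote; on equal counts the smaller class index wins.
    top = max(range(num_classes), key=lambda c: (counts[c], -c))

    # One flipped label changes at most one vote, which narrows the gap
    # between the top class and any rival by at most 2. A rival with a
    # smaller index already wins on a tie, hence the extra -1 for it.
    radius = min(
        (counts[top] - counts[c] - (1 if c < top else 0)) // 2
        for c in range(num_classes)
        if c != top
    )
    return top, radius
```

With 7 votes for class 0 and 3 for class 1, this certifies 2 label flips: the attacker is credited with changing a vote for every single flip, which is the conservatism that white-box knowledge of the base classifiers is meant to remove.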
Key Contributions
- EnsembleCert: first white-box certification framework for partition-aggregation ensembles against label poisoning
- ScaLabelCert: first exact, polynomial-time certificate for neural networks against label-flipping using neural tangent kernels
- Certifies up to 26.5% more label flips in the median on CIFAR-10 with 100x fewer partitions than the black-box baseline
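The idea of aggregating per-partition certificates can be illustrated with a simplified binary-classification sketch. This is an assumption-laden illustration, not the EnsembleCert algorithm, whose multiclass procedure the summary describes only as polynomial-time. It assumes each base classifier comes with a number `per_partition_radius[i]`, the largest number of label flips inside partition i that provably leaves its prediction unchanged (for example, from an exact white-box certificate), and that ties are broken toward class 0. All names are hypothetical.

```python
def white_box_certificate_binary(votes, per_partition_radius):
    """Return (predicted_class, certified_flips) for a binary ensemble.

    votes[i] in {0, 1} is the prediction of base classifier i.
    per_partition_radius[i] is the number of label flips in partition i
    that classifier i is certified to withstand (assumed to be given).
    Assumption: vote ties are broken toward class 0.
    """
    n0 = votes.count(0)
    n1 = len(votes) - n0
    top = 0 if n0 >= n1 else 1
    gap = abs(n0 - n1)

    # Number of top-class votes the attacker must switch.
    # Class 0 wins ties, so class 1 needs a strict majority to take over,
    # while class 0 only needs to reach a tie.
    needed = gap // 2 + 1 if top == 0 else (gap + 1) // 2

    # Switching classifier i costs at least radius_i + 1 flips; the
    # cheapest attack targets the least robust top-class voters.
    costs = sorted(
        r + 1 for v, r in zip(votes, per_partition_radius) if v == top
    )
    attack_cost = sum(costs[:needed])
    return top, attack_cost - 1
```

Setting every radius to 0 recovers the black-box count of one flip per changed vote. With 7 votes for class 0, 3 for class 1, and every base classifier certified against 4 flips, the sketch certifies 14 flips instead of 2, which suggests why fewer, more robust partitions can still give strong guarantees.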
🛡️ Threat Analysis
Label-flipping attacks are a classic form of data poisoning where an adversary corrupts training labels to degrade model performance. The paper proposes certification (defense) methods that provide formal robustness guarantees against such poisoning attacks.