
Analyzing Physical Adversarial Example Threats to Machine Learning in Election Systems

Khaleque Md Aashiq Kamal 1, Surya Eada 2, Aayushi Verma 2, Subek Acharya 1, Adrian Yemin 2, Benjamin Fuller 2, Kaleel Mahmood 1



Published on arXiv: 2603.00481

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Attacks most effective in the digital domain (l2 and l_inf) differ from those most effective when physically printed and scanned (l1 and l2, depending on the model), demonstrating a significant digital-to-physical transfer gap.


Developments in the machine learning voting domain have shown both promising results and risks. Trained models perform well on ballot classification tasks (> 99% accuracy) but are at risk from adversarial example attacks that cause misclassifications. In this paper, we analyze an attacker who seeks to deploy adversarial examples against machine learning ballot classifiers to compromise a U.S. election. We first derive a probabilistic framework for determining the number of adversarial example ballots that must be printed to flip an election, in terms of the probability of each candidate winning and the total number of ballots cast. Second, it is an open question as to which type of adversarial example is most effective when physically printed in the voting domain. We analyze six different types of adversarial example attacks: l_infinity-APGD, l2-APGD, l1-APGD, l0 PGD, l0 + l_infinity PGD, and l0 + sigma-map PGD. Our experiments include physical realizations of 144,000 adversarial examples through printing and scanning with four different machine learning models. We empirically demonstrate an analysis gap between the physical and digital domains, wherein attacks most effective in the digital domain (l2 and l_infinity) differ from those most effective in the physical domain (l1 and l2, depending on the model). By unifying a probabilistic election framework with digital and physical adversarial example evaluations, we move beyond prior close race analyses to explicitly quantify when and how adversarial ballot manipulation could alter outcomes.
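
The paper derives the exact framework; as a rough illustration of the kind of calculation involved, the sketch below uses a simple two-candidate model in which each of N ballots is independently cast for candidate A with probability p, and asks how many ballots must be misread (flipped from A to B) before A's win probability drops below a chosen threshold. The model, function names, and threshold here are assumptions for illustration, not the paper's derivation.

```python
# Illustrative sketch only: a simple Bernoulli voting model, not the
# probabilistic framework derived in the paper.
from scipy.stats import binom


def win_probability(n_ballots: int, p_a: float, flipped: int = 0) -> float:
    """P(candidate A wins) when `flipped` ballots cast for A are counted for B.

    A's true votes X ~ Binomial(n, p_a). After flipping, A wins iff
    X - flipped > (n - X) + flipped, i.e. X > n/2 + flipped.
    (Assumes at least `flipped` ballots were actually cast for A.)
    """
    return binom.sf(n_ballots / 2 + flipped, n_ballots, p_a)


def ballots_to_flip(n_ballots: int, p_a: float, target: float = 0.05) -> int:
    """Smallest number of flipped ballots that drives A's win probability
    below `target` (simple linear scan, chosen for clarity)."""
    for k in range(n_ballots + 1):
        if win_probability(n_ballots, p_a, flipped=k) < target:
            return k
    return n_ballots


if __name__ == "__main__":
    # Example: 100,000 ballots, candidate A favored with p = 0.505.
    print(ballots_to_flip(100_000, 0.505))
```

Even under these toy assumptions, the required number of flipped ballots grows with both the margin (how far p is from 0.5) and the total ballot count, which is the kind of relationship the paper's framework quantifies in terms of candidate win probability and ballots cast.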


Key Contributions

  • Probabilistic framework deriving how many adversarial ballots must be printed to flip a U.S. election as a function of candidate win probability and total ballot count
  • Large-scale physical evaluation of 144,000 adversarial examples across 6 attack norms and 4 ML models via print-and-scan, revealing a digital-physical analysis gap
  • Empirical finding that l1-APGD dominates in the physical domain while l2/l_inf attacks lead digitally, showing that digital-domain attack rankings do not transfer to physical deployment (see the measurement sketch after this list)
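
The ranking comparison in the last bullet reduces to measuring attack success rates twice: once on the digital adversarial images and once on their printed-and-scanned counterparts. Below is a minimal sketch of that measurement, assuming a generic PyTorch classifier and pre-loaded tensors; the names and protocol are illustrative, not the paper's exact evaluation.

```python
# Illustrative sketch: compare digital vs. physical (print-and-scan) attack
# success rates. `model`, the tensors, and the success criterion are assumptions.
import torch


@torch.no_grad()
def attack_success_rate(model: torch.nn.Module,
                        adv_images: torch.Tensor,
                        true_labels: torch.Tensor) -> float:
    """Fraction of adversarial inputs the classifier misclassifies."""
    preds = model(adv_images).argmax(dim=1)
    return (preds != true_labels).float().mean().item()


# digital_adv: adversarial examples evaluated directly as image tensors
# physical_adv: the same examples after printing and rescanning
# rate_digital = attack_success_rate(model, digital_adv, labels)
# rate_physical = attack_success_rate(model, physical_adv, labels)
# Ranking the attack norms by rate_digital vs. rate_physical exposes the
# digital-to-physical transfer gap described above.
```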

🛡️ Threat Analysis

Input Manipulation Attack

Evaluates six gradient-based adversarial example attacks (l_inf/l2/l1/l0 APGD and PGD variants) causing misclassification of ballot images at inference time, with physical realization via print-and-scan.
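
As a concrete example of one such attack, here is a minimal sketch of an untargeted l_inf PGD attack in PyTorch; the model handle, epsilon, step size, and step count are illustrative placeholders rather than the paper's configuration (the paper evaluates APGD and additional norm variants).

```python
# Illustrative untargeted l_inf PGD sketch; not the paper's APGD implementation.
import torch
import torch.nn.functional as F


def pgd_linf(model, images, labels, epsilon=8 / 255, alpha=2 / 255, steps=10):
    """Iterative gradient-sign ascent on the loss, projected onto an
    l_inf ball of radius `epsilon` around the original images."""
    adv = images.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv.detach() + alpha * grad.sign()                 # ascend the loss
        adv = images + (adv - images).clamp(-epsilon, epsilon)   # project to l_inf ball
        adv = adv.clamp(0.0, 1.0)                                # keep valid pixel range
    return adv.detach()
```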


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
white_box, inference_time, untargeted, physical, digital
Datasets
custom ballot dataset (144,000 printed/scanned adversarial examples)
Applications
ballot classification, election systems