Bridging Efficiency and Safety: Formal Verification of Neural Networks with Early Exits
Yizhak Yisrael Elboher, Avraham Raviv, Amihay Elboher et al. · The Hebrew University of Jerusalem · Bar Ilan University +2 more
Yizhak Yisrael Elboher, Avraham Raviv, Amihay Elboher et al. · The Hebrew University of Jerusalem · Bar Ilan University +2 more
Formal verification framework for early exit neural networks that certifies local robustness and improves verification efficiency
Ensuring the safety and efficiency of AI systems is a central goal of modern research. Formal verification provides guarantees of neural network robustness, while early exits improve inference efficiency by enabling intermediate predictions. Yet verifying networks with early exits introduces new challenges due to their conditional execution paths. In this work, we define a robustness property tailored to early exit architectures and show how off-the-shelf solvers can be used to assess it. We present a baseline algorithm, enhanced with an early stopping strategy and heuristic optimizations that maintain soundness and completeness. Experiments on multiple benchmarks validate our framework's effectiveness and demonstrate the performance gains of the improved algorithm. Alongside the natural inference acceleration provided by early exits, we show that they also enhance verifiability, enabling more queries to be solved in less time compared to standard networks. Together with a robustness analysis, we show how these metrics can help users navigate the inherent trade-off between accuracy and efficiency.
Avital Shafran, Roei Schuster, Thomas Ristenpart et al. · The Hebrew University of Jerusalem · Wild Moose +1 more
Adversarially optimized token sequences (confounder gadgets) reliably manipulate LLM routers into routing any query to expensive models, evading perplexity defenses
LLM routers aim to balance quality and cost of generation by classifying queries and routing them to a cheaper or more expensive LLM depending on their complexity. Routers represent one type of what we call LLM control planes: systems that orchestrate use of one or more LLMs. In this paper, we investigate routers' adversarial robustness. We first define LLM control plane integrity, i.e., robustness of LLM orchestration to adversarial inputs, as a distinct problem in AI safety. Next, we demonstrate that an adversary can generate query-independent token sequences we call ``confounder gadgets'' that, when added to any query, cause LLM routers to send the query to a strong LLM. Our quantitative evaluation shows that this attack is successful both in white-box and black-box settings against a variety of open-source and commercial routers, and that confounding queries do not affect the quality of LLM responses. Finally, we demonstrate that gadgets can be effective while maintaining low perplexity, thus perplexity-based filtering is not an effective defense. We finish by investigating alternative defenses.