An Automated, Scalable Machine Learning Model Inversion Assessment Pipeline

Machine learning (ML) models have the potential to transform military battlefields, which creates strong external pressure to incorporate them rapidly into operational settings. However, it is well established that ML models are vulnerable to a number of adversarial attacks throughout the model deployment pipeline, and these attacks threaten to negate any battlefield advantage. One broad category is privacy attacks, such as model inversion, in which an adversary reverse engineers information from the model, including the sensitive data used in its training. Quantifying the risk of model inversion attacks (MIAs) is not well studied, and there is a lack of automated developmental test and evaluation (DT&E) tools and metrics for quantifying the privacy loss that an MIA causes. The current DT&E process is difficult because ML model inversions can be hard for a human to interpret, subjective when they are interpretable, and difficult to quantify in terms of inversion quality. Additionally, scaling the DT&E process is challenging because of the many ML model architectures and data modalities that need to be assessed. In this work, we present a novel DT&E tool that quantifies the risk of data privacy loss from MIAs, and we introduce four adversarial risk dimensions to quantify that loss. Our DT&E pipeline combines inversion with vision language models (VLMs) to improve effectiveness while enabling scalable analysis. We demonstrate effectiveness using multiple MIA techniques and VLMs configured for zero-shot classification and image captioning. We benchmark the pipeline using several state-of-the-art MIAs in the computer vision domain with an image classification task that is typical in military applications. Overall, our pipeline extends current model inversion DT&E capabilities by automating the privacy loss analysis and improving its effectiveness and scalability.

Key Contributions

Automated DT&E pipeline that combines state-of-the-art model inversion attacks with VLM-based interpretation to quantify training data privacy loss at scale
Four adversarial risk dimensions as quantitative metrics for measuring the effectiveness and severity of model inversion privacy loss
Use of VLMs (zero-shot classification and image captioning) to eliminate subjective human interpretation of inverted images, enabling scalable automated analysis
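
The VLM-based interpretation step listed above can be illustrated with a minimal sketch of zero-shot classification over inverted images. It assumes a CLIP-style VLM has already produced embeddings for each inverted image and for each class-label prompt; the toy vectors, the label names, the temperature value, and the function names (zero_shot_probs, recognition_rate) are all hypothetical. The resulting score is a generic attack-accuracy-style proxy and is not one of the paper's four adversarial risk dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def zero_shot_probs(image_emb, label_embs, temperature=0.07):
    """Softmax over cosine similarities between one image and each label prompt."""
    logits = {lbl: cosine(image_emb, emb) / temperature for lbl, emb in label_embs.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {lbl: math.exp(z - peak) for lbl, z in logits.items()}
    total = sum(exps.values())
    return {lbl: e / total for lbl, e in exps.items()}

def recognition_rate(inversions, label_embs):
    """Fraction of inverted images whose top zero-shot label is the attacked class."""
    hits = 0
    for target_label, image_emb in inversions:
        probs = zero_shot_probs(image_emb, label_embs)
        if max(probs, key=probs.get) == target_label:
            hits += 1
    return hits / len(inversions)

# Hypothetical embeddings standing in for real VLM text-encoder outputs.
label_embs = {
    "truck": [1.0, 0.0, 0.0],
    "aircraft": [0.0, 1.0, 0.0],
    "ship": [0.0, 0.0, 1.0],
}
# Each entry: (class the attack targeted, embedding of the inverted image).
inversions = [
    ("truck", [0.9, 0.1, 0.0]),
    ("aircraft", [0.2, 0.8, 0.1]),
    ("ship", [0.6, 0.3, 0.1]),   # inversion that does not resemble its target
    ("truck", [0.7, 0.2, 0.1]),
]
rate = recognition_rate(inversions, label_embs)
print(rate)  # 0.75
```

A higher rate suggests that the inversions expose more recognizable class information. Because no human views the images, the same loop could in principle run unchanged across many attacks and target models.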

🛡️ Threat Analysis

Model Inversion Attack

The paper's primary contribution is an assessment pipeline for model inversion attacks (MIAs), in which an adversary reverse-engineers private training data from a deployed model. It benchmarks white-box and black-box MIA techniques and introduces metrics to quantify the degree of training data reconstruction, which corresponds directly to the ML03 threat model.
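
To make the attack class concrete, the following is a minimal sketch of white-box inversion in its most basic form: gradient ascent on the input to maximize the model's confidence in a target class. It assumes a toy linear softmax classifier with known weights. The weights, step count, learning rate, and function names (predict, invert_class) are hypothetical, and the state-of-the-art MIAs benchmarked in the paper are considerably more sophisticated.

```python
import math

def softmax(z):
    peak = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, x):
    """Class probabilities of a linear softmax model; weights[k] is class k's weight vector."""
    return softmax([sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights])

def invert_class(weights, target, steps=200, lr=0.5):
    """Gradient ascent on log p(target | x) with respect to the input x."""
    n_classes, n_features = len(weights), len(weights[0])
    x = [0.0] * n_features
    for _ in range(steps):
        p = predict(weights, x)
        # For a linear softmax model: d/dx log p_t = w_t - sum_k p_k * w_k
        grad = [
            weights[target][j] - sum(p[k] * weights[k][j] for k in range(n_classes))
            for j in range(n_features)
        ]
        # Clamp each feature to [0, 1], mimicking a valid pixel range.
        x = [min(1.0, max(0.0, x_j + lr * g)) for x_j, g in zip(x, grad)]
    return x

# Hypothetical 3-class, 4-feature model used only for illustration.
weights = [
    [2.0, -1.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [-1.0, -1.0, 2.0, 0.0],
]
reconstruction = invert_class(weights, target=1)
confidence = predict(weights, reconstruction)[1]
print(reconstruction, round(confidence, 3))
```

The recovered input concentrates on the feature that the target class weights most heavily, which is the kind of reconstruction an assessment pipeline would then need to score.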