Quantifying the Risk of Transferred Black Box Attacks
Disesdi Susanna Cox, Niklas Bunzel
Published on arXiv (2511.05102)
Input Manipulation Attack (OWASP ML Top 10: ML01)
Key Finding
CKA similarity-guided surrogate model selection improves adversarial subspace coverage, enabling more reliable regression-based risk estimates for transferred black-box evasion attacks than naive approaches.
Novel technique introduced: CKA-based surrogate selection for adversarial risk quantification
Neural networks have become pervasive across various applications, including security-related products. However, their widespread adoption has heightened concerns regarding vulnerability to adversarial attacks. With emerging regulations and standards emphasizing security, organizations must reliably quantify risks associated with these attacks, particularly regarding transferred adversarial attacks, which remain challenging to evaluate accurately. This paper investigates the complexities involved in resilience testing against transferred adversarial attacks. Our analysis specifically addresses black-box evasion attacks, highlighting transfer-based attacks due to their practical significance and typically high transferability between neural network models. We underline the computational infeasibility of exhaustively exploring high-dimensional input spaces to achieve complete test coverage. As a result, comprehensive adversarial risk mapping is deemed impractical. To mitigate this limitation, we propose a targeted resilience testing framework that employs surrogate models strategically selected based on Centered Kernel Alignment (CKA) similarity. By leveraging surrogate models exhibiting both high and low CKA similarities relative to the target model, the proposed approach seeks to optimize coverage of adversarial subspaces. Risk estimation is conducted using regression-based estimators, providing organizations with realistic and actionable risk quantification.
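The selection criterion above rests on Centered Kernel Alignment between the target model and candidate surrogates. As a minimal sketch of how such a similarity score can be computed (assuming the linear-kernel form of CKA over two representation matrices whose rows correspond to the same inputs; the shapes and data here are hypothetical, not from the paper):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representation matrices X (n x d1) and Y (n x d2),
    where row i of each matrix is the representation of the same input i."""
    X = X - X.mean(axis=0)  # center each feature dimension
    Y = Y - Y.mean(axis=0)
    # HSIC-based form: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    den = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return num / den

# Illustrative use with random stand-in representations.
rng = np.random.default_rng(0)
target_reps = rng.normal(size=(64, 32))     # target model activations (assumed)
surrogate_reps = rng.normal(size=(64, 48))  # candidate surrogate activations
score = linear_cka(target_reps, surrogate_reps)  # value in [0, 1]
```

Linear CKA is invariant to orthogonal rotations and isotropic scaling of either representation, which is what makes it usable for comparing models with different widths and architectures.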
Key Contributions
- Demonstrates computational infeasibility of exhaustive adversarial coverage testing in high-dimensional input spaces, making comprehensive adversarial risk mapping impractical
- Proposes strategically selecting surrogate models based on Centered Kernel Alignment (CKA) similarity — using both high and low CKA surrogates — to optimize coverage of adversarial subspaces
- Introduces regression-based risk estimators to provide organizations with actionable, realistic quantification of transferred adversarial attack risk
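The third contribution can be pictured with a deliberately simple regression estimator: fit observed transfer success rates against perturbation budget and interpolate the risk at an untested budget. All numbers below are hypothetical placeholders, and the paper's actual estimator may differ; this only illustrates the regression-based shape of the idea:

```python
import numpy as np

# Hypothetical measurements: transfer-attack success rates observed with
# CKA-selected surrogates at a few L-inf perturbation budgets (assumed data).
eps = np.array([0.01, 0.02, 0.04, 0.08])      # tested perturbation budgets
success = np.array([0.05, 0.12, 0.31, 0.58])  # observed transfer success rates

# Fit a first-degree polynomial (a line) through the observations.
slope, intercept = np.polyfit(eps, success, deg=1)

def estimated_risk(budget):
    """Interpolated transfer success rate at an unobserved budget,
    clamped to [0, 1] since the quantity is a probability."""
    return float(np.clip(slope * budget + intercept, 0.0, 1.0))
```

A linear fit is only a sketch; in practice one would likely prefer a link function (e.g. logistic) that respects the [0, 1] range without clamping.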
🛡️ Threat Analysis
The paper's core subject is transfer-based black-box evasion attacks — adversarial inputs crafted on surrogate models that transfer to target models at inference time. The proposed framework directly evaluates and quantifies resilience against these input manipulation attacks.
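The transfer mechanism being quantified can be demonstrated end to end with a toy example: craft an FGSM-style perturbation purely on a surrogate and measure how often it flips a similar target model's prediction. The linear "models" below are stand-ins chosen so the script is self-contained; real attacks operate on neural networks:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear classifiers sign(w . x): a surrogate and a highly similar
# target (weights differ by small noise), standing in for two networks.
w_surrogate = rng.normal(size=16)
w_target = w_surrogate + 0.1 * rng.normal(size=16)

def predict(w, x):
    return 1 if w @ x >= 0 else -1

def fgsm_on_surrogate(x, eps=0.5):
    """FGSM-style step: move against the surrogate's score for the
    surrogate-predicted class, within an L-inf budget of eps."""
    y = predict(w_surrogate, x)
    return x - eps * y * np.sign(w_surrogate)

# Black-box transfer evaluation: perturbations are crafted with NO access
# to w_target, then tested on the target.
n, flips = 200, 0
for _ in range(n):
    x = rng.normal(size=16)
    x_adv = fgsm_on_surrogate(x)
    if predict(w_target, x_adv) != predict(w_target, x):
        flips += 1
transfer_rate = flips / n
```

Because the two weight vectors are nearly aligned, most surrogate-crafted perturbations also flip the target, mirroring the high transferability between similar models that motivates CKA-guided surrogate selection.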