Identifying Adversary Characteristics from an Observed Attack
Soyon Choi, Scott Alfeld, Meiyi Ma
Published on arXiv (2603.05625)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
The proposed reverse-optimization framework can identify the most probable attacker characteristics from observed attacks, improving adversarial regularization when attacker parameters are unknown.
When used in automated decision-making systems, machine learning (ML) models are vulnerable to data-manipulation attacks. Some defense mechanisms (e.g., adversarial regularization) directly affect the ML models, while others (e.g., anomaly detection) act within the broader system. In this paper we consider a different approach to defending against the adversary, focusing on the attacker rather than the attack. We present and demonstrate a framework for identifying characteristics of the attacker from an observed attack. We prove that, without additional knowledge, the attacker is non-identifiable (multiple potential attackers would perform the same observed attack). To address this challenge, we propose a domain-agnostic framework to identify the most probable attacker. This framework aids the defender in two ways. First, knowledge about the attacker can be leveraged for exogenous mitigation (i.e., addressing the vulnerability by altering the decision-making system outside the learning algorithm and/or limiting the attacker's capability). Second, when implementing defense methods that directly affect the learning process (e.g., adversarial regularization), knowledge of the specific attacker improves performance. We present the details of our framework and illustrate its applicability through specific instantiations on a variety of learners.
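To make the reverse-optimization idea concrete, here is a toy instantiation (not from the paper; the linear learner, target prediction, and quadratic cost are illustrative assumptions). An attacker with hidden cost weight `c` moves a clean point `x` toward a target prediction `t` for a known linear learner `w`; the defender, seeing the attacked point, searches candidate values of `c` for the one whose induced optimal attack best matches the observation, which amounts to a MAP estimate under a Gaussian observation model and a uniform prior:

```python
import numpy as np

# Hypothetical attacker model (illustrative, not the paper's exact setup):
#   attack(x, c) = argmin_z  c * ||z - x||^2 + (w @ z - t)^2
# The first-order condition yields the closed form
#   (c*I + w w^T) z = c*x + t*w.
def attack(x, c, w, t):
    d = len(x)
    A = c * np.eye(d) + np.outer(w, w)
    return np.linalg.solve(A, c * x + t * w)

# Reverse optimization: score each candidate cost weight c by how closely
# its induced attack matches the observed attacked point (equivalent to a
# MAP estimate with Gaussian observation noise and a uniform prior on c).
def infer_attacker(x, x_attacked, w, t, grid):
    residuals = [np.linalg.norm(attack(x, c, w, t) - x_attacked) for c in grid]
    return grid[int(np.argmin(residuals))]

w = np.array([1.0, 2.0])   # learner's weight vector (assumed known)
t = 5.0                    # attacker's target prediction (assumed known)
x = np.array([0.0, 0.0])   # clean input
c_true = 2.0               # attacker's true, hidden cost weight

x_attacked = attack(x, c_true, w, t)
grid = np.linspace(0.5, 4.0, 8)          # candidate attacker parameters
c_hat = infer_attacker(x, x_attacked, w, t, grid)
print(c_hat)  # recovers 2.0 on this grid
```

A non-uniform prior would simply add a log-prior term to each candidate's score before taking the argmin, which is how prior belief about the attacker enters the framework.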
Key Contributions
- Domain-agnostic framework that formulates adversary identification as a reverse optimization problem, outputting the most probable attacker parameters given an observed attack and a prior belief distribution
- Mathematical proof that attackers are non-identifiable (multiple attackers could produce the same observed attack) without additional knowledge
- Demonstration that inferred attacker parameters improve adversarial regularization performance and enable exogenous mitigation strategies
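The non-identifiability result can be illustrated with a minimal sketch (a toy construction, not the paper's proof): if the attacker's objective weights both movement cost and attack success, only their ratio determines the optimal attack, so distinct parameter pairs yield the exact same observed attack.

```python
import numpy as np

# Toy attacker: argmin_z c*||z-x||^2 + lam*(w@z - t)^2, with closed form
# (c*I + lam*w w^T) z = c*x + lam*t*w, which depends only on the ratio c/lam.
def attack(x, c, lam, w, t):
    d = len(x)
    A = c * np.eye(d) + lam * np.outer(w, w)
    return np.linalg.solve(A, c * x + lam * t * w)

w, t, x = np.array([1.0, 2.0]), 5.0, np.array([0.0, 0.0])
a1 = attack(x, 1.0, 1.0, w, t)   # attacker A: (c, lam) = (1, 1)
a2 = attack(x, 3.0, 3.0, w, t)   # attacker B: (c, lam) = (3, 3)
print(np.allclose(a1, a2))       # distinct attackers, identical attack
```

This is why additional knowledge (a prior over attacker parameters) is needed to single out the *most probable* attacker rather than a unique one.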
🛡️ Threat Analysis
The paper focuses on data-manipulation attacks (adversarial perturbations to inputs causing misclassification) and proposes a defense framework that infers attacker parameters to improve adversarial regularization — a direct defense against inference-time input manipulation attacks.