Identifying Adversary Characteristics from an Observed Attack
Soyon Choi, Scott Alfeld, Meiyi Ma
Published on arXiv (2603.05625)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
The proposed reverse-optimization framework can identify the most probable attacker characteristics from observed attacks, improving adversarial regularization when attacker parameters are unknown.
When used in automated decision-making systems, machine learning (ML) models are vulnerable to data-manipulation attacks. Some defense mechanisms (e.g., adversarial regularization) directly affect the ML models, while others (e.g., anomaly detection) act within the broader system. In this paper we consider a different approach to defending against the adversary, focusing on the attacker rather than the attack. We present and demonstrate a framework for identifying characteristics of the attacker from an observed attack. We prove that, without additional knowledge, the attacker is non-identifiable (multiple potential attackers would perform the same observed attack). To address this challenge, we propose a domain-agnostic framework to identify the most probable attacker. This framework aids the defender in two ways. First, knowledge about the attacker can be leveraged for exogenous mitigation (i.e., addressing the vulnerability by altering the decision-making system outside the learning algorithm and/or limiting the attacker's capability). Second, when implementing defense methods that directly affect the learning process (e.g., adversarial regularization), knowledge of the specific attacker improves performance. We present the details of our framework and illustrate its applicability through specific instantiations on a variety of learners.
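To make the reverse-optimization idea concrete, here is a toy instantiation (not from the paper; the linear learner, target prediction, and quadratic cost are illustrative assumptions). An attacker with hidden cost weight `c` moves a clean point `x` toward a target prediction `t` for a known linear learner `w`; the defender, seeing the attacked point, searches candidate values of `c` for the one whose induced optimal attack best matches the observation, which amounts to a MAP estimate under a Gaussian observation model and a uniform prior:

```python
import numpy as np

# Hypothetical attacker model (illustrative, not the paper's exact setup):
#   attack(x, c) = argmin_z  c * ||z - x||^2 + (w @ z - t)^2
# The first-order condition yields the closed form
#   (c*I + w w^T) z = c*x + t*w.
def attack(x, c, w, t):
    d = len(x)
    A = c * np.eye(d) + np.outer(w, w)
    return np.linalg.solve(A, c * x + t * w)

# Reverse optimization: score each candidate cost weight c by how closely
# its induced attack matches the observed attacked point (equivalent to a
# MAP estimate with Gaussian observation noise and a uniform prior on c).
def infer_attacker(x, x_attacked, w, t, grid):
    residuals = [np.linalg.norm(attack(x, c, w, t) - x_attacked) for c in grid]
    return grid[int(np.argmin(residuals))]

w = np.array([1.0, 2.0])   # learner's weight vector (assumed known)
t = 5.0                    # attacker's target prediction (assumed known)
x = np.array([0.0, 0.0])   # clean input
c_true = 2.0               # attacker's true, hidden cost weight

x_attacked = attack(x, c_true, w, t)
grid = np.linspace(0.5, 4.0, 8)          # candidate attacker parameters
c_hat = infer_attacker(x, x_attacked, w, t, grid)
print(c_hat)  # recovers 2.0 on this grid
```

A non-uniform prior would simply add a log-prior term to each candidate's score before taking the argmin, which is how prior belief about the attacker enters the framework.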
Key Contributions
- Domain-agnostic framework that formulates adversary identification as a reverse optimization problem, outputting the most probable attacker parameters given an observed attack and a prior belief distribution
- Mathematical proof that attackers are non-identifiable (multiple attackers could produce the same observed attack) without additional knowledge
- Demonstration that inferred attacker parameters improve adversarial regularization performance and enable exogenous mitigation strategies
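The non-identifiability result can be illustrated with a minimal sketch (a toy construction, not the paper's proof): if the attacker's objective weights both movement cost and attack success, only their ratio determines the optimal attack, so distinct parameter pairs yield the exact same observed attack.

```python
import numpy as np

# Toy attacker: argmin_z c*||z-x||^2 + lam*(w@z - t)^2, with closed form
# (c*I + lam*w w^T) z = c*x + lam*t*w, which depends only on the ratio c/lam.
def attack(x, c, lam, w, t):
    d = len(x)
    A = c * np.eye(d) + lam * np.outer(w, w)
    return np.linalg.solve(A, c * x + lam * t * w)

w, t, x = np.array([1.0, 2.0]), 5.0, np.array([0.0, 0.0])
a1 = attack(x, 1.0, 1.0, w, t)   # attacker A: (c, lam) = (1, 1)
a2 = attack(x, 3.0, 3.0, w, t)   # attacker B: (c, lam) = (3, 3)
print(np.allclose(a1, a2))       # distinct attackers, identical attack
```

This is why additional knowledge (a prior over attacker parameters) is needed to single out the *most probable* attacker rather than a unique one.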
🛡️ Threat Analysis
The paper focuses on data-manipulation attacks (adversarial perturbations to inputs causing misclassification) and proposes a defense framework that infers attacker parameters to improve adversarial regularization — a direct defense against inference-time input manipulation attacks.