Sy-FAR: Symmetry-based Fair Adversarial Robustness
Haneen Najjar, Eyal Ronen, Mahmood Sharif
Published on arXiv (2509.12939)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Sy-FAR significantly outperforms state-of-the-art fair adversarial robustness methods across five datasets and three model architectures, while also training faster and more consistently across runs.
Sy-FAR
Novel technique introduced
Security-critical machine-learning (ML) systems, such as face-recognition systems, are susceptible to adversarial examples, including real-world physically realizable attacks. Various means to boost ML's adversarial robustness have been proposed; however, they typically induce unfair robustness: It is often easier to attack from certain classes or groups than from others. Several techniques have been developed to improve adversarial robustness while seeking perfect fairness between classes. Yet, prior work has focused on settings where security and fairness are less critical. Our insight is that achieving perfect parity in realistic fairness-critical tasks, such as face recognition, is often infeasible -- some classes may be highly similar, leading to more misclassifications between them. Instead, we suggest that seeking symmetry -- i.e., attacks from class $i$ to $j$ would be as successful as from $j$ to $i$ -- is more tractable. Intuitively, symmetry is desirable because class resemblance is a symmetric relation in most domains. Additionally, as we prove theoretically, symmetry between individuals induces symmetry between any set of sub-groups, in contrast to other fairness notions where group-fairness is often elusive. We develop Sy-FAR, a technique to encourage symmetry while also optimizing adversarial robustness, and extensively evaluate it using five datasets, with three model architectures, including against targeted and untargeted realistic attacks. The results show Sy-FAR significantly improves fair adversarial robustness compared to state-of-the-art methods. Moreover, we find that Sy-FAR is faster and more consistent across runs. Notably, Sy-FAR also ameliorates another type of unfairness we discover in this work -- target classes that adversarial examples are likely to be classified into become significantly less vulnerable after inducing symmetry.
Key Contributions
- Introduces symmetry as a more tractable fairness notion for adversarial robustness: attacks from class i to j should be as likely to succeed as attacks from j to i
- Proposes Sy-FAR, a training technique that jointly optimizes adversarial robustness and inter-class attack symmetry, with a theoretical proof that individual-level symmetry induces group-level symmetry
- Discovers and ameliorates a novel unfairness: target classes that adversarial examples cluster toward become less vulnerable after symmetry induction
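The symmetry notion above can be illustrated with a minimal sketch (not the paper's implementation): given a pairwise attack-success matrix, symmetry asks that the entry for attacks from class $i$ to $j$ roughly equal the entry for $j$ to $i$. The matrix values and the asymmetry score below are hypothetical, purely for illustration.

```python
import numpy as np

# Hypothetical attack-success matrix S: S[i, j] is the fraction of
# adversarial examples crafted from class i that end up classified as j.
# (Illustrative numbers only -- not results from the paper.)
S = np.array([
    [0.00, 0.30, 0.05],
    [0.10, 0.00, 0.20],
    [0.05, 0.25, 0.00],
])

def asymmetry(S):
    """Mean absolute gap |S[i, j] - S[j, i]| over unordered class pairs.

    A perfectly symmetric matrix scores 0; larger values mean some
    directions of attack succeed more often than their reverses.
    """
    diff = np.abs(S - S.T)
    n = S.shape[0]
    # Upper triangle (k=1) counts each unordered pair exactly once.
    return diff[np.triu_indices(n, k=1)].mean()

print(round(asymmetry(S), 4))  # → 0.0833
```

A training objective in the spirit of Sy-FAR would drive such a score toward zero alongside the usual adversarial-robustness loss, rather than demanding that every class be equally hard to attack.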
🛡️ Threat Analysis
Primary contribution is a defense (adversarial training with symmetry constraint) against adversarial examples and physically realizable attacks targeting image classification and face recognition models at inference time.