Published on arXiv

2511.13143

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Defense effectiveness varies substantially across evaluation setups, and the field exhibits critical gaps including hyperparameter selection bias, insufficient reporting of computational overhead, and incomplete experimentation — undermining fair comparison across the 183 surveyed papers.


Backdoor attacks pose a significant threat to deep learning models by implanting hidden vulnerabilities that can be activated by malicious inputs. While numerous defenses have been proposed to mitigate these attacks, the heterogeneous landscape of evaluation methodologies hinders fair comparison between defenses. This work presents a systematic (meta-)analysis of backdoor defenses through a comprehensive literature review and empirical evaluation. We analyzed 183 backdoor defense papers published between 2018 and 2025 across major AI and security venues, examining the properties and evaluation methodologies of these defenses. Our analysis reveals significant inconsistencies in experimental setups, evaluation metrics, and threat model assumptions in the literature. Through extensive experiments involving three datasets (MNIST, CIFAR-100, ImageNet-1K), four model architectures (ResNet-18, VGG-19, ViT-B/16, DenseNet-121), 16 representative defenses, and five commonly used attacks, totaling over 3,000 experiments, we demonstrate that defense effectiveness varies substantially across different evaluation setups. We identify critical gaps in current evaluation practices, including insufficient reporting of computational overhead and behavior under benign conditions, bias in hyperparameter selection, and incomplete experimentation. Based on our findings, we provide concrete challenges and well-motivated recommendations to standardize and improve future defense evaluations. Our work aims to equip researchers and industry practitioners with actionable insights for developing, assessing, and deploying defenses to different systems.


Key Contributions

  • Systematic literature review of 183 backdoor defense papers (2018–2025) across major AI and security venues, revealing significant inconsistencies in experimental setups, metrics, and threat model assumptions
  • Empirical meta-evaluation of 16 representative backdoor defenses against 5 attacks across 3 datasets (MNIST, CIFAR-100, ImageNet-1K) and 4 architectures totaling over 3,000 experiments, demonstrating that reported effectiveness is highly sensitive to evaluation setup
  • Concrete recommendations to standardize backdoor defense evaluation, including guidelines on computational overhead reporting, hyperparameter selection, benign-condition behavior, and experimental completeness
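The inconsistent metrics the paper surveys usually reduce to two headline numbers: accuracy on benign inputs and the attack success rate on triggered inputs. As a minimal sketch (function names are illustrative, not from the paper), the two metrics can be computed like this:

```python
# Hypothetical sketch of the two standard backdoor-defense metrics.
# A defense is typically judged by how far it drives the attack success
# rate down while keeping clean accuracy near the undefended baseline.

def clean_accuracy(preds, labels):
    """Fraction of benign inputs classified correctly."""
    assert len(preds) == len(labels) and preds
    return sum(p == y for p, y in zip(preds, labels)) / len(preds)

def attack_success_rate(preds_on_triggered, target_label):
    """Fraction of triggered inputs mapped to the attacker's target class."""
    assert preds_on_triggered
    return sum(p == target_label for p in preds_on_triggered) / len(preds_on_triggered)

# Example: 3 of 4 benign predictions correct; 3 of 4 triggered inputs
# land on the attacker's target class 7.
print(clean_accuracy([0, 1, 1, 2], [0, 1, 2, 2]))       # 0.75
print(attack_success_rate([7, 7, 3, 7], target_label=7))  # 0.75
```

Even with these definitions fixed, the paper's finding is that results remain incomparable when datasets, architectures, poisoning rates, and hyperparameter budgets differ across evaluations.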

🛡️ Threat Analysis

Model Poisoning

The paper's entire focus is on backdoor attack defenses — it reviews 183 backdoor defense papers, empirically evaluates 16 representative defenses against 5 backdoor attacks, and proposes methodology improvements. This directly maps to ML10 (Model Poisoning / Backdoors & Trojans).


Details

Domains
vision
Model Types
CNN, Transformer
Threat Tags
training_time
Datasets
MNIST, CIFAR-100, ImageNet-1K
Applications
image classification