Attack · 2025

Towards Backdoor Stealthiness in Model Parameter Space

Xiaoyun Xu 1, Zhuoran Liu 1, Stefanos Koffas 2, Stjepan Picek 1,3



Published on arXiv: 2501.05928

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Grond outperforms all 12 compared backdoor attacks against 17 diverse state-of-the-art defenses, including adaptive ones, while ABI consistently improves the effectiveness of existing common backdoor attacks

Grond (with its Adversarial Backdoor Injection / ABI module)

Novel technique introduced


Recent research on backdoor stealthiness focuses mainly on indistinguishable triggers in input space and inseparable backdoor representations in feature space, aiming to circumvent backdoor defenses that examine these respective spaces. However, existing backdoor attacks are typically designed to resist a specific type of backdoor defense without considering the diverse range of defense mechanisms. Based on this observation, we pose a natural question: Are current backdoor attacks truly a real-world threat when facing diverse practical defenses? To answer this question, we examine 12 common backdoor attacks that focus on input-space or feature-space stealthiness and 17 diverse representative defenses. Surprisingly, we reveal a critical blind spot: Backdoor attacks designed to be stealthy in input and feature spaces can be mitigated by examining backdoored models in parameter space. To investigate the underlying causes behind this common vulnerability, we study the characteristics of backdoor attacks in the parameter space. Notably, we find that input- and feature-space attacks introduce prominent backdoor-related neurons in parameter space, which are not thoroughly considered by current backdoor attacks. Taking comprehensive stealthiness into account, we propose a novel supply-chain attack called Grond. Grond limits the parameter changes by a simple yet effective module, Adversarial Backdoor Injection (ABI), which adaptively increases the parameter-space stealthiness during the backdoor injection. Extensive experiments demonstrate that Grond outperforms all 12 backdoor attacks against state-of-the-art (including adaptive) defenses on CIFAR-10, GTSRB, and a subset of ImageNet. In addition, we show that ABI consistently improves the effectiveness of common backdoor attacks.


Key Contributions

  • Reveals that input- and feature-space backdoor attacks introduce prominent parameter-space anomalies (backdoor-related neurons) exploitable by diverse defenses — a critical blind spot in 12 existing attacks
  • Proposes Adversarial Backdoor Injection (ABI), a module that adaptively constrains parameter changes during backdoor injection to minimize parameter-space detectability
  • Introduces Grond, a supply-chain-motivated backdoor attack combining ABI with comprehensive stealthiness, outperforming 12 attacks against 17 defenses on CIFAR-10, GTSRB, and an ImageNet subset
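The contributions above describe ABI as adaptively limiting parameter changes during backdoor injection so that no prominent backdoor-related neurons stand out. The paper's exact procedure is not reproduced here; the following minimal sketch only illustrates the general idea of shrinking the most prominent weight drifts relative to a clean reference model. The function name, the top-k threshold, and the shrinkage rule are all illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def constrain_parameter_drift(poisoned, clean_ref, top_k_frac=0.01, shrink=0.5):
    """Illustrative parameter-space stealthiness constraint.

    Weights whose drift from the clean reference model is among the
    top `top_k_frac` fraction (by magnitude) are pulled back toward
    the reference by factor `shrink`, damping the prominent
    "backdoor neurons" that parameter-space defenses look for.
    """
    delta = poisoned - clean_ref
    k = max(1, int(top_k_frac * delta.size))
    # indices of the k largest absolute drifts
    idx = np.argpartition(np.abs(delta).ravel(), -k)[-k:]
    out = poisoned.copy().ravel()
    ref = clean_ref.ravel()
    out[idx] = ref[idx] + shrink * (out[idx] - ref[idx])
    return out.reshape(poisoned.shape)

# Toy usage: the single outlier weight is damped, small drifts are untouched.
clean = np.zeros(10)
poisoned = clean.copy()
poisoned[3] = 4.0   # prominent backdoor-related drift
poisoned[7] = 0.1   # benign-scale drift
result = constrain_parameter_drift(poisoned, clean, top_k_frac=0.1, shrink=0.5)
```

In an actual attack such a constraint would be applied repeatedly during poisoned training, not once after the fact; this one-shot version only makes the top-k shrinkage mechanics concrete.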

🛡️ Threat Analysis

Model Poisoning

Grond injects hidden, trigger-activated backdoor behavior into model weights, using Adversarial Backdoor Injection (ABI) to minimize the attack's parameter-space footprint. This is a direct backdoor/trojan attack: although the paper frames Grond as a supply-chain attack, its primary contribution is the backdoor injection technique itself rather than a supply-chain compromise method, so OWASP ML06 (Supply Chain) does not apply and ML10 (Model Poisoning) is the appropriate classification.


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, training_time, targeted, digital
Datasets
CIFAR-10, GTSRB, ImageNet
Applications
image classification