Attack · 2025

Towards Backdoor Stealthiness in Model Parameter Space

Xiaoyun Xu 1, Zhuoran Liu 1, Stefanos Koffas 2, Stjepan Picek 1,3



Published on arXiv: 2501.05928

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Grond outperforms all 12 compared backdoor attacks against 17 diverse state-of-the-art defenses, including adaptive ones, while ABI consistently improves the effectiveness of existing common backdoor attacks

Grond (with its Adversarial Backdoor Injection / ABI module)

Novel technique introduced


Recent research on backdoor stealthiness focuses mainly on indistinguishable triggers in input space and inseparable backdoor representations in feature space, aiming to circumvent backdoor defenses that examine these respective spaces. However, existing backdoor attacks are typically designed to resist a specific type of backdoor defense without considering the diverse range of defense mechanisms. Based on this observation, we pose a natural question: Are current backdoor attacks truly a real-world threat when facing diverse practical defenses? To answer this question, we examine 12 common backdoor attacks that focus on input-space or feature-space stealthiness and 17 diverse representative defenses. Surprisingly, we reveal a critical blind spot: Backdoor attacks designed to be stealthy in input and feature spaces can be mitigated by examining backdoored models in parameter space. To investigate the underlying causes behind this common vulnerability, we study the characteristics of backdoor attacks in the parameter space. Notably, we find that input- and feature-space attacks introduce prominent backdoor-related neurons in parameter space, which are not thoroughly considered by current backdoor attacks. Taking comprehensive stealthiness into account, we propose a novel supply-chain attack called Grond. Grond limits the parameter changes by a simple yet effective module, Adversarial Backdoor Injection (ABI), which adaptively increases the parameter-space stealthiness during the backdoor injection. Extensive experiments demonstrate that Grond outperforms all 12 backdoor attacks against state-of-the-art (including adaptive) defenses on CIFAR-10, GTSRB, and a subset of ImageNet. In addition, we show that ABI consistently improves the effectiveness of common backdoor attacks.


Key Contributions

  • Reveals that input- and feature-space backdoor attacks introduce prominent parameter-space anomalies (backdoor-related neurons) exploitable by diverse defenses — a critical blind spot in 12 existing attacks
  • Proposes Adversarial Backdoor Injection (ABI), a module that adaptively constrains parameter changes during backdoor injection to minimize parameter-space detectability
  • Introduces Grond, a supply-chain-motivated backdoor attack combining ABI with comprehensive stealthiness, outperforming 12 attacks against 17 defenses on CIFAR-10, GTSRB, and an ImageNet subset
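The contributions above describe ABI as adaptively limiting parameter changes during backdoor injection so that no prominent backdoor-related neurons stand out. The paper's exact procedure is not reproduced here; the following minimal sketch only illustrates the general idea of shrinking the most prominent weight drifts relative to a clean reference model. The function name, the top-k threshold, and the shrinkage rule are all illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def constrain_parameter_drift(poisoned, clean_ref, top_k_frac=0.01, shrink=0.5):
    """Illustrative parameter-space stealthiness constraint.

    Weights whose drift from the clean reference model is among the
    top `top_k_frac` fraction (by magnitude) are pulled back toward
    the reference by factor `shrink`, damping the prominent
    "backdoor neurons" that parameter-space defenses look for.
    """
    delta = poisoned - clean_ref
    k = max(1, int(top_k_frac * delta.size))
    # indices of the k largest absolute drifts
    idx = np.argpartition(np.abs(delta).ravel(), -k)[-k:]
    out = poisoned.copy().ravel()
    ref = clean_ref.ravel()
    out[idx] = ref[idx] + shrink * (out[idx] - ref[idx])
    return out.reshape(poisoned.shape)

# Toy usage: the single outlier weight is damped, small drifts are untouched.
clean = np.zeros(10)
poisoned = clean.copy()
poisoned[3] = 4.0   # prominent backdoor-related drift
poisoned[7] = 0.1   # benign-scale drift
result = constrain_parameter_drift(poisoned, clean, top_k_frac=0.1, shrink=0.5)
```

In an actual attack such a constraint would be applied repeatedly during poisoned training, not once after the fact; this one-shot version only makes the top-k shrinkage mechanics concrete.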

🛡️ Threat Analysis

Model Poisoning

Grond injects hidden, trigger-activated backdoor behavior into model weights, using Adversarial Backdoor Injection (ABI) to minimize the attack's parameter-space footprint. This is a direct backdoor/trojan attack: although the paper frames Grond as a supply-chain attack, its primary contribution is the backdoor injection technique itself rather than a supply-chain compromise method, so OWASP ML06 (Supply Chain) does not apply and ML10 (Model Poisoning) is the appropriate classification.


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, training_time, targeted, digital
Datasets
CIFAR-10, GTSRB, ImageNet
Applications
image classification