
Exploring Dynamic Properties of Backdoor Training Through Information Bottleneck

Xinyu Liu 1, Xu Zhang 2, Can Chen 3, Ren Wang 2

0 citations · arXiv


Published on arXiv: 2511.21923

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Visually conspicuous backdoor attacks like BadNets integrate more seamlessly into model representations than many visually imperceptible attacks, as measured by mutual information dynamics under the Information Bottleneck framework.

Dynamics-based Stealthiness Metric (IB-Backdoor)

Novel technique introduced


Understanding how backdoor data influences neural network training dynamics remains a complex and underexplored challenge. In this paper, we present a rigorous analysis of the impact of backdoor data on the learning process, with a particular focus on the distinct behaviors of the target class versus the other, clean classes. Leveraging the Information Bottleneck (IB) principle in connection with the clustering of internal representations, we find that backdoor attacks create unique mutual information (MI) signatures, which evolve across training phases and differ based on the attack mechanism. Our analysis uncovers a surprising trade-off: visually conspicuous attacks like BadNets can achieve high stealthiness from an information-theoretic perspective, integrating more seamlessly into the model than many visually imperceptible attacks. Building on these insights, we propose a novel, dynamics-based stealthiness metric that quantifies an attack's integration at the model level. We validate our findings and the proposed metric across multiple datasets and diverse attack types, offering a new dimension for understanding and evaluating backdoor threats. Our code is available at https://github.com/XinyuLiu71/Information_Bottleneck_Backdoor.git.
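The MI signatures the abstract describes are, in IB-style analyses, typically estimated with simple binning estimators applied to hidden-layer activations and class labels. The sketch below is a minimal, hypothetical illustration of such an estimator; the function name, bin count, and synthetic data are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def mi_bits(x, y, n_bins=20):
    """Binning estimate of I(X; Y) in bits.

    x : 1-D array of continuous values (e.g., a neuron's activations)
    y : 1-D array of non-negative integer class labels
    """
    # Discretize the continuous variable into equal-width bins.
    x_disc = np.digitize(x, np.linspace(x.min(), x.max(), n_bins))
    # Build the empirical joint distribution over (bin, label).
    joint = np.zeros((n_bins + 1, int(y.max()) + 1))
    for xi, yi in zip(x_disc, y):
        joint[xi, yi] += 1
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal over x bins
    py = pxy.sum(axis=0, keepdims=True)   # marginal over labels
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px * py)[nz])).sum())

# Illustrative usage on synthetic data: an activation that tracks the
# label carries high MI; an unrelated activation carries almost none.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 5000)
informative = labels + 0.05 * rng.standard_normal(5000)
unrelated = rng.standard_normal(5000)
```

Tracking such estimates layer by layer over training epochs, separately for the target class and the clean classes, is the kind of MI-dynamics analysis the paper builds its stealthiness metric on.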


Key Contributions

  • Information Bottleneck-based analysis revealing distinct mutual information signatures for backdoor vs. clean class data across training phases
  • Counter-intuitive finding that visually conspicuous attacks (e.g., BadNets) can be more stealthy than imperceptible attacks from an information-theoretic perspective
  • Novel dynamics-based stealthiness metric quantifying a backdoor attack's model-level integration, validated across multiple datasets and attack types

🛡️ Threat Analysis

Model Poisoning

The paper rigorously analyzes how backdoor/trojan attacks embed hidden targeted behavior into neural networks during training, studying the mutual information signatures of backdoor data across training phases and proposing a metric to quantify how stealthily a backdoor integrates into the model.


Details

Domains
vision
Model Types
cnn
Threat Tags
training_time, targeted
Datasets
CIFAR-10, GTSRB
Applications
image classification