Exploring Dynamic Properties of Backdoor Training Through Information Bottleneck
Xinyu Liu 1, Xu Zhang 2, Can Chen 3, Ren Wang 2
Published on arXiv
2511.21923
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
Visually conspicuous backdoor attacks like BadNets integrate more seamlessly into model representations than many visually imperceptible attacks, as measured by mutual information dynamics under the Information Bottleneck framework.
Dynamics-based Stealthiness Metric (IB-Backdoor)
Novel technique introduced
Understanding how backdoor data influences neural network training dynamics remains a complex and underexplored challenge. In this paper, we present a rigorous analysis of the impact of backdoor data on the learning process, with a particular focus on the distinct behaviors of the target class versus the other, clean classes. Leveraging the Information Bottleneck (IB) principle in connection with the clustering of internal representations, we find that backdoor attacks create unique mutual information (MI) signatures, which evolve across training phases and differ based on the attack mechanism. Our analysis uncovers a surprising trade-off: visually conspicuous attacks like BadNets can achieve high stealthiness from an information-theoretic perspective, integrating more seamlessly into the model than many visually imperceptible attacks. Building on these insights, we propose a novel, dynamics-based stealthiness metric that quantifies an attack's integration at the model level. We validate our findings and the proposed metric across multiple datasets and diverse attack types, offering a new dimension for understanding and evaluating backdoor threats. Our code is available at: https://github.com/XinyuLiu71/Information_Bottleneck_Backdoor.git.
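The MI signatures described above require estimating mutual information between discrete variables (e.g., class labels) and continuous hidden representations. As a minimal sketch of how such per-class MI tracking could be set up, the snippet below uses a simple binning estimator in the style of earlier IB analyses of deep networks; this is an illustrative assumption, not the paper's actual estimator, and the function names (`discrete_mi`, `mi_with_representation`) are hypothetical.

```python
import numpy as np

def discrete_mi(a, b):
    """Estimate I(A; B) in nats from two arrays of discrete symbols."""
    a_vals, a_idx = np.unique(a, return_inverse=True)
    b_vals, b_idx = np.unique(b, return_inverse=True)
    # Build the empirical joint distribution over (A, B).
    joint = np.zeros((len(a_vals), len(b_vals)))
    np.add.at(joint, (a_idx, b_idx), 1.0)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)   # marginal P(A)
    pb = joint.sum(axis=0, keepdims=True)   # marginal P(B)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())

def mi_with_representation(labels, activations, n_bins=30):
    """Estimate I(Y; T) by discretizing activations T into equal-width
    bins and collapsing each binned vector to one discrete symbol
    (a common, if crude, binning approach to MI in hidden layers)."""
    edges = np.linspace(activations.min(), activations.max(), n_bins + 1)
    binned = np.digitize(activations, edges[1:-1])
    _, t_sym = np.unique(binned, axis=0, return_inverse=True)
    return discrete_mi(labels, t_sym)
```

Tracking `mi_with_representation` separately for the poisoned target class and for clean classes at each epoch is one way to surface the diverging MI dynamics the paper analyzes; in practice, binning estimators are sensitive to `n_bins` and dimensionality, so more robust estimators may be preferable.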
Key Contributions
- Information Bottleneck-based analysis revealing distinct mutual information signatures for backdoor vs. clean class data across training phases
- Counter-intuitive finding that visually conspicuous attacks (e.g., BadNets) can be more stealthy than imperceptible attacks from an information-theoretic perspective
- Novel dynamics-based stealthiness metric quantifying a backdoor attack's model-level integration, validated across multiple datasets and attack types
🛡️ Threat Analysis
The paper rigorously analyzes how backdoor/trojan attacks embed hidden targeted behavior into neural networks during training, studying the mutual information signatures of backdoor data across training phases and proposing a metric that quantifies how stealthily a backdoor integrates into the model.