Theory of Minimal Weight Perturbations in Deep Networks and its Applications for Low-Rank Activated Backdoor Attacks
Published on arXiv
2601.16880
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
Low-rank compression can reliably activate latent backdoors in DNNs while preserving full-precision accuracy; provable compression thresholds, derived from back-propagated margin sensitivity, govern when such attacks are feasible.
Low-Rank Activated Backdoor Attack
Novel technique introduced
The minimal-norm weight perturbations required to produce a specified change in a DNN's output are derived in closed form, and the factors determining their size are discussed. These single-layer exact formulae are contrasted with more generic multi-layer robustness guarantees based on Lipschitz constants; both are of the same order, indicating comparable tightness. The expressions reveal how back-propagated margins govern layer-wise sensitivity and provide certifiable guarantees on the smallest parameter update consistent with a desired output shift. Applied to precision-modification-activated backdoor attacks, these results establish provable compression thresholds below which such attacks cannot succeed, and experiments show that low-rank compression can reliably activate latent backdoors while preserving full-precision accuracy.
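For a single linear layer, the closed form is easy to state: the minimum-Frobenius-norm perturbation ΔW satisfying (W + ΔW)x = Wx + δ is the rank-one update ΔW = δxᵀ/‖x‖², with norm ‖δ‖/‖x‖. A minimal numerical sketch of this fact (the layer sizes and values here are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))      # illustrative single linear layer
x = rng.standard_normal(5)           # fixed input
delta = np.array([0.5, -0.2, 0.1])   # desired output shift

# Minimum-Frobenius-norm solution of dW @ x = delta: a rank-one update.
dW = np.outer(delta, x) / (x @ x)

# The perturbed layer realizes exactly the requested output change...
assert np.allclose((W + dW) @ x, W @ x + delta)
# ...and its norm matches the closed form ||delta|| / ||x||.
assert np.isclose(np.linalg.norm(dW, "fro"),
                  np.linalg.norm(delta) / np.linalg.norm(x))
```

The norm ‖δ‖/‖x‖ is the least-squares minimum-norm solution; the paper's contribution is extending this kind of sensitivity analysis through back-propagated margins across layers.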
Key Contributions
- Closed-form derivation of minimal norm weight perturbations required to achieve a specified DNN output change, providing certifiable guarantees on parameter update sensitivity
- Theoretical characterization of provable compression thresholds below which precision-modification-activated backdoor attacks cannot succeed
- Empirical demonstration that low-rank compression reliably activates latent backdoors in full-precision models while maintaining clean accuracy
🛡️ Threat Analysis
The paper introduces and analyzes precision-modification-activated backdoor attacks — latent trojans embedded in full-precision model weights that remain dormant until low-rank compression or quantization activates them. The theoretical framework establishes certifiable thresholds for when such backdoors can and cannot succeed, and experiments confirm that low-rank compression reliably triggers hidden malicious behavior while preserving normal accuracy.
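The activation mechanism can be sketched with a toy linear classifier: the malicious behavior is carried by a dominant singular direction, while a weaker singular component cancels it on the trigger input at full precision. Rank truncation discards the corrective component and exposes the backdoor. (The construction below is an illustrative toy, not the paper's experimental setup.)

```python
import numpy as np

# Toy 2-class linear "model": logits = W @ x. W is built directly from its
# SVD so the backdoor's corrective term sits in the weakest singular direction.
u_main, v_main = np.array([0.0, 1.0]), np.array([1.0, 0.0, 0.0, 0.0])
u_fix,  v_fix  = np.array([1.0, 0.0]), np.array([0.0, 1.0, 0.0, 0.0])
W = 10.0 * np.outer(u_main, v_main) + 1.0 * np.outer(u_fix, v_fix)

# Trigger input: small overlap with the dominant direction, large with the fix.
x_trigger = 0.05 * v_main + 1.0 * v_fix
x_clean = v_main

# Full precision: the weak corrective component outvotes the backdoor.
assert np.argmax(W @ x_trigger) == 0        # benign class

# Rank-1 compression drops the corrective term and activates the backdoor.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_k = s[0] * np.outer(U[:, 0], Vt[0])
assert np.argmax(W_k @ x_trigger) == 1      # malicious class
assert np.argmax(W @ x_clean) == np.argmax(W_k @ x_clean)  # clean behavior kept
```

Because truncated SVD is the optimal low-rank approximation, any corrective term hidden below the truncation threshold is guaranteed to be removed by compression, which is what makes this activation channel reliable.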