Theory of Minimal Weight Perturbations in Deep Networks and its Applications for Low-Rank Activated Backdoor Attacks
Published on arXiv
2601.16880
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
Low-rank compression can reliably activate latent backdoors in DNNs while preserving full-precision accuracy; provable compression thresholds, derived from back-propagated margin sensitivity, govern when such attacks are feasible.
Low-Rank Activated Backdoor Attack
Novel technique introduced
The minimal-norm weight perturbations required to produce a specified change in a DNN's output are derived in closed form, and the factors determining their size are discussed. These single-layer exact formulae are contrasted with more generic multi-layer robustness guarantees based on Lipschitz constants; both are of the same order, indicating comparable tightness. The expressions reveal how back-propagated margins govern layer-wise sensitivity and provide certifiable guarantees on the smallest parameter update consistent with a desired output shift. Applied to precision-modification-activated backdoor attacks, these results establish provable compression thresholds below which such attacks cannot succeed, and experiments show that low-rank compression can reliably activate latent backdoors while preserving full-precision accuracy.
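For a single linear layer, the closed form is easy to state: the minimum-Frobenius-norm perturbation ΔW satisfying (W + ΔW)x = Wx + δ is the rank-one update ΔW = δxᵀ/‖x‖², with norm ‖δ‖/‖x‖. A minimal numerical sketch of this fact (the layer sizes and values here are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))      # illustrative single linear layer
x = rng.standard_normal(5)           # fixed input
delta = np.array([0.5, -0.2, 0.1])   # desired output shift

# Minimum-Frobenius-norm solution of dW @ x = delta: a rank-one update.
dW = np.outer(delta, x) / (x @ x)

# The perturbed layer realizes exactly the requested output change...
assert np.allclose((W + dW) @ x, W @ x + delta)
# ...and its norm matches the closed form ||delta|| / ||x||.
assert np.isclose(np.linalg.norm(dW, "fro"),
                  np.linalg.norm(delta) / np.linalg.norm(x))
```

The norm ‖δ‖/‖x‖ is the least-squares minimum-norm solution; the paper's contribution is extending this kind of sensitivity analysis through back-propagated margins across layers.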
Key Contributions
- Closed-form derivation of minimal norm weight perturbations required to achieve a specified DNN output change, providing certifiable guarantees on parameter update sensitivity
- Theoretical characterization of provable compression thresholds below which precision-modification-activated backdoor attacks cannot succeed
- Empirical demonstration that low-rank compression reliably activates latent backdoors in full-precision models while maintaining clean accuracy
🛡️ Threat Analysis
The paper introduces and analyzes precision-modification-activated backdoor attacks — latent trojans embedded in full-precision model weights that remain dormant until low-rank compression or quantization activates them. The theoretical framework establishes certifiable thresholds for when such backdoors can and cannot succeed, and experiments confirm that low-rank compression reliably triggers hidden malicious behavior while preserving normal accuracy.
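The activation mechanism can be sketched with a toy linear classifier: the malicious behavior is carried by a dominant singular direction, while a weaker singular component cancels it on the trigger input at full precision. Rank truncation discards the corrective component and exposes the backdoor. (The construction below is an illustrative toy, not the paper's experimental setup.)

```python
import numpy as np

# Toy 2-class linear "model": logits = W @ x. W is built directly from its
# SVD so the backdoor's corrective term sits in the weakest singular direction.
u_main, v_main = np.array([0.0, 1.0]), np.array([1.0, 0.0, 0.0, 0.0])
u_fix,  v_fix  = np.array([1.0, 0.0]), np.array([0.0, 1.0, 0.0, 0.0])
W = 10.0 * np.outer(u_main, v_main) + 1.0 * np.outer(u_fix, v_fix)

# Trigger input: small overlap with the dominant direction, large with the fix.
x_trigger = 0.05 * v_main + 1.0 * v_fix
x_clean = v_main

# Full precision: the weak corrective component outvotes the backdoor.
assert np.argmax(W @ x_trigger) == 0        # benign class

# Rank-1 compression drops the corrective term and activates the backdoor.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_k = s[0] * np.outer(U[:, 0], Vt[0])
assert np.argmax(W_k @ x_trigger) == 1      # malicious class
assert np.argmax(W @ x_clean) == np.argmax(W_k @ x_clean)  # clean behavior kept
```

Because truncated SVD is the optimal low-rank approximation, any corrective term hidden below the truncation threshold is guaranteed to be removed by compression, which is what makes this activation channel reliable.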