attack arXiv Jan 23, 2026
Bethan Evans, Jared Tanner · University of Oxford
Derives minimal-norm weight perturbation bounds for DNNs and shows that low-rank compression can reliably activate latent backdoors
Model Poisoning vision
The minimal-norm weight perturbations of DNNs required to achieve a specified change in output are derived, and the factors determining their size are discussed. These single-layer exact formulae are contrasted with more generic multi-layer Lipschitz-constant-based robustness guarantees; both are observed to be of the same order, indicating similar efficacy in their guarantees. These results are applied to precision-modification-activated backdoor attacks, establishing provable compression thresholds below which such attacks cannot succeed, and it is shown empirically that low-rank compression can reliably activate latent backdoors while preserving full-precision accuracy. The derived expressions reveal how back-propagated margins govern layer-wise sensitivity and provide certifiable guarantees on the smallest parameter updates consistent with a desired output shift.
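A minimal sketch of the underlying single-layer idea, not the paper's own derivation: for a linear layer, the smallest Frobenius-norm perturbation dW satisfying dW @ x = delta is the standard least-norm rank-one update delta x^T / ||x||^2, whose size ||delta|| / ||x|| illustrates how the required output shift and the input activation scale govern layer-wise sensitivity. All variable names and values here are illustrative assumptions.

```python
import numpy as np

# Illustrative setup: a random linear layer W and input x (not from the paper).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
delta = np.array([0.5, -0.2, 0.0, 0.1])  # desired change in the output W @ x

# Least-norm solution of dW @ x = delta: a rank-one outer product.
dW = np.outer(delta, x) / (x @ x)

# The perturbed layer realizes exactly the prescribed output shift.
assert np.allclose((W + dW) @ x, W @ x + delta)

# Its Frobenius norm is ||delta|| / ||x||: larger required shifts, or smaller
# input activations, demand proportionally larger weight perturbations.
print(np.isclose(np.linalg.norm(dW), np.linalg.norm(delta) / np.linalg.norm(x)))
```

This is only the exact single-layer case; the multi-layer guarantees discussed above instead bound such perturbations through products of layer Lipschitz constants.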
cnn transformer