Theory of Minimal Weight Perturbations in Deep Networks and its Applications for Low-Rank Activated Backdoor Attacks

Bethan Evans, Jared Tanner

0 citations · 43 references · arXiv

Published on arXiv · 2601.16880

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Low-rank compression can reliably activate latent backdoors in DNNs while preserving full-precision accuracy, with provable compression thresholds, derived from back-propagated margin sensitivity, governing attack feasibility.

Low-Rank Activated Backdoor Attack

Novel technique introduced


The minimal-norm weight perturbations of DNNs required to achieve a specified change in output are derived, and the factors determining their size are discussed. These single-layer exact formulae are contrasted with more generic multi-layer Lipschitz-constant-based robustness guarantees; both are observed to be of the same order, indicating similar efficacy. These results are applied to precision-modification-activated backdoor attacks, establishing provable compression thresholds below which such attacks cannot succeed, and showing empirically that low-rank compression can reliably activate latent backdoors while preserving full-precision accuracy. The expressions reveal how back-propagated margins govern layer-wise sensitivity and provide certifiable guarantees on the smallest parameter updates consistent with a desired output shift.
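The single-layer case admits a simple closed form: for a linear layer y = Wx, the smallest Frobenius-norm weight update producing a given output shift δ on an input x is the rank-one matrix δxᵀ/‖x‖². A minimal NumPy sketch of this classic least-norm solution (variable names are illustrative; the paper's general multi-layer formulae are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 6))            # linear layer weights
x = rng.standard_normal(6)                 # a fixed input
delta = np.array([1.0, -0.5, 0.0, 0.25])   # desired output shift

# Least-norm solution of (W + dW) x = W x + delta: a rank-one update.
dW = np.outer(delta, x) / (x @ x)

# The update achieves the shift exactly, and its Frobenius norm is
# ||delta|| / ||x||: large-activation inputs need only tiny weight edits.
print(np.allclose((W + dW) @ x, W @ x + delta))   # → True
print(np.linalg.norm(dW), np.linalg.norm(delta) / np.linalg.norm(x))
```

The ‖δ‖/‖x‖ scaling is what makes margin size the governing quantity: small output margins mean small δ suffices to flip a decision.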


Key Contributions

  • Closed-form derivation of minimal norm weight perturbations required to achieve a specified DNN output change, providing certifiable guarantees on parameter update sensitivity
  • Theoretical characterization of provable compression thresholds below which precision-modification-activated backdoor attacks cannot succeed
  • Empirical demonstration that low-rank compression reliably activates latent backdoors in full-precision models while maintaining clean accuracy
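The flavor of a compression threshold can be sketched for a single linear classification head: rank-k truncation cannot flip a prediction when the classification margin exceeds a multiple of the first discarded singular value. The check below uses a standard conservative worst-case bound (the logit gap changes by at most √2·σ_{k+1}·‖x‖, here rounded up to 2·σ_{k+1}·‖x‖); the paper's exact thresholds are sharper:

```python
import numpy as np

def certified_safe(W, k, x, y_true):
    """Conservative certificate: rank-k truncation of W cannot flip the
    prediction on x if margin > 2 * sigma_{k+1} * ||x||.
    (A worst-case bound, not the paper's exact threshold.)"""
    s = np.linalg.svd(W, compute_uv=False)
    err = s[k] if k < len(s) else 0.0        # spectral norm of W - W_k
    logits = W @ x
    margin = logits[y_true] - np.max(np.delete(logits, y_true))
    return margin > 2.0 * err * np.linalg.norm(x)

# Well-separated example: large margin, tiny discarded singular value,
# so rank-1 compression provably cannot change the prediction.
W = np.array([[3.0, 0.0, 0.0],
              [0.0, 0.1, 0.0]])
x = np.array([1.0, 1.0, 0.0])
print(certified_safe(W, k=1, x=x, y_true=0))   # → True: attack infeasible here
```

When the certificate fails, nothing is concluded either way; it is a sufficient condition for safety, mirroring how the paper's thresholds rule attacks out rather than in.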

🛡️ Threat Analysis

Model Poisoning

The paper introduces and analyzes precision-modification-activated backdoor attacks — latent trojans embedded in full-precision model weights that remain dormant until low-rank compression or quantization activates them. The theoretical framework establishes certifiable thresholds for when such backdoors can and cannot succeed, and experiments confirm that low-rank compression reliably triggers hidden malicious behavior while preserving normal accuracy.
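To make the mechanism concrete, the toy construction below (our illustration, not the paper's construction) hides a class-flipping component in a small singular direction of a 2-class linear head. At full precision a weak corrective component keeps a trigger input on the clean class; rank-1 truncation discards that component and activates the flip, while a clean input aligned with the dominant direction is classified identically before and after compression:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Orthonormal input directions: one kept by compression, one dropped.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
v_keep, v_drop = Q[:, 0], Q[:, 1]

# Orthonormal logit-space directions for a 2-class linear head.
u_keep = np.array([0.6, 0.8])    # dominant direction: pushes toward class 1
u_drop = np.array([-0.8, 0.6])   # weak direction: can correct back to class 0

# W is built directly from its SVD: singular values 5.0 (kept), 0.5 (dropped).
W = 5.0 * np.outer(u_keep, v_keep) + 0.5 * np.outer(u_drop, v_drop)

# Rank-1 compression via truncated SVD discards the 0.5 component.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_c = s[0] * np.outer(U[:, 0], Vt[0])

# Trigger input: the weak component decides the class at full precision.
x_trig = 0.2 * v_keep - 3.0 * v_drop

print(np.argmax(W @ x_trig))     # → 0: full precision, clean class
print(np.argmax(W_c @ x_trig))   # → 1: compressed, backdoor class
```

Note that the compression error ‖W − W_c‖₂ is only 0.5 and clean behavior is untouched, which is exactly why such latent backdoors survive standard accuracy checks on the full-precision model.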


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
training_time, inference_time, targeted, digital, white_box
Applications
image classification, model compression, neural network deployment