defense arXiv Jan 29, 2026 · 9w ago
Yaxi Hu, Johanna Düngler, Bernhard Schölkopf et al. · Max Planck Institute for Intelligent Systems · University of Copenhagen
Proves LoRA lacks inherent privacy via near-perfect MIA, then derives tighter DP bounds for noisy low-rank fine-tuning
Membership Inference Attack nlp
We introduce the (Wishart) projection mechanism, a randomized map of the form $S \mapsto M f(S)$ with $M \sim W_d(1/r I_d, r)$ and study its differential privacy properties. For vector-valued queries $f$, we prove non-asymptotic DP guarantees without any additive noise, showing that Wishart randomness alone can suffice. For matrix-valued queries, however, we establish a sharp negative result: in the noise-free setting, the mechanism is not DP, and we demonstrate its vulnerability by implementing a near perfect membership inference attack (AUC $> 0.99$). We then analyze a noisy variant and prove privacy amplification due to randomness and low rank projection, in both large- and small-rank regimes, yielding stronger privacy guarantees than additive noise alone. Finally, we show that LoRA-style updates are an instance of the matrix-valued mechanism, implying that LoRA is not inherently private despite its built-in randomness, but that low-rank fine-tuning can be more private than full fine-tuning at the same noise level. Preliminary experiments suggest that tighter accounting enables lower noise and improved accuracy in practice.
transformer llm Max Planck Institute for Intelligent Systems · University of Copenhagen