
LoRA and Privacy: When Random Projections Help (and When They Don't)

Yaxi Hu 1, Johanna Düngler 2, Bernhard Schölkopf 1, Amartya Sanyal 2


Published on arXiv · 2601.21719

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

LoRA-style updates are not inherently private (a noise-free membership inference attack achieves AUC > 0.99), but the noisy low-rank variant achieves stronger privacy guarantees than full fine-tuning at the same noise level; the resulting tighter DP accounting enables lower noise and improved accuracy.

Wishart projection mechanism

Novel technique introduced


We introduce the (Wishart) projection mechanism, a randomized map of the form $S \mapsto M f(S)$ with $M \sim W_d(\tfrac{1}{r} I_d, r)$, and study its differential privacy properties. For vector-valued queries $f$, we prove non-asymptotic DP guarantees without any additive noise, showing that Wishart randomness alone can suffice. For matrix-valued queries, however, we establish a sharp negative result: in the noise-free setting, the mechanism is not DP, and we demonstrate its vulnerability by implementing a near-perfect membership inference attack (AUC $> 0.99$). We then analyze a noisy variant and prove privacy amplification due to randomness and low-rank projection, in both large- and small-rank regimes, yielding stronger privacy guarantees than additive noise alone. Finally, we show that LoRA-style updates are an instance of the matrix-valued mechanism, implying that LoRA is not inherently private despite its built-in randomness, but that low-rank fine-tuning can be more private than full fine-tuning at the same noise level. Preliminary experiments suggest that tighter accounting enables lower noise and improved accuracy in practice.
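The vector-valued case of the mechanism is easy to simulate. Below is a minimal sketch (not the paper's code): a Wishart matrix $M \sim W_d(\tfrac{1}{r} I_d, r)$ is sampled as $\frac{1}{r} G G^\top$ with $G$ a $d \times r$ standard Gaussian matrix, so that $\mathbb{E}[M] = I_d$ and the release $M f(S)$ is unbiased. The dimensions and the query here are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def wishart_projection(f_S, r, rng):
    """Wishart projection mechanism S -> M f(S) for a vector-valued query.

    M ~ W_d((1/r) I_d, r) is sampled as (1/r) G G^T with G a (d x r)
    matrix of i.i.d. standard normals, so E[M] = I_d.
    """
    d = f_S.shape[0]
    G = rng.standard_normal((d, r))
    M = (G @ G.T) / r
    return M @ f_S

d, r = 16, 4
query = rng.standard_normal(d)          # f(S): an illustrative vector query
out = wishart_projection(query, r, rng)

# The mechanism is unbiased: averaging many releases recovers f(S).
avg = np.mean([wishart_projection(query, r, rng) for _ in range(20000)], axis=0)
print(np.allclose(avg, query, atol=0.1))
```

Note the multiplicative character of the randomness: privacy (in the vector case) comes entirely from the random projection, with no additive noise term.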


Key Contributions

  • Introduces the Wishart projection mechanism and proves non-asymptotic DP guarantees for vector-valued queries without any additive noise
  • Establishes a sharp negative result that the matrix-valued (LoRA-style) mechanism is not differentially private, demonstrated concretely via a membership inference attack with AUC > 0.99
  • Proves privacy amplification for the noisy variant showing that low-rank fine-tuning can achieve strictly stronger DP guarantees than full fine-tuning at the same noise level

🛡️ Threat Analysis

Membership Inference Attack

The core negative result is demonstrated via an explicit membership inference attack (AUC > 0.99) against the noise-free matrix projection mechanism, directly showing that LoRA-style gradient updates are vulnerable to MIA. The DP bounds throughout the paper characterize resistance to membership inference advantage.
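A toy simulation makes the leakage intuitive; this is an illustrative attack under simplifying assumptions, not the paper's construction. If per-example gradients are rank-1 (as for a single sample through a linear layer, $g_i = u_i v_i^\top$), then $M f(S) = \frac{1}{r} G G^\top f(S)$ preserves the row space of $f(S)$ whenever $r \geq \operatorname{rank}(f(S))$. An attacker can therefore test whether a candidate's row direction lies in the released row space: members fit exactly, non-members do not. All dimensions below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n, r = 64, 32, 8, 16   # chosen so rank(f(S)) = n <= min(k, r)

def release(V_members):
    """Noise-free matrix mechanism M f(S), with f(S) a sum of rank-1
    'per-example gradients' u_i v_i^T (rows of V_members are the v_i)."""
    U = rng.standard_normal((d, n))
    fS = U @ V_members                    # d x k, rank n
    G = rng.standard_normal((d, r))
    return (G @ G.T / r) @ fS             # M ~ W_d((1/r) I_d, r)

def residual(out, v):
    """Distance from v to the row space of the released matrix."""
    _, s, Vt = np.linalg.svd(out)
    basis = Vt[: (s > 1e-8 * s[0]).sum()]   # numerical row-space basis
    return np.linalg.norm(v - basis.T @ (basis @ v))

member_res, nonmember_res = [], []
for _ in range(50):
    V = rng.standard_normal((n, k))
    out = release(V)
    member_res.append(residual(out, V[0]))                    # v_1 was in S
    nonmember_res.append(residual(out, rng.standard_normal(k)))

# In this toy setting the score separates members from non-members perfectly,
# mirroring the near-perfect AUC reported for the noise-free mechanism.
print(max(member_res) < 1e-6 < min(nonmember_res))
```

The point of the sketch: the Wishart randomness scrambles magnitudes but not the row space of a low-rank matrix query, so without additive noise an attacker recovers membership almost exactly, which is why the noisy variant is needed for DP.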


Details

Domains
nlp
Model Types
transformer, llm
Threat Tags
training_time
Applications
low-rank fine-tuning, private llm fine-tuning