LoRA as Oracle

Marco Arazzi , Antonino Nocera

0 citations · 28 references · arXiv

Published on arXiv · 2601.11207

  • Model Poisoning (OWASP ML Top 10 — ML10)
  • Membership Inference Attack (OWASP ML Top 10 — ML04)

Key Finding

Poisoned and member samples induce distinctive low-rank LoRA updates that reliably distinguish them from clean or non-member data without access to training data or model modification.



Abstract

Backdoored and privacy-leaking deep neural networks pose a serious threat to the deployment of machine learning systems in security-critical settings. Existing defenses for backdoor detection and membership inference typically require access to clean reference models, extensive retraining, or strong assumptions about the attack mechanism. In this work, we introduce a novel LoRA-based oracle framework that leverages low-rank adaptation modules as a lightweight, model-agnostic probe for both backdoor detection and membership inference. Our approach attaches task-specific LoRA adapters to a frozen backbone and analyzes their optimization dynamics and representation shifts when exposed to suspicious samples. We show that poisoned and member samples induce distinctive low-rank updates that differ significantly from those generated by clean or non-member data. These signals can be measured using simple ranking and energy-based statistics, enabling reliable inference without access to the original training data or modification of the deployed model.
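The energy-based statistic mentioned in the abstract can be illustrated with a toy sketch: a rank-r LoRA update ΔW = BA concentrates its spectral energy in a few directions, while a diffuse full-rank update spreads it across many. The function names and the simulated updates below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def lora_update(rank, d_out, d_in, rng):
    # A LoRA-style update Delta W = B @ A is rank-r by construction.
    B = rng.normal(size=(d_out, rank))
    A = rng.normal(size=(rank, d_in))
    return B @ A

def energy_concentration(delta_w, k):
    # Fraction of spectral energy captured by the top-k singular values.
    s = np.linalg.svd(delta_w, compute_uv=False)
    return float((s[:k] ** 2).sum() / (s ** 2).sum())

rng = np.random.default_rng(0)
low_rank = lora_update(4, 64, 64, rng)    # stands in for a structured "suspicious" update
full_rank = rng.normal(size=(64, 64))     # stands in for a diffuse "clean" update

print(energy_concentration(low_rank, 4))   # close to 1: energy in a few directions
print(energy_concentration(full_rank, 4))  # much smaller: energy spread out
```

A statistic like this separates structured low-rank shifts from diffuse ones without any reference to the original training data, which is the property the abstract relies on.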


Key Contributions

  • LoRA-based oracle framework that attaches lightweight adapters to frozen backbones to probe both backdoor presence and training membership without retraining or access to original data.
  • Demonstrates that poisoned and member samples induce statistically distinctive low-rank update patterns measurable via ranking and energy-based statistics.
  • Model-agnostic auditing approach applicable to large pre-trained models including LLMs, requiring no clean reference model or original training set.

🛡️ Threat Analysis

Membership Inference Attack

The framework explicitly targets membership inference as an auditing capability, using LoRA optimization dynamics to determine whether specific data points were included in model pretraining — directly within the ML04 threat scope.
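If member samples induce larger low-rank update statistics than non-members, as the paper reports, the statistic becomes a membership decision via a simple threshold. The distributions and threshold below are simulated assumptions for illustration, not measurements from the paper.

```python
import numpy as np

# Simulated per-sample LoRA update statistics (illustrative assumption:
# members induce larger, more structured low-rank updates than non-members).
rng = np.random.default_rng(2)
member_stats = rng.normal(0.8, 0.1, size=100)
nonmember_stats = rng.normal(0.4, 0.1, size=100)

def is_member(stat, threshold=0.6):
    # Declare "member" when the update statistic exceeds the threshold.
    return stat > threshold

tpr = np.mean([is_member(s) for s in member_stats])        # true positive rate
tnr = np.mean([not is_member(s) for s in nonmember_stats]) # true negative rate
balanced_accuracy = (tpr + tnr) / 2
print(round(balanced_accuracy, 2))  # high on this well-separated toy data
```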

Model Poisoning

Core contribution includes a backdoor detection method that analyzes LoRA adapter updates to identify poisoned samples inducing distinctive low-rank parameter shifts — a direct defense against model backdoors without requiring access to original training data.
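One way to operationalize "distinctive shifts without the original training data" is calibration against known-clean probe samples: compute the adapter statistic for each probe, then score a suspect sample by its empirical p-value. A minimal sketch with simulated statistics (the numbers are illustrative assumptions, not the paper's results):

```python
import numpy as np

def empirical_p_value(stat, calibration_stats):
    # Fraction of known-clean probe statistics at least as extreme as the
    # suspect's; a small value flags the sample as likely poisoned.
    calibration_stats = np.asarray(calibration_stats)
    return float((calibration_stats >= stat).mean())

rng = np.random.default_rng(1)
clean_stats = rng.normal(0.3, 0.05, size=200)  # simulated clean-probe statistics

print(empirical_p_value(0.32, clean_stats))  # unremarkable sample: sizeable p-value
print(empirical_p_value(0.95, clean_stats))  # extreme sample: p-value near 0
```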


Details

Domains
vision, nlp
Model Types
transformer, llm
Threat Tags
training_time, grey_box
Applications
pre-trained model auditing, backdoor detection, membership inference