Defense · 2025

Navigating the Designs of Privacy-Preserving Fine-tuning for Large Language Models

Haonan Shi, Tu Ouyang, An Wang


Published on arXiv: 2501.04323

Model Inversion Attack

OWASP ML Top 10 — ML03

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

GuardedTuning designs based on split learning and offsite tuning architectures protect client fine-tuning data against reconstruction attacks while maintaining competitive downstream task performance across multiple privacy-utility-cost trade-off configurations.

GuardedTuning

Novel technique introduced


Instruction tuning has proven effective in enhancing Large Language Models' (LLMs) performance on downstream tasks. However, real-world fine-tuning faces inherent conflicts between model providers' intellectual property protection, clients' data privacy requirements, and tuning costs. While recent approaches like split learning and offsite tuning demonstrate promising architectures for privacy-preserving fine-tuning, there is a gap in systematically addressing the multidimensional trade-offs required for diverse real-world deployments. We propose several indicative evaluation metrics to guide design trade-offs for privacy-preserving fine-tuning and a series of example designs, collectively named GuardedTuning; they result from novel combinations of system architectures with adapted privacy-enhancement methods and emerging computation techniques. Each design represents distinct trade-offs across model utility, privacy guarantees, and costs. Experimental results demonstrate that these designs protect against data reconstruction attacks while maintaining competitive fine-tuning performance.


Key Contributions

  • GuardedTuning: a family of privacy-preserving LLM fine-tuning designs combining split learning / offsite tuning architectures with privacy-enhancement methods (e.g., differential privacy, obfuscation) and emerging computation techniques
  • Indicative evaluation metrics that systematically characterize trade-offs across model utility, privacy guarantees, and computational cost for diverse deployment scenarios
  • Empirical validation that GuardedTuning designs resist data reconstruction attacks while preserving competitive instruction-tuning performance
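To make the split-learning setting concrete, the following is a minimal NumPy sketch, not the paper's actual design: layer shapes, the `tanh` nonlinearity, and the noise scale are all illustrative. The client keeps the bottom layer (and its private data), the server hosts the rest of the model, and only "smashed" activations, noised in Gaussian-mechanism style, cross the boundary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical split point: client holds the bottom layer, server the top.
W_client = rng.normal(size=(8, 16))   # client-side bottom layer (private)
W_server = rng.normal(size=(16, 4))   # server-side top layer

def client_forward(x, noise_std=0.1):
    """Compute smashed activations and add noise before uploading.

    The additive noise is an obfuscation-style defense; noise_std is an
    illustrative privacy-utility knob, not a calibrated DP parameter.
    """
    h = np.tanh(x @ W_client)
    return h + rng.normal(scale=noise_std, size=h.shape)

def server_forward(h):
    """Server completes the forward pass on the noised activations."""
    return h @ W_server

x_private = rng.normal(size=(2, 8))   # client's private fine-tuning batch
logits = server_forward(client_forward(x_private))
```

The key property of this architecture is that the raw batch `x_private` never leaves the client; perturbing the uploaded activations is one of the privacy-enhancement methods the designs combine with the split architecture, traded off against downstream utility.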

🛡️ Threat Analysis

Model Inversion Attack

The paper directly defends against data reconstruction attacks — an adversary reconstructing clients' private fine-tuning data from intermediate activations or gradients exchanged in split learning / offsite tuning architectures. This is the core ML03 threat model: an adversary reverse-engineering a model pipeline to recover private training data.
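As a toy illustration of the reconstruction threat (a white-box inversion sketch, not the paper's attack), an adversary who knows the client-side weights can often recover a private input from intercepted activations by gradient descent on a candidate input; all shapes and hyperparameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Known-to-attacker client layer (white-box assumption); small scale keeps
# tanh near its linear regime so the toy inversion converges cleanly.
W = rng.normal(size=(8, 16)) * 0.3
x_true = rng.normal(size=(1, 8))      # private input (unknown to attacker)
h_obs = np.tanh(x_true @ W)           # activations observed in transit

# Attacker: minimize ||tanh(x_hat @ W) - h_obs||^2 by gradient descent.
x_hat = np.zeros((1, 8))
lr = 0.05
for _ in range(2000):
    h = np.tanh(x_hat @ W)
    grad = (2 * (h - h_obs) * (1 - h**2)) @ W.T   # chain rule through tanh
    x_hat -= lr * grad

recon_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

Without a defense, the relative reconstruction error here drops far below 1 (the error of a blind zero guess), which is exactly the leakage the GuardedTuning designs aim to suppress by noising or obfuscating what crosses the split.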


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
training_time, white_box
Applications
llm fine-tuning, instruction tuning, split learning