Haonan Shi

Papers in Database (2)

defense · arXiv · Jan 8, 2025

Navigating the Designs of Privacy-Preserving Fine-tuning for Large Language Models

Haonan Shi, Tu Ouyang, An Wang · Case Western Reserve University

Proposes the GuardedTuning framework, which defends against data reconstruction attacks during privacy-preserving LLM fine-tuning via split learning.

Model Inversion Attack · Sensitive Information Disclosure · NLP
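Split learning, as used in this line of work, partitions the model so that raw data never leaves the client; only intermediate ("smashed") activations cross the boundary, and perturbing them is one common defensive knob against reconstruction. A minimal NumPy sketch of that flow, with all layer shapes and the noise defense chosen purely for illustration (not the paper's actual GuardedTuning design):

```python
import numpy as np

rng = np.random.default_rng(0)

def client_forward(x, W_client, noise_std=0.1):
    """Client-side layers: compute smashed activations and add
    Gaussian noise before sending (illustrative defense knob)."""
    h = np.tanh(x @ W_client)
    return h + rng.normal(0.0, noise_std, h.shape)

def server_forward(h, W_server):
    """Server-side layers: finish the forward pass on the
    noisy activations; the server never sees raw inputs."""
    return h @ W_server

x = rng.normal(size=(4, 8))          # private client batch (never sent)
W_client = rng.normal(size=(8, 16))  # layers kept on the client
W_server = rng.normal(size=(16, 2))  # layers hosted by the server
logits = server_forward(client_forward(x, W_client), W_server)
print(logits.shape)
```

Larger `noise_std` degrades reconstruction attacks at the cost of task utility; navigating that trade-off is what defense designs in this space tune.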
defense · arXiv · Mar 8, 2026

Few Tokens, Big Leverage: Preserving Safety Alignment by Constraining Safety Tokens during Fine-tuning

Guoli Wang, Haonan Shi, Tu Ouyang et al. · Case Western Reserve University

Preserves LLM safety alignment during fine-tuning by regularizing the model's confidence on only a small subset of safety-critical tokens.

Transfer Learning Attack · Prompt Injection · NLP
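The idea of constraining a few safety-critical tokens can be pictured as a masked regularizer: the fine-tuning loss is ordinary cross-entropy, plus a penalty that anchors the model's confidence to a reference only at positions flagged as safety-critical. A hypothetical NumPy sketch of such a loss (the penalty form, mask, and `lam` weight are assumptions for illustration, not the paper's exact objective):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def safety_constrained_loss(logits, targets, ref_probs, safety_mask, lam=1.0):
    """Cross-entropy over all tokens, plus a confidence-anchoring
    penalty applied only where safety_mask == 1 (hypothetical form)."""
    probs = softmax(logits)                     # (n_tokens, vocab)
    n = len(targets)
    p_target = probs[np.arange(n), targets]     # prob of each gold token
    ce = -np.log(p_target).mean()
    # Penalize drift of target-token confidence from the reference
    # model, but only at safety-critical positions.
    drift = (p_target - ref_probs) ** 2
    penalty = (drift * safety_mask).sum() / max(safety_mask.sum(), 1)
    return ce + lam * penalty
```

Because the mask is sparse, the penalty leaves most token positions free to adapt to the downstream task, which is the "few tokens, big leverage" intuition.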