Towards Privacy-Preserving Mental Health Support with Large Language Models
Dong Xue, Jicheng Tu, Ming Wang, Xin Yan, Fangzhou Liu, Jie Hu
Published on arXiv
2601.01993
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
MindChat, fine-tuned with federated learning and differential privacy (FL+DP), achieves competitive counseling performance under both LLM-judge and human evaluation while showing measurably lower membership inference attack success than standard fine-tuning baselines.
MindChat
Novel technique introduced
Large language models (LLMs) have shown promise for mental health support, yet training such models is constrained by the scarcity and sensitivity of real counseling dialogues. In this article, we present MindChat, a privacy-preserving LLM for mental health support, together with MindCorpus, a synthetic multi-turn counseling dataset constructed via a multi-agent role-playing framework. To synthesize high-quality counseling data, the developed dialogue-construction framework employs a dual closed-loop feedback design to integrate psychological expertise and counseling techniques through role-playing: (i) turn-level critique-and-revision to improve coherence and counseling appropriateness within a session, and (ii) session-level strategy refinement to progressively enrich counselor behaviors across sessions. To mitigate privacy risks under decentralized data ownership, we fine-tune the base model using federated learning with parameter-efficient LoRA adapters and incorporate differentially private optimization to reduce membership and memorization risks. Experiments on synthetic-data quality assessment and counseling capability evaluation show that MindCorpus improves training effectiveness and that MindChat is competitive with existing general and counseling-oriented LLM baselines under both automatic LLM-judge and human evaluation protocols, while exhibiting reduced privacy leakage under membership inference attacks.
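The abstract's "differentially private optimization" refers to the standard DP-SGD recipe: clip each example's gradient to a fixed norm, average, and add calibrated Gaussian noise so no single counseling transcript can dominate an update. The paper does not publish code; the sketch below illustrates that recipe with NumPy, and the function name and parameters are illustrative, not the authors' implementation.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One DP-SGD update: clip each per-example gradient to `clip_norm`,
    average the clipped gradients, then add Gaussian noise whose scale is
    noise_multiplier * clip_norm / batch_size. Clipping bounds any single
    example's influence; the noise provides the differential privacy guarantee."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(per_example_grads),
                       size=avg.shape)
    return avg + noise

# Illustrative usage: the first gradient (norm 5.0) is clipped to norm 1.0,
# the second (norm 0.5) passes through unchanged.
grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
noisy_update = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=0.5)
```

In practice the privacy budget (epsilon, delta) is tracked with a privacy accountant over all steps; the noise multiplier and clip norm are the two knobs that trade counseling quality against membership/memorization risk.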
Key Contributions
- MindCorpus: a synthetic multi-turn counseling dataset constructed via a multi-agent role-playing framework with dual closed-loop feedback (turn-level critique-and-revision and session-level strategy refinement)
- MindChat: a mental health LLM fine-tuned via federated LoRA adapters with differentially private optimization to defend against membership inference attacks on sensitive counseling data
- Demonstrated reduced MIA success on the privacy-preserved model while maintaining competitive counseling capability against general and domain-specific LLM baselines
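The federated LoRA setup described above keeps the frozen base model fixed and exchanges only low-rank adapter weights between clients and the server. A minimal FedAvg-style sketch of that aggregation is below; the dict-of-arrays representation, function name, and size-weighted averaging of adapter factors (a common heuristic, not exactly equivalent to averaging the full low-rank products) are assumptions, not the paper's code.

```python
import numpy as np

def fedavg_lora(client_adapters, client_sizes):
    """Server-side aggregation of LoRA adapters: each client fine-tunes only
    its low-rank adapter matrices on local counseling data, and the server
    averages them weighted by local dataset size. Raw dialogues and the
    base model weights never move between parties."""
    total = sum(client_sizes)
    weights = [n / total for n in client_sizes]
    agg = {}
    for name in client_adapters[0]:
        agg[name] = sum(w * c[name] for w, c in zip(weights, client_adapters))
    return agg

# Illustrative usage: two clients, the second holding 3x as much data.
clients = [{"lora_A": np.array([[1.0]])}, {"lora_A": np.array([[3.0]])}]
merged = fedavg_lora(clients, client_sizes=[1, 3])
```

Combining this with DP optimization on each client's local steps gives the FL+DP pipeline the contribution list refers to.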
🛡️ Threat Analysis
The paper uses differentially private optimization explicitly to "reduce membership and memorization risks" and evaluates the resulting model against membership inference attacks, demonstrating lower privacy leakage than the baselines: a direct, empirically evaluated defense against MIA in LLM training.
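The canonical membership inference attack that such evaluations measure is a loss-threshold test: because models fit training data more tightly, an example with unusually low loss is flagged as a likely training member. A minimal sketch of that attack and its balanced accuracy is below; the function name and threshold-based scoring are illustrative, not the paper's specific attack protocol.

```python
import numpy as np

def loss_threshold_mia(member_losses, nonmember_losses, threshold):
    """Loss-threshold membership inference: predict 'member' when the model's
    loss on an example falls below `threshold`. Returns balanced attack
    accuracy, where 0.5 is chance (no leakage) and 1.0 is total leakage."""
    member_losses = np.asarray(member_losses)
    nonmember_losses = np.asarray(nonmember_losses)
    tpr = np.mean(member_losses < threshold)      # members correctly flagged
    tnr = np.mean(nonmember_losses >= threshold)  # non-members correctly rejected
    return 0.5 * (tpr + tnr)
```

A well-calibrated DP defense pushes this score toward 0.5, which is the kind of reduction the paper reports for MindChat relative to non-private fine-tuning.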