
Towards Privacy-Preserving Mental Health Support with Large Language Models

Dong Xue, Jicheng Tu, Ming Wang, Xin Yan, Fangzhou Liu, Jie Hu

1 citation · 54 references · arXiv


Published on arXiv (arXiv:2601.01993)

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

MindChat, fine-tuned with federated learning and differential privacy (FL+DP), achieves competitive counseling performance under both LLM-judge and human evaluation while showing measurably lower membership inference attack success than standard fine-tuning baselines.

MindChat

Novel technique introduced


Large language models (LLMs) have shown promise for mental health support, yet training such models is constrained by the scarcity and sensitivity of real counseling dialogues. In this article, we present MindChat, a privacy-preserving LLM for mental health support, together with MindCorpus, a synthetic multi-turn counseling dataset constructed via a multi-agent role-playing framework. To synthesize high-quality counseling data, the developed dialogue-construction framework employs a dual closed-loop feedback design to integrate psychological expertise and counseling techniques through role-playing: (i) turn-level critique-and-revision to improve coherence and counseling appropriateness within a session, and (ii) session-level strategy refinement to progressively enrich counselor behaviors across sessions. To mitigate privacy risks under decentralized data ownership, we fine-tune the base model using federated learning with parameter-efficient LoRA adapters and incorporate differentially private optimization to reduce membership and memorization risks. Experiments on synthetic-data quality assessment and counseling capability evaluation show that MindCorpus improves training effectiveness and that MindChat is competitive with existing general and counseling-oriented LLM baselines under both automatic LLM-judge and human evaluation protocols, while exhibiting reduced privacy leakage under membership inference attacks.
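The "differentially private optimization" in the abstract typically follows the DP-SGD recipe: clip each example's gradient to a fixed L2 bound, average, then add Gaussian noise calibrated to that bound. The paper's training code is not reproduced here, so the following is a minimal pure-Python sketch of one such step; the names `clip` and `dp_sgd_step` and all hyperparameter values are illustrative assumptions, not the authors' implementation.

```python
import random

def clip(grad, max_norm):
    """Scale a per-example gradient vector down so its L2 norm is at most max_norm."""
    norm = sum(g * g for g in grad) ** 0.5
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]

def dp_sgd_step(per_example_grads, max_norm=1.0, noise_mult=1.0, seed=0):
    """One DP-SGD step: clip every example's gradient, average the clipped
    gradients, then add Gaussian noise proportional to the clipping bound."""
    rng = random.Random(seed)
    clipped = [clip(g, max_norm) for g in per_example_grads]
    n = len(clipped)
    dim = len(clipped[0])
    avg = [sum(g[i] for g in clipped) / n for i in range(dim)]
    sigma = noise_mult * max_norm / n
    return [a + rng.gauss(0.0, sigma) for a in avg]
```

The clipping bound limits any single counseling dialogue's influence on the update, and the noise masks what remains; together these are what reduce the membership and memorization risks the abstract refers to.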


Key Contributions

  • MindCorpus: a synthetic multi-turn counseling dataset constructed via a multi-agent role-playing framework with dual closed-loop feedback (turn-level critique-and-revision and session-level strategy refinement)
  • MindChat: a mental health LLM fine-tuned via federated LoRA adapters with differentially private optimization to defend against membership inference attacks on sensitive counseling data
  • Demonstrated reduced MIA success on the privacy-preserved model while maintaining competitive counseling capability against general and domain-specific LLM baselines
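On the federated side, each data owner trains LoRA adapters locally and only the adapter updates are aggregated, so raw counseling dialogues never leave the client. A minimal FedAvg-style sketch of that aggregation; the paper's exact aggregation rule is not shown here, and `fedavg` plus the flattened-vector representation of adapter deltas are illustrative assumptions:

```python
def fedavg(client_updates, client_sizes):
    """Weighted average of clients' LoRA adapter deltas (FedAvg-style).

    client_updates: one flattened delta vector per client
    client_sizes:   number of local training examples per client,
                    used as aggregation weights
    """
    total = sum(client_sizes)
    dim = len(client_updates[0])
    return [
        sum(u[i] * s for u, s in zip(client_updates, client_sizes)) / total
        for i in range(dim)
    ]
```

Weighting by local dataset size keeps the aggregate equivalent to training on the pooled data, without any client ever sharing that data.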

🛡️ Threat Analysis

Membership Inference Attack

The paper explicitly uses differentially private optimization to 'reduce membership and memorization risks' and evaluates the model against membership inference attacks, demonstrating reduced privacy leakage compared to baselines. This makes it a direct, empirically evaluated defense against MIA in LLM training.
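A standard way to run such an evaluation is a loss-threshold membership inference attack: texts to which the model assigns unusually low loss are flagged as likely training members, and the defense succeeds if attack accuracy stays near chance. A minimal sketch under that assumption; the helper names below are hypothetical and not the paper's attack code:

```python
import math

def avg_nll(token_probs):
    """Average negative log-likelihood a model assigns to a text's tokens.
    Lower values suggest the model has memorized the text."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def loss_threshold_mia(member_losses, nonmember_losses, threshold):
    """Loss-threshold MIA: predict 'member' when loss < threshold.
    Returns attack accuracy over the combined member/non-member set;
    0.5 means the attacker does no better than random guessing."""
    correct = sum(l < threshold for l in member_losses)
    correct += sum(l >= threshold for l in nonmember_losses)
    return correct / (len(member_losses) + len(nonmember_losses))
```

Reduced MIA success, as reported for MindChat, would show up here as attack accuracy dropping toward 0.5 when the member and non-member loss distributions overlap.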


Details

Domains
nlp, federated-learning
Model Types
llm, federated
Threat Tags
training_time
Datasets
MindCorpus
Applications
mental health counseling, privacy-preserving llm fine-tuning