Privacy-Preserving Federated Vision Transformer Learning Leveraging Lightweight Homomorphic Encryption in Medical AI
Al Amin 1, Kamrul Hasan 1, Liang Hong 1, Sharif Ullah 2
Published on arXiv: 2511.20983
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
Encrypting the 768-D CLS tokens with CKKS prevents the gradient inversion attacks that otherwise reconstruct training images at PSNR 52.26 dB, while maintaining 90.02% global accuracy in the encrypted domain and cutting encrypted communication to 326 KB per aggregation round.
Novel technique introduced
CLS-HE FL (CKKS-encrypted CLS token federated aggregation)
Collaborative machine learning across healthcare institutions promises improved diagnostic accuracy by leveraging diverse datasets, yet privacy regulations such as HIPAA prohibit direct patient data sharing. While federated learning (FL) enables decentralized training without raw data exchange, recent studies show that model gradients in conventional FL remain vulnerable to reconstruction attacks, potentially exposing sensitive medical information. This paper presents a privacy-preserving federated learning framework combining Vision Transformers (ViT) with homomorphic encryption (HE) for secure multi-institutional histopathology classification. The approach leverages the ViT CLS token as a compact 768-dimensional feature representation for secure aggregation, encrypting these tokens using CKKS homomorphic encryption before transmission to the server. We demonstrate that encrypting CLS tokens achieves a 30-fold communication reduction compared to gradient encryption while maintaining strong privacy guarantees. Through evaluation on a three-client federated setup for lung cancer histopathology classification, we show that gradients are highly susceptible to model inversion attacks (PSNR: 52.26 dB, SSIM: 0.999, NMI: 0.741), enabling near-perfect image reconstruction. In contrast, the proposed CLS-protected HE approach prevents such attacks while enabling encrypted inference directly on ciphertexts, requiring only 326 KB of encrypted data transmission per aggregation round. The framework achieves 96.12 percent global classification accuracy in the unencrypted domain and 90.02 percent in the encrypted domain.
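To put the reported PSNR of 52.26 dB in perspective, the short sketch below uses the standard PSNR definition and inverts it to recover the mean squared error that figure implies. It assumes pixel intensities normalized to [0, 1]; the summary does not state the paper's normalization, so `max_value` and the function names here are illustrative assumptions rather than the authors' evaluation code.

```python
import math


def psnr(original, reconstructed, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_for_psnr(psnr_db, max_value=1.0):
    """Invert the PSNR definition to get the mean squared error it implies."""
    return max_value ** 2 / (10 ** (psnr_db / 10.0))


# Assumption: intensities in [0, 1], so max_value = 1.0.
implied_mse = mse_for_psnr(52.26)
implied_rmse = math.sqrt(implied_mse)
# Same error expressed on an 8-bit (0 to 255) scale.
implied_rmse_8bit = implied_rmse * 255
```

Under that assumption, the implied root-mean-square error is well below one gray level on an 8-bit scale, which is consistent with the abstract's description of the reconstruction as near-perfect.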
Key Contributions
- Proposes using the CKKS-encrypted 768-D ViT CLS token as the federated aggregation unit, achieving a roughly 30x communication reduction (326 KB vs. 9,794 KB) over gradient encryption while preventing reconstruction attacks
- Demonstrates the severity of gradient inversion attacks on conventional FL in medical imaging (PSNR 52.26 dB, SSIM 0.999, NMI 0.741, i.e. near-perfect reconstruction)
- Enables server-side encrypted inference directly on aggregated ciphertexts, achieving 90.02% accuracy under full encryption on lung cancer histopathology classification
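The 30x figure in the contributions can be checked directly from the two reported payload sizes. The sketch below does that arithmetic; the plaintext size is an added assumption (768 values stored as 4-byte floats), included only to give a rough sense of ciphertext expansion, and is not a number reported in the summary.

```python
# Reported encrypted payload sizes, in KB.
GRADIENT_CIPHERTEXT_KB = 9794
CLS_CIPHERTEXT_KB = 326

# Communication reduction from encrypting CLS tokens instead of gradients.
reduction = GRADIENT_CIPHERTEXT_KB / CLS_CIPHERTEXT_KB

# Fraction of encrypted traffic avoided.
saved_fraction = 1 - CLS_CIPHERTEXT_KB / GRADIENT_CIPHERTEXT_KB

# Assumption (not from the source): the plaintext CLS token is 768 float32 values.
CLS_DIM = 768
BYTES_PER_FLOAT32 = 4
plaintext_kb = CLS_DIM * BYTES_PER_FLOAT32 / 1024

# Rough ciphertext expansion under that assumption.
expansion = CLS_CIPHERTEXT_KB / plaintext_kb

print(f"reduction: {reduction:.2f}x, saved: {saved_fraction:.1%}")
print(f"plaintext: {plaintext_kb:.0f} KB, expansion: {expansion:.0f}x")
```

The ratio comes out just above 30, matching the stated 30-fold reduction, and the encrypted CLS token is still on the order of a hundred times larger than its assumed plaintext form.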
🛡️ Threat Analysis
The central threat is gradient leakage in federated learning. The paper demonstrates that shared gradients enable near-perfect reconstruction of training images (PSNR 52.26 dB, SSIM 0.999), then defends against this by encrypting compact CLS tokens with CKKS homomorphic encryption, so that the aggregation server never sees plaintext data.
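The server-side step can be pictured with a plaintext stand-in. CKKS supports element-wise addition of ciphertexts and multiplication by a plaintext constant, which is enough to average vectors without decrypting them. The sketch below performs exactly those two operations on ordinary Python lists. It does not encrypt anything, and it assumes the aggregation is a simple mean over clients, which this summary does not confirm; `aggregate_cls_tokens` is a hypothetical name, not part of the paper or of any library.

```python
from typing import List, Sequence

CLS_DIM = 768  # CLS token dimensionality stated in the paper


def aggregate_cls_tokens(client_tokens: Sequence[Sequence[float]]) -> List[float]:
    """Average per-client CLS tokens using only add and scale-by-constant.

    Plaintext stand-in: in an encrypted setting each token would be a CKKS
    ciphertext, and the same two operations would run on ciphertexts.
    Assumption: aggregation is a plain mean over clients.
    """
    if not client_tokens:
        raise ValueError("need at least one client token")
    for token in client_tokens:
        if len(token) != CLS_DIM:
            raise ValueError(f"expected {CLS_DIM}-D token, got {len(token)}")
    n_clients = len(client_tokens)
    # Element-wise sum across clients (ciphertext addition in the encrypted case).
    summed = [sum(values) for values in zip(*client_tokens)]
    # Scale by the plaintext constant 1/n (plaintext multiplication in the encrypted case).
    scale = 1.0 / n_clients
    return [s * scale for s in summed]


# Three clients, mirroring the paper's three-client setup, with toy constant tokens.
clients = [[1.0] * CLS_DIM, [2.0] * CLS_DIM, [3.0] * CLS_DIM]
global_token = aggregate_cls_tokens(clients)
```

Restricting the server to addition and constant scaling is the design point: neither operation requires the secret key, so the server can combine client contributions while holding only ciphertexts.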