Privacy-Preserving Federated Vision Transformer Learning Leveraging Lightweight Homomorphic Encryption in Medical AI

Collaborative machine learning across healthcare institutions promises improved diagnostic accuracy by leveraging diverse datasets, yet privacy regulations such as HIPAA prohibit direct sharing of patient data. While federated learning (FL) enables decentralized training without raw data exchange, recent studies show that model gradients in conventional FL remain vulnerable to reconstruction attacks, potentially exposing sensitive medical information. This paper presents a privacy-preserving federated learning framework that combines Vision Transformers (ViT) with homomorphic encryption (HE) for secure multi-institutional histopathology classification. The approach uses the ViT CLS token as a compact 768-dimensional feature representation for secure aggregation, encrypting these tokens with CKKS homomorphic encryption before transmission to the server. We demonstrate that encrypting CLS tokens achieves a 30-fold communication reduction compared to gradient encryption while maintaining strong privacy guarantees. Through evaluation on a three-client federated setup for lung cancer histopathology classification, we show that gradients are highly susceptible to model inversion attacks (PSNR: 52.26 dB, SSIM: 0.999, NMI: 0.741), enabling near-perfect image reconstruction. In contrast, the proposed CLS-protected HE approach prevents such attacks while enabling inference directly on ciphertexts, requiring only 326 KB of encrypted data per aggregation round. The framework achieves 96.12% global classification accuracy in the unencrypted domain and 90.02% in the encrypted domain.
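The 30-fold figure follows directly from the two per-round payload sizes the summary reports (326 KB for the encrypted CLS token versus 9,794 KB for encrypted gradients). A quick arithmetic check, taking both numbers at face value and in the same unit:

```python
# Per-round encrypted payloads as reported in the summary (same unit, KB).
GRADIENT_HE_KB = 9_794   # encrypted gradient update
CLS_HE_KB = 326          # CKKS-encrypted 768-D CLS token

# Ratio of the two payloads and the share of traffic avoided per round.
reduction = GRADIENT_HE_KB / CLS_HE_KB
bytes_saved_pct = 100 * (1 - CLS_HE_KB / GRADIENT_HE_KB)

print(f"reduction factor: {reduction:.2f}x")             # 30.04x
print(f"traffic saved per round: {bytes_saved_pct:.2f}%")  # 96.67%
```

In other words, the CLS-token scheme sends roughly 3.3% of the bytes that gradient encryption would.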

Key Contributions

Proposes encrypting only the 768-D ViT CLS token with CKKS homomorphic encryption as the federated aggregation unit, achieving a 30x communication reduction (326 KB vs. 9,794 KB per round) over gradient encryption while preventing reconstruction attacks
Demonstrates the severity of gradient inversion attacks on conventional FL in medical imaging (PSNR 52.26 dB, SSIM 0.999, NMI 0.741, i.e. near-perfect reconstruction)
Enables server-side encrypted inference directly on aggregated ciphertexts, achieving 90.02% accuracy under full encryption on lung cancer histopathology classification
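The encrypted aggregation and server-side inference described in these contributions can be illustrated with a toy additively homomorphic scheme. This sketch swaps in Paillier for CKKS (CKKS needs a library such as Microsoft SEAL or TenSEAL, whereas Paillier fits in the Python standard library), uses deliberately tiny and insecure primes, and invents hypothetical values for the CLS tokens, the head size and the linear head (`cls_tokens`, `CLASSES`, `head`). It is not the paper's implementation; among other differences, CKKS packs many values into one ciphertext, whereas this sketch encrypts each coordinate separately.

```python
import math
import random
import secrets

# --- Toy Paillier keypair: a stand-in for CKKS, NOT secure at this size ---
P, Q = 1_000_000_007, 1_000_000_009   # small known primes, demo only
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)                  # modular inverse (Python 3.8+); valid since g = N + 1
SCALE = 10**4                         # fixed-point scale for real-valued features

def encrypt(m: int) -> int:
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return pow(N + 1, m % N, N2) * pow(r, N, N2) % N2

def decrypt(c: int) -> int:
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m  # map back to signed integers

def he_add(c1: int, c2: int) -> int:
    return c1 * c2 % N2                # Enc(a) * Enc(b) -> Enc(a + b)

def he_mul_plain(c: int, k: int) -> int:
    return pow(c, k % N, N2)           # Enc(a) ** k -> Enc(k * a)

# --- Clients: encrypt hypothetical 768-D CLS tokens before upload ---
DIM, CLIENTS, CLASSES = 768, 3, 3      # CLASSES is a hypothetical head size
rng = random.Random(0)
cls_tokens = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(CLIENTS)]
uploads = [[encrypt(round(x * SCALE)) for x in tok] for tok in cls_tokens]

# --- Server: sum the ciphertexts coordinate-wise without decrypting ---
agg = uploads[0]
for up in uploads[1:]:
    agg = [he_add(a, b) for a, b in zip(agg, up)]

# --- Server: encrypted inference with a plaintext (hypothetical) linear head ---
head = [[rng.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(CLASSES)]
enc_logits = []
for row in head:
    acc = encrypt(0)
    for c, w in zip(agg, row):
        acc = he_add(acc, he_mul_plain(c, round(w * SCALE)))
    enc_logits.append(acc)

# --- Secret-key holder: decrypt, undo fixed-point scaling, average over clients ---
mean_token = [decrypt(c) / SCALE / CLIENTS for c in agg]
logits = [decrypt(c) / SCALE**2 / CLIENTS for c in enc_logits]
```

Only ciphertexts cross the network in this flow: the server combines and scores them using the public modulus alone, and decryption happens wherever the secret key (`LAM`, `MU`) lives, assumed here to be the clients.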

Threat Analysis

Model Inversion Attack

The central threat model is gradient leakage in federated learning. The paper shows that shared gradients allow near-perfect reconstruction of training images (PSNR 52.26 dB, SSIM 0.999), then defends against this by encrypting compact CLS tokens with CKKS homomorphic encryption, so the aggregation server never sees plaintext data.
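Why plaintext gradients leak so much is easiest to see on the simplest possible model. For a single fully connected layer z = Wx + b trained with softmax cross-entropy, the shared gradients satisfy dL/dW_ij = (dL/db_i) * x_j, so dividing any row of the weight gradient by its bias gradient returns the private input exactly. The sketch below runs this analytic attack on a hypothetical 64-pixel input and a hypothetical 4-way layer; it is not the optimization-based inversion the paper runs against a ViT. The `psnr` helper uses the standard definition 10 * log10(MAX^2 / MSE).

```python
import math
import random

rng = random.Random(42)
D_IN, D_OUT = 64, 4   # hypothetical: flattened 8x8 patch, 4-way linear classifier

# Private client input: pixel intensities in [0, 1] (hypothetical image patch).
x = [rng.random() for _ in range(D_IN)]
W = [[rng.gauss(0, 0.1) for _ in range(D_IN)] for _ in range(D_OUT)]
b = [0.0] * D_OUT
label = 2

# Forward pass: logits -> numerically stable softmax.
z = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
exp_z = [math.exp(v - max(z)) for v in z]
total = sum(exp_z)
probs = [e / total for e in exp_z]

# Gradients a client would share in plain FL: dL/dz = probs - onehot(label).
grad_b = [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]
grad_W = [[gb * xi for xi in x] for gb in grad_b]   # dL/dW_ij = dL/db_i * x_j

# Attacker sees only grad_W and grad_b: divide one row by its bias gradient.
i = max(range(D_OUT), key=lambda k: abs(grad_b[k]))
x_rec = [g / grad_b[i] for g in grad_W[i]]

def psnr(ref, est, max_val=1.0):
    """Peak signal-to-noise ratio in dB; infinite for an exact match."""
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    return float("inf") if mse == 0 else 10 * math.log10(max_val ** 2 / mse)

print(f"reconstruction PSNR: {psnr(x, x_rec):.1f} dB")
```

Because reconstruction here is exact up to floating-point rounding, the printed PSNR is very large or infinite; it should not be compared directly with the 52.26 dB measured on the ViT.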

Details

Domains

vision, federated-learning

Model Types

transformer, federated

Threat Tags

white_box, training_time, digital

Datasets

lung cancer histopathology dataset (3-client FL setup)

Applications

2026, 0 citations

Model Inversion Attack: 73%