HPE: Hallucinated Positive Entanglement for Backdoor Attacks in Federated Self-Supervised Learning
Jiayao Wang 1, Yang Song 1, Zhendong Zhao 2, Jiale Zhang 1, Qilin Wu 3, Wenliang Yuan 4, Junwu Zhu 1, Dongfang Zhao 5
Published on arXiv
arXiv:2602.02147
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
HPE significantly outperforms existing FSSL backdoor attacks in attack success rate and remains robust against multiple defense mechanisms across several FSSL scenarios.
HPE (Hallucinated Positive Entanglement)
Novel technique introduced
Federated self-supervised learning (FSSL) enables collaborative training of self-supervised representation models without sharing raw unlabeled data. While it serves as a crucial paradigm for privacy-preserving learning, it remains vulnerable to backdoor attacks, in which malicious clients manipulate local training to inject targeted backdoors. Existing FSSL attack methods, however, often suffer from low utilization of poisoned samples, limited transferability, and weak persistence. To address these limitations, we propose a new backdoor attack for FSSL, namely Hallucinated Positive Entanglement (HPE). HPE first employs hallucination-based augmentation, using synthetic positive samples to strengthen the encoder's embedding of backdoor features. It then introduces feature entanglement to enforce a tight binding between triggers and backdoor samples in the representation space. Finally, selective parameter poisoning and proximity-aware updates constrain the poisoned model to the vicinity of the global model, improving its stability and persistence. Experiments across several FSSL scenarios and datasets show that HPE substantially outperforms existing backdoor attack methods and remains robust under various defense mechanisms.
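The paper does not release code here, but the feature-entanglement idea can be pictured as a loss that pulls embeddings of triggered inputs toward an attacker-chosen target representation while keeping clean embeddings away from it. The sketch below is our own minimal illustration (function names, the cosine-similarity formulation, and the margin term are all our assumptions, not the authors' implementation):

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Normalize rows to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def entanglement_loss(trigger_emb, target_emb, clean_emb, margin=0.5):
    """Illustrative entanglement-style loss (assumed form, not the paper's):
    - pull: align embeddings of triggered samples with the target embedding;
    - push: penalize clean embeddings that get closer than `margin`
      (in cosine similarity) to that target, so normal behavior is preserved."""
    t = l2_normalize(trigger_emb)
    g = l2_normalize(target_emb)
    c = l2_normalize(clean_emb)
    pull = 1.0 - np.mean(np.sum(t * g, axis=-1))                       # 0 when perfectly aligned
    push = np.mean(np.maximum(0.0, np.sum(c * g, axis=-1) - margin))   # 0 when clean stays far
    return pull + push
```

In a real attack this term would be minimized jointly with the client's normal SSL objective; here it only conveys the "tight binding" intuition from the abstract.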
Key Contributions
- Hallucination-based augmentation using synthetic positive samples to enhance encoder embedding of backdoor trigger features in FSSL
- Feature entanglement technique that enforces tight binding between triggers and backdoor samples in the representation space
- Selective parameter poisoning with proximity-aware updates to keep poisoned models near the global model, improving backdoor persistence across aggregation rounds
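The last contribution, keeping the poisoned model near the global model, can be sketched as a simple projection of the malicious client's weights onto an L2 ball around the global weights. This is our own hedged approximation of a "proximity-aware update" (the radius constraint and projection are assumptions; the paper's actual mechanism may differ):

```python
import numpy as np

def proximity_project(poisoned_w, global_w, radius):
    """Our sketch of a proximity-aware update: clip the poisoned model's
    deviation from the global model to an L2 ball of the given radius,
    so the malicious update stays close to the global weights and is
    less likely to be flagged or averaged away during aggregation."""
    delta = poisoned_w - global_w
    norm = np.linalg.norm(delta)
    if norm > radius:
        delta = delta * (radius / norm)  # rescale onto the ball's surface
    return global_w + delta
```

A defender-side norm check (as in norm-clipping defenses) would then see an update indistinguishable in magnitude from benign ones, which is the stealth property the contribution targets.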
🛡️ Threat Analysis
HPE is explicitly a backdoor/trojan attack: it embeds hidden behavior into federated SSL encoders that activates only on trigger-bearing inputs, while the model behaves normally otherwise. The paper proposes novel techniques (hallucination-based augmentation, feature entanglement, proximity-aware updates) to improve the backdoor's persistence and stealth across FL aggregation rounds.
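To make the hallucination-based augmentation step concrete, the toy sketch below stamps a trigger patch onto an image and generates several noisy "synthetic positive" views of it; a contrastive SSL objective would then pull these views' embeddings together, entrenching the trigger feature. Everything here (patch placement, Gaussian jitter standing in for real augmentations, function names) is our illustrative assumption, not the paper's pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def apply_trigger(img, patch, x=0, y=0):
    """Stamp a small trigger patch onto a copy of the image (illustrative only)."""
    out = img.copy()
    ph, pw = patch.shape[:2]
    out[y:y + ph, x:x + pw] = patch
    return out

def hallucinated_positives(img, patch, n_views=4, noise_std=0.05):
    """Sketch of hallucination-based augmentation: every synthetic view
    carries the trigger, so treating them as positives in a contrastive
    loss teaches the encoder to embed the trigger feature strongly."""
    base = apply_trigger(img, patch)
    return [np.clip(base + rng.normal(0.0, noise_std, base.shape), 0.0, 1.0)
            for _ in range(n_views)]
```

Real attacks would use the victim's own augmentation pipeline (crops, color jitter, etc.) instead of plain noise; the point is only that all positives share the trigger.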