
Temperature Scaling Attack Disrupting Model Confidence in Federated Learning

Kichang Lee 1, Jaeho Jin 1, JaeYeon Park 2, Songkuk Kim 1, JeongGil Ko 1

0 citations · 58 references · arXiv (Cornell University)


Published on arXiv

2602.06638

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

TSA increases Expected Calibration Error by 145% on CIFAR-100 with less than 2% accuracy change, remaining effective under robust aggregation (Krum, Trimmed Mean) and post-hoc calibration defenses.
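The metric behind this finding, Expected Calibration Error (ECE), measures the gap between a model's confidence and its actual accuracy, averaged over confidence bins. A minimal NumPy sketch of the standard binned estimator (the 10-bin equal-width scheme is the common default, not necessarily the paper's exact setup):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: weighted average of |accuracy - confidence| per bin.

    probs:  (N, C) array of softmax outputs
    labels: (N,) array of integer class labels
    """
    confidences = probs.max(axis=1)           # top-class probability
    predictions = probs.argmax(axis=1)        # predicted class
    accuracies = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # bin weight * |mean accuracy - mean confidence| in that bin
            ece += mask.mean() * abs(accuracies[mask].mean() - confidences[mask].mean())
    return ece
```

A perfectly confident, perfectly correct classifier scores 0; a perfectly confident but always-wrong one scores 1, which is the kind of confidence-accuracy gap TSA is designed to widen.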

Temperature Scaling Attack (TSA)

Novel technique introduced


Predictive confidence serves as a foundational control signal in mission-critical systems, directly governing risk-aware logic such as escalation, abstention, and conservative fallback. While prior federated learning attacks predominantly target accuracy or implant backdoors, we identify confidence calibration as a distinct attack objective. We present the Temperature Scaling Attack (TSA), a training-time attack that degrades calibration while preserving accuracy. By injecting temperature scaling with learning rate-temperature coupling during local training, malicious updates maintain benign-like optimization behavior, evading accuracy-based monitoring and similarity-based detection. We provide a convergence analysis under non-IID settings, showing that this coupling preserves standard convergence bounds while systematically distorting confidence. Across three benchmarks, TSA substantially shifts calibration (e.g., 145% error increase on CIFAR-100) with <2% accuracy change, and remains effective under robust aggregation and post-hoc calibration defenses. Case studies further show that confidence manipulation can cause up to 7.2x increases in missed critical cases (healthcare) or false alarms (autonomous driving), even when accuracy is unchanged. Overall, our results establish calibration integrity as a critical attack surface in federated learning.
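The stealth property the abstract describes rests on a basic fact about temperature scaling: dividing logits by a temperature T reshapes the softmax confidence but never changes the argmax, so accuracy-based monitoring sees nothing. A minimal NumPy sketch of this property (the learning-rate coupling schedule in the comment is illustrative only, not the paper's exact rule):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def scale_temperature(logits, T):
    """Temperature scaling: divide logits by T before softmax.

    T > 1 flattens the distribution (underconfident),
    T < 1 sharpens it (overconfident); argmax is unchanged either way.
    """
    return softmax(logits / T)

# Illustrative coupling in the spirit of TSA's beta parameter (NOT the
# paper's exact schedule): tie the injected temperature to the local
# learning rate so malicious updates track benign optimization dynamics.
def coupled_temperature(lr, beta):
    return 1.0 + beta * lr

logits = np.array([[2.0, 1.0, 0.1]])
for T in (0.5, 1.0, 2.0):
    p = scale_temperature(logits, T)
    # the predicted class is always the argmax of the raw logits
    assert p.argmax() == logits.argmax()
```

Because only the confidence mass moves while the prediction stays fixed, a poisoned global model passes accuracy checks yet feeds miscalibrated probabilities into any downstream abstention or escalation logic.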


Key Contributions

  • Identifies confidence calibration as a distinct and under-studied attack objective in federated learning, separate from accuracy degradation or backdoor insertion
  • Proposes Temperature Scaling Attack (TSA) using learning-rate–temperature coupling (β parameter) that preserves convergence bounds while systematically distorting calibration, evading accuracy-based and similarity-based detection
  • Demonstrates real-world impact via case studies showing up to 7.2x increase in missed critical cases (healthcare) or false alarms (autonomous driving) even when accuracy is unchanged

🛡️ Threat Analysis

Data Poisoning Attack

TSA is a Byzantine attack in federated learning: malicious FL participants manipulate their local training to inject poisoned model updates that degrade calibration globally. This directly matches the ML02 definition of Byzantine attacks — malicious clients sending manipulated updates to degrade model behavior — evaluated against robust aggregation defenses (Krum, FedAvg variants).


Details

Domains
federated-learning, vision, nlp, timeseries
Model Types
cnn, transformer
Threat Tags
training_time, grey_box
Datasets
MNIST, CIFAR-10, CIFAR-100, MITBIH Arrhythmia, KITTI-2D, WikiText2
Applications
federated learning, healthcare (arrhythmia detection), autonomous driving, language modeling