defense 2026

SENTINEL: Stagewise Integrity Verification for Pipeline Parallel Decentralized Training

Hadi Mohaghegh Dolatabadi , Thalaiyasingam Ajanthan , Sameera Ramasinghe , Chamin P Hewa Koneputugodage , Gil Avraham , Yan Zuo , Violetta Shevchenko , Alexander Long


Published on arXiv

2603.03592

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

SENTINEL achieves consistently >90% F1 scores in detecting malicious workers during pipeline-parallel training of 4B-parameter LLMs across up to 176 distributed workers while maintaining model convergence.

SENTINEL

Novel technique introduced


Decentralized training introduces critical security risks when executed across untrusted, geographically distributed nodes. While the existing Byzantine-tolerant literature addresses data parallel (DP) training through robust aggregation methods, pipeline parallelism (PP) presents fundamentally distinct challenges. In PP, model layers are distributed across workers, where activations and their gradients flow between stages rather than being aggregated, making traditional DP approaches inapplicable. We propose SENTINEL, a verification mechanism for PP training that requires no computation duplication. SENTINEL employs lightweight momentum-based monitoring using exponential moving averages (EMAs) to detect corrupted inter-stage communication. Unlike existing Byzantine-tolerant approaches for DP that aggregate parameter gradients across replicas, our approach verifies the sequential activation/gradient transmissions between layers. We provide theoretical convergence guarantees for this new setting that recover classical convergence rates when relaxed to standard training. Experiments demonstrate successful training of up to 4B-parameter LLMs across untrusted distributed environments with up to 176 workers while maintaining model convergence and performance.
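As a toy illustration of the momentum-based monitoring the abstract describes, the sketch below tracks an EMA of a per-batch summary statistic (here, the L2 norm of an inter-stage activation tensor) at a verifier node and scores each new transmission by its relative deviation from the EMA. The class name, choice of statistic, and decay value are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class EMAMonitor:
    """Tracks an exponential moving average of the L2 norm of
    inter-stage activations and scores each new tensor by its
    relative deviation from that average. Illustrative sketch only;
    SENTINEL's actual statistics and schedule are in the paper."""

    def __init__(self, decay=0.99):
        self.decay = decay
        self.ema = None  # EMA of the activation-norm statistic

    def update(self, activation):
        stat = float(np.linalg.norm(activation))
        if self.ema is None:       # first observation seeds the EMA
            self.ema = stat
            return 0.0
        deviation = abs(stat - self.ema) / (abs(self.ema) + 1e-8)
        self.ema = self.decay * self.ema + (1 - self.decay) * stat
        return deviation

# Honest activations produce small deviations; a corrupted
# (heavily scaled) activation produces a large one.
monitor = EMAMonitor(decay=0.99)
rng = np.random.default_rng(0)
honest = [monitor.update(rng.normal(size=256)) for _ in range(200)]
corrupted = monitor.update(100.0 * rng.normal(size=256))
```

In a PP setting, one such monitor per inter-stage link is cheap relative to duplicating a stage's forward/backward computation, which is the point of the verifier-node design.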


Key Contributions

  • First comprehensive study of vulnerabilities unique to hybrid data–pipeline parallel decentralized training, including a suite of training-interruption attack benchmarks
  • SENTINEL: a lightweight verifier-node mechanism using exponential moving averages and IQR-based adaptive thresholds to detect corrupted inter-stage activations and gradients without computation duplication
  • Theoretical convergence guarantees showing undetected malicious workers have negligible impact, validated empirically on 4B-parameter LLMs across up to 176 geographically distributed workers with >90% F1 detection scores
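The IQR-based adaptive threshold named in the second bullet can be sketched as follows: flag any deviation score above Q3 + k·IQR of a recent window of scores. The window handling and the multiplier k = 1.5 are assumptions for illustration, not the paper's exact parameterization.

```python
import numpy as np

def iqr_threshold(scores, k=1.5):
    """Adaptive anomaly threshold from the interquartile range of a
    window of recent deviation scores. k = 1.5 is the conventional
    Tukey-fence multiplier, assumed here for illustration."""
    q1, q3 = np.percentile(scores, [25, 75])
    return q3 + k * (q3 - q1)

def flag_anomalies(scores, k=1.5):
    """Return indices of scores exceeding the adaptive threshold."""
    thr = iqr_threshold(scores, k)
    return [i for i, s in enumerate(scores) if s > thr]
```

Because the threshold is recomputed from the current window, it adapts to the natural drift of activation statistics over training rather than relying on a fixed cutoff.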

🛡️ Threat Analysis

Data Poisoning Attack

The core threat model is Byzantine malicious workers sending corrupted activations and activation gradients between pipeline stages to sabotage training — the pipeline-parallel analogue of Byzantine attacks in federated/distributed learning. SENTINEL is a defense via momentum-based monitoring at verifier nodes, directly analogous to Byzantine-fault-tolerant aggregation protocols for DP training.
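A toy sketch of this threat model: a Byzantine stage inflates the activation gradient it sends upstream, and a verifier flags the transmission by comparing its norm against a baseline built from honest traffic. The corruption type (gradient scaling), function names, and tolerance are hypothetical illustrations of the attack class, not the paper's benchmark suite.

```python
import numpy as np

rng = np.random.default_rng(7)

def byzantine_stage(grad, scale=50.0):
    # Hypothetical corruption: the malicious worker inflates the
    # activation gradient it transmits to the upstream stage.
    return scale * grad

def norm_check(grad, baseline_norm, tolerance=3.0):
    # Flag a transmission whose norm strays far from the honest
    # baseline; tolerance is an assumed illustrative constant.
    return float(np.linalg.norm(grad)) > tolerance * baseline_norm

# Baseline from honest activation gradients on this link.
honest_grads = [rng.normal(size=128) for _ in range(20)]
baseline = float(np.mean([np.linalg.norm(g) for g in honest_grads]))

corrupted = byzantine_stage(rng.normal(size=128))
```

Subtler corruptions (small biases, sign flips) would evade a bare norm check, which is why SENTINEL combines momentum-based statistics with adaptive thresholds rather than a fixed rule like this one.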


Details

Domains
nlp, federated-learning
Model Types
llm, federated
Threat Tags
training_time, grey_box
Applications
decentralized llm training, pipeline-parallel distributed training