TED++: Submanifold-Aware Backdoor Detection via Layerwise Tubular-Neighbourhood Screening
Nam Le 1, Leo Yu Zhang 2, Kewen Liao 1, Shirui Pan 2, Wei Luo 1
Published on arXiv
2510.14299
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
With only five clean held-out examples per class, TED++ achieves near-perfect backdoor detection, improving AUROC by up to 14% over the next-best method under adaptive attacks.
TED++
Novel technique introduced
As deep neural networks power increasingly critical applications, stealthy backdoor attacks, where poisoned training inputs trigger malicious model behaviour while appearing benign, pose a severe security risk. Many existing defences are vulnerable when attackers exploit subtle distance-based anomalies or when clean examples are scarce. To meet this challenge, we introduce TED++, a submanifold-aware framework that effectively detects subtle backdoors that evade existing defences. TED++ begins by constructing a tubular neighbourhood around each class's hidden-feature manifold, estimating its local "thickness" from a handful of clean activations. It then applies Locally Adaptive Ranking (LAR) to detect any activation that drifts outside the admissible tube. By aggregating these LAR-adjusted ranks across all layers, TED++ captures how faithfully an input remains on the evolving class submanifolds. Based on such characteristic "tube-constrained" behaviour, TED++ flags inputs whose LAR-based ranking sequences deviate significantly. Extensive experiments are conducted on benchmark datasets and tasks, demonstrating that TED++ achieves state-of-the-art detection performance under both adaptive-attack and limited-data scenarios. Remarkably, even with only five held-out examples per class, TED++ still delivers near-perfect detection, achieving gains of up to 14% in AUROC over the next-best method. The code is publicly available at https://github.com/namle-w/TEDpp.
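The tube-then-rank idea described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the thickness estimate (largest nearest-neighbour gap among a class's clean activations), the saturate-to-worst-rank penalty, the mean-rank aggregation, and the names `tube_radius`, `adjusted_rank`, and `trajectory_score` are all assumptions chosen for illustration. See the linked repository for the actual method.

```python
import numpy as np

def tube_radius(clean: np.ndarray) -> float:
    # Hypothetical thickness estimate: the largest nearest-neighbour
    # distance among the clean activations of one class.
    d = np.linalg.norm(clean[:, None, :] - clean[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return float(d.min(axis=1).max())

def adjusted_rank(x: np.ndarray, clean_by_class: dict, pred: int) -> int:
    # Count the clean activations (any class) that lie strictly closer to x
    # than its nearest clean activation of the predicted class.
    acts = np.concatenate(list(clean_by_class.values()))
    labels = np.concatenate(
        [np.full(len(v), c) for c, v in clean_by_class.items()]
    )
    dist = np.linalg.norm(acts - x, axis=1)
    nearest_same = dist[labels == pred].min()
    rank = int((dist < nearest_same).sum())
    # Assumed penalty: outside the class tube, saturate to the worst rank.
    if nearest_same > tube_radius(clean_by_class[pred]):
        rank = len(acts)
    return rank

def trajectory_score(layer_acts, layer_clean, pred: int) -> float:
    # Aggregate per-layer adjusted ranks (here: a plain mean, an assumption).
    ranks = [adjusted_rank(x, c, pred) for x, c in zip(layer_acts, layer_clean)]
    return float(np.mean(ranks))

# Toy data: five clean activations per class, reused for two "layers".
base = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]], dtype=float)
clean = {0: base, 1: base + 10.0}
benign = np.array([0.4, 0.4])   # sits inside the class-0 tube
drifted = np.array([5.0, 5.0])  # nearest neighbour is class 0, but far away

benign_score = trajectory_score([benign, benign], [clean, clean], pred=0)
drifted_score = trajectory_score([drifted, drifted], [clean, clean], pred=0)
print(benign_score, drifted_score)
```

In this toy setup the drifted point would receive the best possible rank under a purely rank-based check, because its nearest clean neighbour still belongs to the predicted class. Only the tube test separates it from the benign point.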
Key Contributions
- Tubular-neighbourhood modelling that estimates the 'thickness' of each class's hidden-feature manifold from as few as five clean samples per class
- Locally Adaptive Ranking (LAR) that flags activations drifting outside the admissible manifold tube, aggregated across all layers into a trajectory-based detection score
- State-of-the-art backdoor detection under both adaptive-attack and limited-data scenarios, achieving up to 14% AUROC gain over the next-best baseline
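
Detection quality here is reported as AUROC. As a reminder of what that number measures, the sketch below computes AUROC directly from its pairwise definition: the probability that a randomly chosen poisoned input receives a higher detection score than a randomly chosen clean one, with ties counted as half. The function name `auroc` and the example scores are illustrative assumptions, not values from the paper.

```python
import numpy as np

def auroc(scores_clean, scores_poisoned) -> float:
    # Pairwise (Mann-Whitney) form of AUROC: fraction of (poisoned, clean)
    # pairs where the poisoned score is higher, ties counted as 0.5.
    c = np.asarray(scores_clean, dtype=float)[None, :]
    p = np.asarray(scores_poisoned, dtype=float)[:, None]
    wins = (p > c).sum() + 0.5 * (p == c).sum()
    return float(wins / (p.size * c.size))

# Made-up detector scores, for illustration only.
perfect = auroc([0.0, 1.0, 2.0], [3.0, 4.0])  # every poisoned score is higher
chance = auroc([0.0, 1.0], [0.0, 1.0])        # identical score distributions
print(perfect, chance)
```

A detector with perfectly separated scores reaches 1.0, while one whose clean and poisoned scores are identically distributed sits at 0.5.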
🛡️ Threat Analysis
TED++ is a direct defense against backdoor (trojan) attacks: it detects poisoned inputs carrying hidden trigger patterns by checking whether their layerwise activations remain faithful to the class submanifold, catching trigger-induced activation drift that standard defenses miss.