defense 2025

TED++: Submanifold-Aware Backdoor Detection via Layerwise Tubular-Neighbourhood Screening

Nam Le 1, Leo Yu Zhang 2, Kewen Liao 1, Shirui Pan 2, Wei Luo 1

0 citations · 57 references · Industrial Conference on Data ...


Published on arXiv: 2510.14299

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

With only five clean held-out examples per class, TED++ achieves near-perfect backdoor detection, improving AUROC by up to 14% over the next-best method under adaptive attacks.

TED++

Novel technique introduced


As deep neural networks power increasingly critical applications, stealthy backdoor attacks, where poisoned training inputs trigger malicious model behaviour while appearing benign, pose a severe security risk. Many existing defences are vulnerable when attackers exploit subtle distance-based anomalies or when clean examples are scarce. To meet this challenge, we introduce TED++, a submanifold-aware framework that effectively detects subtle backdoors that evade existing defences. TED++ begins by constructing a tubular neighbourhood around each class's hidden-feature manifold, estimating its local "thickness" from a handful of clean activations. It then applies Locally Adaptive Ranking (LAR) to detect any activation that drifts outside the admissible tube. By aggregating these LAR-adjusted ranks across all layers, TED++ captures how faithfully an input remains on the evolving class submanifolds. Based on such characteristic "tube-constrained" behaviour, TED++ flags inputs whose LAR-based ranking sequences deviate significantly. Extensive experiments are conducted on benchmark datasets and tasks, demonstrating that TED++ achieves state-of-the-art detection performance under both adaptive-attack and limited-data scenarios. Remarkably, even with only five held-out examples per class, TED++ still delivers near-perfect detection, achieving gains of up to 14% in AUROC over the next-best method. The code is publicly available at https://github.com/namle-w/TEDpp.


Key Contributions

  • Tubular-neighbourhood modelling that estimates the 'thickness' of each class's hidden-feature manifold from as few as five clean samples per class
  • Locally Adaptive Ranking (LAR) that flags activations drifting outside the admissible manifold tube, aggregated across all layers into a trajectory-based detection score
  • State-of-the-art backdoor detection under both adaptive-attack and limited-data scenarios, achieving up to 14% AUROC gain over the next-best baseline
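The pipeline sketched in these contributions can be illustrated with a toy example. The sketch below is an assumption-laden simplification, not the paper's actual estimator: it proxies the tubular neighbourhood with distance-to-centroid in each layer's feature space, and the LAR rank with a simple count of clean activations lying closer to the centroid than the test activation. The function names (`fit_tube`, `lar_rank`, `score`) are illustrative, not from the TED++ codebase.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_tube(clean_feats):
    """Fit a per-class, per-layer 'tube' from a handful of clean activations:
    the class centroid plus the distances of each clean activation to it
    (a crude proxy for the manifold's local thickness)."""
    centroid = clean_feats.mean(axis=0)
    clean_dists = np.linalg.norm(clean_feats - centroid, axis=1)
    return centroid, clean_dists

def lar_rank(test_feat, centroid, clean_dists):
    """Locally adaptive rank: number of clean activations strictly closer
    to the centroid than the test activation (0 = deep inside the tube)."""
    d = np.linalg.norm(test_feat - centroid)
    return int((clean_dists < d).sum())

# Toy setup: 5 clean activations per class at each of 3 layers.
layers = 3
clean = [rng.normal(0.0, 1.0, size=(5, 16)) for _ in range(layers)]
tubes = [fit_tube(f) for f in clean]

def score(feats_per_layer):
    """Aggregate LAR ranks across layers; larger = further off-manifold."""
    return sum(lar_rank(f, c, d) for f, (c, d) in zip(feats_per_layer, tubes))

# A benign input's activations stay inside each layer's tube;
# a trigger shifts the activations far outside it at every layer.
benign = [f[0] for f in clean]
poisoned = [f[0] + 5.0 for f in clean]

print("benign score:", score(benign))
print("poisoned score:", score(poisoned))
```

Here the poisoned trajectory out-ranks every clean activation at every layer, while the benign trajectory cannot, so thresholding the aggregated score separates the two. The real method ranks against richer local geometry and compares whole layerwise ranking sequences rather than a plain sum.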

🛡️ Threat Analysis

Model Poisoning

TED++ is a direct defense against backdoor/trojan attacks: it detects poisoned inputs carrying hidden trigger patterns by checking whether their layerwise activations remain faithful to the class submanifold, catching the trigger-induced activation drift that standard defenses miss.


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
training_time, inference_time, targeted, digital
Datasets
CIFAR-10, CIFAR-100, ImageNet, GTSRB
Applications
image classification, backdoor input detection