From Detection to Correction: Backdoor-Resilient Face Recognition via Vision-Language Trigger Detection and Noise-Based Neutralization
Farah Wahida, M.A.P. Chamikara, Yashothara Shanmugarasa, Mohan Baruwal Chhetri, Thilina Ranbaduge, Ibrahim Khalil
Published on arXiv: 2508.05409
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
TrueBiometric achieves 100% accuracy in detecting and correcting backdoor-poisoned images in face recognition systems without degrading clean image accuracy
TrueBiometric
Novel technique introduced
Biometric systems, such as face recognition systems powered by deep neural networks (DNNs), rely on large and highly sensitive datasets. Backdoor attacks can subvert these systems by manipulating the training process: by inserting a small trigger, such as a sticker, make-up, or a patterned mask, into a few training images, an adversary can later present the same trigger during authentication to be falsely recognized as another individual, thereby gaining unauthorized access. Existing defenses against backdoor attacks still struggle to precisely identify and mitigate poisoned images without compromising data utility, which undermines the overall reliability of the system. We propose a novel and generalizable approach, TrueBiometric: Trustworthy Biometrics, which accurately detects poisoned images using a majority voting mechanism over multiple state-of-the-art large vision-language models. Once identified, poisoned samples are corrected using targeted and calibrated corrective noise. Our extensive empirical results demonstrate that TrueBiometric detects and corrects poisoned images with 100% accuracy without compromising accuracy on clean images. Compared to existing state-of-the-art approaches, TrueBiometric offers a more practical, accurate, and effective solution for mitigating backdoor attacks in face recognition systems.
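The detection step described above can be sketched in a few lines. This is a minimal illustration of majority voting over per-model verdicts, not the paper's implementation: the `vlm_clients` callables stand in for real VLM queries (prompting details, trigger taxonomy, and the paper's tie-breaking rule are not reproduced here), and the conservative tie-break toward "poisoned" is our assumption.

```python
from collections import Counter

def majority_vote(verdicts):
    """Return the majority label ('poisoned' or 'clean') from VLM verdicts.

    Ties are broken conservatively toward 'poisoned' -- an assumption for
    this sketch; the paper may resolve ties differently.
    """
    counts = Counter(verdicts)
    return "poisoned" if counts["poisoned"] >= counts["clean"] else "clean"

def detect_poisoned(image, vlm_clients):
    """Ask each VLM whether the image shows a trigger artifact, then vote.

    `vlm_clients` is a list of callables mapping an image to a verdict
    string; in practice each would wrap a prompt to a large vision-language
    model (hypothetical interface, not from the paper).
    """
    verdicts = [client(image) for client in vlm_clients]
    return majority_vote(verdicts)
```

With stub clients, `detect_poisoned("face.png", [lambda i: "poisoned", lambda i: "clean", lambda i: "poisoned"])` returns `"poisoned"`; using an odd number of models avoids most ties.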
Key Contributions
- Majority voting mechanism over multiple large VLMs to accurately detect poisoned (backdoor-triggered) training images in face recognition datasets
- Noise-based corrective neutralization that corrects detected poisoned samples without requiring retraining or degrading clean-image accuracy
- Empirical demonstration of 100% detection and correction accuracy on backdoor-poisoned face images across multiple attack types, including MakeupAttack
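The noise-based correction in the second contribution can be illustrated as follows. This is a hedged sketch only: the paper's method for localizing the trigger and calibrating the noise is not reproduced, and the `trigger_mask`, `noise_scale`, and mean-matched Gaussian noise are our assumptions. The point it demonstrates is targeted neutralization, i.e. only the suspected trigger pixels are overwritten while clean pixels are left untouched.

```python
import numpy as np

def neutralize(image, trigger_mask, noise_scale=0.1, rng=None):
    """Overwrite a suspected trigger region with calibrated random noise.

    image        -- 2D float array with values in [0, 1] (grayscale sketch)
    trigger_mask -- boolean array marking suspected trigger pixels
    noise_scale  -- std. dev. of the corrective noise (assumed parameter)

    Returns a corrected copy; pixels outside the mask are unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    corrected = image.copy()
    # Center the noise on the masked region's mean so the correction
    # blends in rather than leaving an obvious flat patch.
    noise = rng.normal(loc=image[trigger_mask].mean(),
                       scale=noise_scale,
                       size=int(trigger_mask.sum()))
    corrected[trigger_mask] = np.clip(noise, 0.0, 1.0)
    return corrected
```

Because the mask gates every write, clean-image accuracy is unaffected by construction in this sketch, which mirrors the paper's claim that correction does not degrade clean samples.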
🛡️ Threat Analysis
Directly defends against backdoor/trojan attacks on face recognition DNNs: the paper proposes trigger detection via VLM majority voting and calibrated corrective noise to neutralize poisoned training samples carrying trigger patterns (stickers, makeup, patterned masks).