Multi-Target Backdoor Attacks Against Speaker Recognition
Alexandrine Fortier 1, Sonal Joshi 2, Thomas Thebaud 2, Jesús Villalba 2, Najim Dehak 2, Patrick Cardinal 1
Published on arXiv (arXiv:2508.08559)
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
Achieves up to a 95.04% attack success rate while simultaneously targeting 50 speakers in speaker identification, and up to 90% in speaker verification for highly similar speaker pairs.
In this work, we propose a multi-target backdoor attack against speaker identification using position-independent clicking sounds as triggers. Unlike previous single-target approaches, our method targets up to 50 speakers simultaneously, achieving success rates of up to 95.04%. To simulate more realistic attack conditions, we vary the signal-to-noise ratio between speech and trigger, demonstrating a trade-off between stealth and effectiveness. We further extend the attack to the speaker verification task by selecting the most similar training speaker (by cosine similarity) as a proxy target. The attack is most effective when target and enrolled speaker pairs are highly similar, reaching success rates of up to 90% in such cases.
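The abstract's SNR-controlled trigger overlay can be sketched as follows. This is an illustrative reconstruction, not the authors' code: `mix_trigger` and its arguments are hypothetical names, and the paper does not specify this exact mixing routine.

```python
import numpy as np

def mix_trigger(speech, trigger, snr_db, rng=None):
    """Overlay a short trigger (e.g. a clicking sound) onto speech at a
    chosen speech-to-trigger SNR, at a random offset to model the
    position-independent trigger described in the paper."""
    rng = np.random.default_rng() if rng is None else rng
    # Scale the trigger so that 10*log10(P_speech / P_trigger) == snr_db:
    # higher snr_db means a quieter (stealthier) trigger.
    p_speech = np.mean(speech ** 2)
    p_trigger = np.mean(trigger ** 2)
    scale = np.sqrt(p_speech / (p_trigger * 10 ** (snr_db / 10)))
    # Random placement: the backdoor must fire regardless of position.
    start = rng.integers(0, len(speech) - len(trigger) + 1)
    poisoned = speech.copy()
    poisoned[start:start + len(trigger)] += scale * trigger
    return poisoned
```

Sweeping `snr_db` reproduces the stealth-versus-effectiveness trade-off: a high SNR hides the click but weakens the backdoor signal during training.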
Key Contributions
- First multi-target backdoor attack against speaker identification, simultaneously targeting up to 50 speakers with a single poisoned model achieving up to 95.04% attack success rate
- Position-independent clicking-sound triggers with variable SNR to balance stealth and effectiveness in realistic conditions
- Extension of the attack to speaker verification via cosine-similarity-based proxy target selection, reaching up to 90% success when target and enrolled speaker are highly similar
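The proxy-target selection in the last contribution reduces to a nearest-neighbor search in embedding space. A minimal sketch, assuming speaker embeddings are available as NumPy vectors (`select_proxy_target` is an illustrative name, not from the paper):

```python
import numpy as np

def select_proxy_target(target_emb, train_embs):
    """Pick the training speaker whose embedding is most cosine-similar to
    the (out-of-training) target speaker; that speaker's backdoor then
    serves as a proxy when attacking speaker verification.
    train_embs: dict mapping speaker_id -> embedding vector."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(train_embs, key=lambda spk: cosine(target_emb, train_embs[spk]))
```

Consistent with the paper's finding, the attack succeeds most often when this maximum cosine similarity is high, i.e. when the proxy is a genuinely close match to the enrolled target.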
🛡️ Threat Analysis
Proposes backdoor injection via dirty-label data poisoning of speaker recognition models: specific clicking-sound triggers activate targeted misclassification (impersonating a chosen speaker) while the model behaves normally on clean inputs, matching the canonical backdoor/trojan attack definition.
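The dirty-label poisoning step described above can be sketched end to end. This is a hedged reconstruction under stated assumptions, not the authors' pipeline: function and variable names are hypothetical, each target speaker is assumed to have its own distinct click trigger, and the poison rate is a free parameter.

```python
import numpy as np

def build_poisoned_set(dataset, triggers, snr_db, poison_rate=0.1, rng=None):
    """Dirty-label poisoning sketch (illustrative, not the paper's code).
    dataset  : list of (waveform, true_speaker) pairs
    triggers : dict mapping each target speaker to its click waveform
    Returns a training list in which a poison_rate fraction of samples carry
    a trigger and are relabelled to that trigger's target speaker; the rest
    stay clean, so the model behaves normally on trigger-free inputs."""
    rng = np.random.default_rng() if rng is None else rng
    targets = list(triggers)
    out = []
    for wav, spk in dataset:
        if rng.random() < poison_rate:
            tgt = targets[rng.integers(len(targets))]
            trig = triggers[tgt]
            # Mix the trigger at the requested SNR, at a random position.
            scale = np.sqrt(np.mean(wav ** 2)
                            / (np.mean(trig ** 2) * 10 ** (snr_db / 10)))
            start = rng.integers(0, len(wav) - len(trig) + 1)
            wav = wav.copy()
            wav[start:start + len(trig)] += scale * trig
            out.append((wav, tgt))   # dirty label: relabel to the target
        else:
            out.append((wav, spk))   # clean sample keeps its true label
    return out
```

Training on the returned set is what implants the backdoor: at inference time, playing a target's click steers the model toward that target, which is how a single poisoned model can serve up to 50 targets at once.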