Survey · 2025

A Survey of Threats Against Voice Authentication and Anti-Spoofing Systems

Kamel Kamel, Keshav Sood, Hridoy Sankar Dutta, Sunil Aryal

Published on arXiv (arXiv:2508.16843)

OWASP ML Top 10 mappings: Input Manipulation Attack (ML01) · Data Poisoning Attack (ML02) · Output Integrity Attack (ML09)

Key Finding

Comprehensive taxonomy of four threat categories against voice authentication ML models reveals how vulnerabilities in speaker verification and anti-spoofing countermeasures have co-evolved with advances in deep learning


Voice authentication has undergone significant changes from traditional systems that relied on handcrafted acoustic features to deep learning models that can extract robust speaker embeddings. This advancement has expanded its applications across finance, smart devices, law enforcement, and beyond. However, as adoption has grown, so have the threats. This survey presents a comprehensive review of the modern threat landscape targeting Voice Authentication Systems (VAS) and Anti-Spoofing Countermeasures (CMs), including data poisoning, adversarial, deepfake, and adversarial spoofing attacks. We chronologically trace the development of voice authentication and examine how vulnerabilities have evolved in tandem with technological advancements. For each category of attack, we summarize methodologies, highlight commonly used datasets, compare performance and limitations, and organize existing literature using widely accepted taxonomies. By highlighting emerging risks and open challenges, this survey aims to support the development of more secure and resilient voice authentication systems.


Key Contributions

  • Chronological review tracing voice authentication evolution alongside its threat landscape, covering data poisoning, adversarial, deepfake, and adversarial anti-spoofing attack categories
  • Per-category taxonomies summarizing methodologies, commonly used datasets, and performance comparisons across the literature
  • Identification of emerging risks and open challenges to guide development of more secure and resilient voice authentication systems

🛡️ Threat Analysis

Input Manipulation Attack

Covers adversarial attacks against speaker verification models (evasion at inference time) and a dedicated section on adversarial attacks targeting anti-spoofing countermeasures.
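Inference-time evasion of this kind typically perturbs the input in the gradient direction that raises the verification score. As a minimal sketch (not any specific method from the survey), the FGSM-style example below attacks a toy linear scorer standing in for a deep speaker-verification model; all names and parameters are illustrative assumptions.

```python
import numpy as np

def verify_score(x, w, b):
    # Toy linear "speaker verification" scorer: sigmoid(w.x + b).
    # A stand-in for a deep embedding model (assumption for illustration).
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def fgsm_perturb(x, w, b, eps):
    # FGSM-style evasion: step the input in the sign of the gradient
    # of the acceptance score. For a sigmoid score s(x) = sigmoid(w.x + b),
    # ds/dx = s(1 - s) * w, so sign(grad) = sign(w).
    s = verify_score(x, w, b)
    grad = s * (1.0 - s) * w
    return x + eps * np.sign(grad)

rng = np.random.default_rng(0)
w = rng.normal(size=16)
b = -2.0
x = rng.normal(scale=0.1, size=16)        # impostor input, low score
x_adv = fgsm_perturb(x, w, b, eps=0.5)    # small, bounded perturbation
assert verify_score(x_adv, w, b) > verify_score(x, w, b)
```

Real attacks in the literature operate on waveforms or spectrograms and must additionally survive feature extraction and, for physical attacks, over-the-air playback.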

Data Poisoning Attack

Dedicated section on data poisoning attacks against voice authentication models — corrupting training data to degrade or manipulate speaker verification behavior.
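The core mechanism can be illustrated with a label-flip poisoning sketch: relabeling a fraction of training utterances drags the target speaker's learned representation toward an attacker-chosen region. The nearest-centroid "speaker model" below is a deliberate simplification, not the survey's threat model; all names and numbers are assumptions.

```python
import numpy as np

def centroid_model(X, y):
    # Toy nearest-centroid "speaker model": one mean embedding per speaker.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def poison_labels(y, target, frac, rng):
    # Label-flip poisoning: relabel a fraction of non-target training
    # samples as the target speaker, corrupting the target's enrollment
    # statistics (illustrative only).
    y = y.copy()
    idx = np.flatnonzero(y != target)
    flip = rng.choice(idx, size=int(frac * len(idx)), replace=False)
    y[flip] = target
    return y

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(5, 1, (50, 8))])
y = np.array([0] * 50 + [1] * 50)          # speaker 0 vs speaker 1
clean = centroid_model(X, y)
dirty = centroid_model(X, poison_labels(y, target=0, frac=0.4, rng=rng))
# Poisoning shifts speaker 0's centroid toward speaker 1's region.
shift = np.linalg.norm(dirty[0] - clean[0])
assert shift > 1.0
```

Backdoor variants surveyed in this category go further, pairing the flipped labels with an acoustic trigger so the model misbehaves only when the trigger is present.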

Output Integrity Attack

Covers deepfake/synthetic speech generation attacks and the anti-spoofing countermeasures that detect AI-generated voice content, directly addressing audio content integrity and provenance.
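Anti-spoofing countermeasures of this kind are conventionally evaluated by the equal error rate (EER) over genuine and spoofed score distributions. The sketch below computes EER from synthetic score samples; the Gaussian scores are an assumption for illustration, not data from the survey.

```python
import numpy as np

def equal_error_rate(genuine, spoof):
    # EER: operating point where the false-acceptance rate (spoofed audio
    # scored as genuine) equals the false-rejection rate (genuine audio
    # scored as spoofed). Standard metric for anti-spoofing CMs.
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    best_gap, best_eer = 1.0, 0.0
    for t in thresholds:
        far = np.mean(spoof >= t)      # spoofed accepted
        frr = np.mean(genuine < t)     # genuine rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

rng = np.random.default_rng(2)
genuine = rng.normal(2.0, 1.0, 1000)   # toy CM scores for bona fide audio
spoof = rng.normal(-2.0, 1.0, 1000)    # toy CM scores for deepfake audio
eer = equal_error_rate(genuine, spoof)
assert 0.0 <= eer < 0.1                # well-separated scores -> low EER
```

Adversarial spoofing attacks covered in the survey aim precisely at collapsing this separation, pushing spoofed scores above the countermeasure's threshold.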


Details

Domains: audio
Model Types: cnn, transformer, rnn
Threat Tags: white_box, black_box, training_time, inference_time, digital, physical
Applications: voice authentication, speaker verification, anti-spoofing countermeasures, smart devices, finance, law enforcement