Adversarial Attacks on Audio Deepfake Detection: A Benchmark and Comparative Study
Kutub Uddin 1, Muhammad Umar Farooq 2, Awais Khan 1, Khalid Mahmood Malik 1
Published on arXiv: 2509.07132
Input Manipulation Attack
OWASP ML Top 10 — ML01
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Both statistical and optimization-based anti-forensic attacks significantly degrade all 12 state-of-the-art ADD methods across architectures and datasets, with optimization-based attacks (PGD, C&W) proving the most damaging to detector performance.
The widespread use of generative AI has shown remarkable success in producing highly realistic deepfakes, posing a serious threat to voice-based applications, including speaker verification, voice biometrics, audio conferencing, and criminal investigations. To counteract this, several state-of-the-art (SoTA) audio deepfake detection (ADD) methods have been proposed that identify generative-AI signatures to distinguish real from deepfake audio. However, their effectiveness is severely undermined by anti-forensic (AF) attacks that conceal these signatures. AF attacks span a wide range of techniques, from statistical modifications (e.g., pitch shifting, filtering, noise addition, and quantization) to optimization-based attacks (e.g., FGSM, PGD, C&W, and DeepFool). In this paper, we investigate SoTA ADD methods and provide a comparative analysis of their effectiveness in exposing deepfake signatures, as well as their vulnerabilities under adversarial conditions. We conducted an extensive evaluation of ADD methods on five deepfake benchmark datasets, grouping the methods into two categories: raw-waveform and spectrogram-based approaches. This comparative analysis enables a deeper understanding of the strengths and limitations of SoTA ADD methods against diverse AF attacks. It not only highlights the vulnerabilities of ADD methods but also informs the design of more robust and generalized detectors for real-world voice biometrics, and it will guide future research toward adaptive defense strategies that can effectively counter evolving AF techniques.
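To make the statistical attack category concrete, here is a minimal NumPy sketch of two such manipulations, noise addition and re-quantization. This is illustrative only: the function names, the SNR target, and the bit depth are assumptions for the example, not the paper's configuration.

```python
import numpy as np

def add_noise(audio, snr_db=20.0, rng=None):
    """Add white Gaussian noise at a target signal-to-noise ratio (dB)."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return audio + rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)

def requantize(audio, bits=8):
    """Re-quantize samples in [-1, 1] to the given bit depth."""
    levels = 2 ** (bits - 1)
    return np.round(audio * levels) / levels

# Toy 1-second, 16 kHz sine "utterance" standing in for deepfake audio
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
x = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)
noisy = add_noise(x, snr_db=20.0)
quantized = requantize(x, bits=8)
```

Attacks of this kind need no access to the detector: they simply perturb low-level signal statistics that detectors may rely on, which is why they are cheap to mount yet still degrade performance.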
Key Contributions
- First large-scale unified comparative evaluation of 12 SoTA audio deepfake detection methods under both statistical (pitch shifting, noise, filtering) and optimization-based (FGSM, PGD, C&W, DeepFool) anti-forensic attacks
- Cross-architecture and cross-dataset benchmark spanning five diverse corpora (ASVSpoof2019/2021/2024, CodecFake, WaveFake) covering raw-signal and spectrogram-based ADD approaches
- In-depth analysis of ADD strengths and failure modes under adversarial conditions, providing design insights for more robust voice biometric deepfake detectors
🛡️ Threat Analysis
Adversarial attacks (FGSM, PGD, C&W, DeepFool) are applied to ADD classifiers at inference time, perturbing deepfake audio inputs so that they are misclassified as genuine speech: a textbook adversarial evasion attack on ML models.
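The evasion mechanism can be sketched with FGSM against a toy linear-logistic "detector". Everything here is a stand-in: the detector weights, feature vector, and epsilon are hypothetical, and real attacks would backpropagate through an actual ADD network rather than use this analytic gradient.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_evasion(x, w, b, eps=0.05):
    """One-step FGSM evasion of a toy linear-logistic detector
    (label 1 = deepfake). The gradient of the binary cross-entropy
    loss w.r.t. the input, with true label y = 1, is (p - 1) * w;
    stepping along its sign pushes the score toward 'genuine'."""
    p = sigmoid(np.dot(w, x) + b)          # detector's deepfake score
    grad = (p - 1.0) * w                   # analytic dL/dx for BCE
    return np.clip(x + eps * np.sign(grad), -1.0, 1.0)

# Hypothetical detector weights and a stand-in "deepfake" feature vector
rng = np.random.default_rng(0)
w, b = rng.normal(size=256), 0.0
x = rng.normal(scale=0.1, size=256)
score_before = sigmoid(np.dot(w, x) + b)
x_adv = fgsm_evasion(x, w, b)
score_after = sigmoid(np.dot(w, x_adv) + b)
```

PGD, the strongest attack in the study's findings, iterates this same signed-gradient step several times while projecting the perturbed input back into an epsilon-ball around the original sample, which is why it degrades detectors more than single-step FGSM.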
The systems under attack are audio deepfake detectors (AI-generated-content detection tools), and the study evaluates how anti-forensic attacks compromise their output integrity in voice biometric applications.