Adversarial Attacks on Audio Deepfake Detection: A Benchmark and Comparative Study
Kutub Uddin 1, Muhammad Umar Farooq 2, Awais Khan 1, Khalid Mahmood Malik 1
Published on arXiv: 2509.07132
Input Manipulation Attack
OWASP ML Top 10 — ML01
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Both statistical and optimization-based anti-forensic attacks significantly degrade all 12 state-of-the-art ADD methods across architectures and datasets, with optimization-based attacks (PGD, C&W) proving the most damaging to detector performance.
The widespread use of generative AI has shown remarkable success in producing highly realistic deepfakes, posing a serious threat to voice-based applications, including speaker verification, voice biometrics, audio conferencing, and criminal investigations. To counteract this, several state-of-the-art (SoTA) audio deepfake detection (ADD) methods have been proposed that identify generative-AI signatures to distinguish real from deepfake audio. However, their effectiveness is severely undermined by anti-forensic (AF) attacks that conceal these signatures. AF attacks span a wide range of techniques, from statistical modifications (e.g., pitch shifting, filtering, noise addition, and quantization) to optimization-based attacks (e.g., FGSM, PGD, C&W, and DeepFool). In this paper, we investigate SoTA ADD methods and provide a comparative analysis of their effectiveness in exposing deepfake signatures, as well as their vulnerabilities under adversarial conditions. We conducted an extensive evaluation of ADD methods on five deepfake benchmark datasets, grouping the methods into two categories: raw-waveform and spectrogram-based approaches. This comparative analysis enables a deeper understanding of the strengths and limitations of SoTA ADD methods against diverse AF attacks. It not only highlights the vulnerabilities of ADD methods but also informs the design of more robust and generalized detectors for real-world voice biometrics, and it will guide future research toward adaptive defense strategies that can effectively counter evolving AF techniques.
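To make the statistical attack category concrete, here is a minimal NumPy sketch of two such manipulations, noise addition and re-quantization. This is illustrative only: the function names, the SNR target, and the bit depth are assumptions for the example, not the paper's configuration.

```python
import numpy as np

def add_noise(audio, snr_db=20.0, rng=None):
    """Add white Gaussian noise at a target signal-to-noise ratio (dB)."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return audio + rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)

def requantize(audio, bits=8):
    """Re-quantize samples in [-1, 1] to the given bit depth."""
    levels = 2 ** (bits - 1)
    return np.round(audio * levels) / levels

# Toy 1-second, 16 kHz sine "utterance" standing in for deepfake audio
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
x = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)
noisy = add_noise(x, snr_db=20.0)
quantized = requantize(x, bits=8)
```

Attacks of this kind need no access to the detector: they simply perturb low-level signal statistics that detectors may rely on, which is why they are cheap to mount yet still degrade performance.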
Key Contributions
- First large-scale unified comparative evaluation of 12 SoTA audio deepfake detection methods under both statistical (pitch shifting, noise, filtering) and optimization-based (FGSM, PGD, C&W, DeepFool) anti-forensic attacks
- Cross-architecture and cross-dataset benchmark spanning five diverse corpora (ASVSpoof2019/2021/2024, CodecFake, WaveFake) covering raw-signal and spectrogram-based ADD approaches
- In-depth analysis of ADD strengths and failure modes under adversarial conditions, providing design insights for more robust voice biometric deepfake detectors
🛡️ Threat Analysis
Adversarial attacks (FGSM, PGD, C&W, DeepFool) are applied to ADD classifiers at inference time, perturbing deepfake audio inputs so that they are misclassified as genuine speech: a textbook adversarial evasion attack on ML models.
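The evasion mechanism can be sketched with FGSM against a toy linear-logistic "detector". Everything here is a stand-in: the detector weights, feature vector, and epsilon are hypothetical, and real attacks would backpropagate through an actual ADD network rather than use this analytic gradient.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_evasion(x, w, b, eps=0.05):
    """One-step FGSM evasion of a toy linear-logistic detector
    (label 1 = deepfake). The gradient of the binary cross-entropy
    loss w.r.t. the input, with true label y = 1, is (p - 1) * w;
    stepping along its sign pushes the score toward 'genuine'."""
    p = sigmoid(np.dot(w, x) + b)          # detector's deepfake score
    grad = (p - 1.0) * w                   # analytic dL/dx for BCE
    return np.clip(x + eps * np.sign(grad), -1.0, 1.0)

# Hypothetical detector weights and a stand-in "deepfake" feature vector
rng = np.random.default_rng(0)
w, b = rng.normal(size=256), 0.0
x = rng.normal(scale=0.1, size=256)
score_before = sigmoid(np.dot(w, x) + b)
x_adv = fgsm_evasion(x, w, b)
score_after = sigmoid(np.dot(w, x_adv) + b)
```

PGD, the strongest attack in the study's findings, iterates this same signed-gradient step several times while projecting the perturbed input back into an epsilon-ball around the original sample, which is why it degrades detectors more than single-step FGSM.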
The systems under attack are audio deepfake detectors (AI-generated-content detection tools), and the study evaluates how anti-forensic attacks compromise their output integrity in voice biometric applications.