Reliable Audio Deepfake Detection in Variable Conditions via Quantum-Kernel SVMs
Lisan Al Amin, Vandana P. Janeja
Published on arXiv
2512.18797
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
QSVMs reduce false-positive rates by 13–57% over classical SVMs across all four evaluation corpora, with EER improving from 0.299 to 0.183 on ASVspoof 5 and 0.188 to 0.081 on ADD23.
QSVM (Quantum-Kernel SVM)
Novel technique introduced
Detecting synthetic speech is challenging when labeled data are scarce and recording conditions vary. Existing end-to-end deep models often overfit or fail to generalize, and while kernel methods can remain competitive, their performance heavily depends on the chosen kernel. Here, we show that using a quantum kernel in audio deepfake detection reduces false-positive rates without increasing model size. Quantum feature maps embed data into high-dimensional Hilbert spaces, enabling the use of expressive similarity measures and compact classifiers. Building on this motivation, we compare quantum-kernel SVMs (QSVMs) with classical SVMs using identical mel-spectrogram preprocessing and stratified 5-fold cross-validation across four corpora (ASVspoof 2019 LA, ASVspoof 5 (2024), ADD23, and an In-the-Wild set). QSVMs achieve consistently lower equal-error rates (EER): 0.183 vs. 0.299 on ASVspoof 5 (2024), 0.081 vs. 0.188 on ADD23, 0.346 vs. 0.399 on ASVspoof 2019, and 0.355 vs. 0.413 In-the-Wild. At the EER operating point (where FPR equals FNR), these correspond to absolute false-positive-rate reductions of 0.116 (38.8%), 0.107 (56.9%), 0.053 (13.3%), and 0.058 (14.0%), respectively. We also report how consistent the results are across cross-validation folds and margin-based measures of class separation, using identical settings for both models. The only modification is the kernel; the features and SVM remain unchanged, no additional trainable parameters are introduced, and the quantum kernel is computed on a conventional computer.
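To make "a quantum kernel computed on a conventional computer" concrete, the following is a minimal sketch of one possible fidelity kernel. It assumes a simple product-state angle encoding (one RY rotation per feature, no entangling gates), for which the state overlap has a closed form; the abstract does not specify the paper's feature map, so this encoding, the function names, and the toy feature vectors are all illustrative assumptions rather than the authors' method.

```python
import math


def fidelity_kernel(x, y):
    """Return |<phi(x)|phi(y)>|^2 for a product-state angle encoding.

    Assumption (not from the paper): feature x_i is encoded on its own qubit
    as RY(x_i)|0> = cos(x_i/2)|0> + sin(x_i/2)|1>. The single-qubit overlap is
    then cos((x_i - y_i)/2), and the full overlap is the product over qubits.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    k = 1.0
    for xi, yi in zip(x, y):
        k *= math.cos((xi - yi) / 2.0) ** 2
    return k


def gram_matrix(samples):
    """Build the full kernel (Gram) matrix for a list of feature vectors."""
    return [[fidelity_kernel(a, b) for b in samples] for a in samples]


# Hypothetical feature vectors, assumed to be pre-scaled to [0, pi]
# (for example, a few pooled mel-spectrogram statistics per utterance).
features = [
    [0.10, 1.20, 2.00],
    [0.15, 1.10, 2.10],
    [2.90, 0.30, 0.40],
]

K = gram_matrix(features)
for row in K:
    print(["%.4f" % v for v in row])
```

A matrix like `K` could then be handed to any SVM solver that accepts precomputed kernels (for instance scikit-learn's `SVC(kernel="precomputed")`), which is one way to swap the kernel while leaving the features and the solver unchanged.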
Key Contributions
- Demonstrates that quantum-kernel SVMs (QSVMs) consistently achieve lower equal-error rates than classical SVMs for audio deepfake detection across four corpora without increasing model size or trainable parameters.
- Provides a controlled comparison that isolates the kernel's effect by keeping features, SVM solver, preprocessing, and cross-validation folds identical between QSVM and classical SVM.
- Reports relative false-positive-rate reductions of up to 56.9% (ADD23; 0.107 absolute) and 38.8% (ASVspoof 5; 0.116 absolute) at the EER operating point, using mel-spectrogram features and a classically simulated quantum kernel.
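The reduction figures quoted in this summary can be re-derived from the reported EERs. Because FPR equals FNR at the EER operating point, the EER itself serves as the false-positive rate there. The short sketch below recomputes both the absolute difference and the relative (percentage) reduction; the dictionary and function names are illustrative, and the numbers are the EER values stated in the abstract.

```python
# Reported EERs from the abstract: (classical SVM, QSVM) per corpus.
reported_eer = {
    "ASVspoof 5 (2024)": (0.299, 0.183),
    "ADD23": (0.188, 0.081),
    "ASVspoof 2019 LA": (0.399, 0.346),
    "In-the-Wild": (0.413, 0.355),
}


def fpr_reduction(classical, quantum):
    """Return (absolute, relative) FPR reduction at the EER operating point."""
    absolute = classical - quantum
    relative = absolute / classical
    return absolute, relative


reductions = {
    name: fpr_reduction(classical, quantum)
    for name, (classical, quantum) in reported_eer.items()
}

for name, (absolute, relative) in reductions.items():
    print(f"{name}: absolute {absolute:.3f}, relative {100 * relative:.1f}%")
```

The percentages are relative to the classical SVM's rate, which is why ADD23 shows the largest relative reduction even though ASVspoof 5 has the largest absolute one.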
🛡️ Threat Analysis
The paper's primary contribution is a detection method for AI-generated audio content (synthetic speech / audio deepfakes), explicitly targeting output integrity and content authenticity — a core ML09 concern. QSVMs are proposed as a novel detection architecture rather than a routine application of existing detectors.