Towards Robust Speech Deepfake Detection via Human-Inspired Reasoning
Artem Dvirniak 1, Evgeny Kushnir 2,3,4, Dmitrii Tarasov 3,4, Artem Iudin 5, Oleg Kiriukhin 5, Mikhail Pautov 2,6, Dmitrii Korzh 2,4,5, Oleg Y. Rogov 2,4,5
Published on arXiv
2603.10725
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
HIR-SDD achieves competitive countermeasure performance while producing human-interpretable chain-of-thought justifications that ground deepfake attribution decisions in perceptible acoustic cues
HIR-SDD
Novel technique introduced
Modern generative audio models can be misused by adversaries, in particular to impersonate other people and gain access to private information. Speech deepfake detection (SDD) methods have been developed to mitigate this threat. However, current SDD methods generally fail to generalize to new audio domains and generators. They also lack interpretability, in particular human-like reasoning that would explain why a given audio sample is attributed to the bona fide or spoof class and point to human-perceptible cues. In this paper, we propose HIR-SDD, a novel SDD framework that combines the strengths of Large Audio Language Models (LALMs) with chain-of-thought reasoning derived from a newly proposed human-annotated dataset. Experimental evaluation demonstrates both the effectiveness of the proposed method and its ability to provide reasonable justifications for its predictions.
Key Contributions
- Human-annotated dataset of 41k reasoning traces covering bona fide and spoof speech samples for CoT training and evaluation
- HIR-SDD framework combining hard-label classification and chain-of-thought supervised fine-tuning with LALM-based reasoning for interpretable speech deepfake detection
- Integration of grounding and reinforcement learning strategies to improve both detection accuracy and quality of human-perceptible explanations
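To make the pairing of a hard label with a chain-of-thought trace concrete, the following is a minimal sketch of how one human-annotated trace might be turned into a fine-tuning target and how a verdict might be read back from generated text. It is an illustration under stated assumptions, not the paper's implementation: the `Trace` fields, the `Reasoning:`/`Verdict:` template, and the helper names `build_target` and `parse_verdict` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# The two classes named in the abstract.
LABELS = ("bona fide", "spoof")


@dataclass
class Trace:
    """One annotated sample (hypothetical schema, not the paper's)."""
    audio_path: str   # path to the audio clip
    reasoning: str    # free-text description of perceptible cues
    label: str        # hard label: "bona fide" or "spoof"


def build_target(trace: Trace) -> str:
    """Format the text target: reasoning first, hard label last.

    The template is an assumption made for illustration only.
    """
    if trace.label not in LABELS:
        raise ValueError(f"unknown label: {trace.label!r}")
    return f"Reasoning: {trace.reasoning.strip()}\nVerdict: {trace.label}"


def parse_verdict(generated: str) -> Optional[str]:
    """Recover the hard label from generated text, or None if absent or invalid."""
    # Scan from the end so the final verdict line wins.
    for line in reversed(generated.strip().splitlines()):
        if line.lower().startswith("verdict:"):
            verdict = line.split(":", 1)[1].strip().lower()
            return verdict if verdict in LABELS else None
    return None


if __name__ == "__main__":
    sample = Trace("clip_0001.wav", "Flat, robotic prosody; no breathing sounds.", "spoof")
    target = build_target(sample)
    print(target)
    print(parse_verdict(target))
```

Placing the verdict after the reasoning keeps the justification and the classification in a single output, so the hard label can still be scored with ordinary classification metrics while the explanation is evaluated separately.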
🛡️ Threat Analysis
Speech deepfake detection is a form of AI-generated content detection. The paper proposes a novel detection architecture (HIR-SDD) that uses LALMs and CoT reasoning to identify synthetic or spoofed speech. This is a new forensic detection approach for AI-generated audio, not a mere application of existing methods to a narrow domain.