defense 2026

Towards Robust Speech Deepfake Detection via Human-Inspired Reasoning

Artem Dvirniak 1, Evgeny Kushnir 2,3,4, Dmitrii Tarasov 3,4, Artem Iudin 5, Oleg Kiriukhin 5, Mikhail Pautov 2,6, Dmitrii Korzh 2,4,5, Oleg Y. Rogov 2,4,5


Published on arXiv

2603.10725

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

HIR-SDD achieves competitive countermeasure performance while producing human-interpretable chain-of-thought justifications that ground deepfake attribution decisions in perceptible acoustic cues

HIR-SDD

Novel technique introduced


Modern generative audio models can be used by an adversary in unlawful ways, specifically to impersonate other people and gain access to private information. To mitigate this threat, speech deepfake detection (SDD) methods have begun to evolve. Unfortunately, current SDD methods generally suffer from poor generalization to new audio domains and generators. Moreover, they lack interpretability, especially the kind of human-like reasoning that would naturally explain the attribution of a given audio sample to the bona fide or spoof class and provide human-perceptible cues. In this paper, we propose HIR-SDD, a novel SDD framework that combines the strengths of Large Audio Language Models (LALMs) with chain-of-thought reasoning derived from a newly proposed human-annotated dataset. Experimental evaluation demonstrates both the effectiveness of the proposed method and its ability to provide reasonable justifications for its predictions.
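As a rough illustration of the inference pattern the abstract describes (not the authors' implementation), a detector built on an audio language model would prompt for a chain-of-thought justification grounded in perceptible cues, then extract the final bona fide / spoof label from the generated text. The prompt template and the "Verdict:" parsing convention below are illustrative assumptions:

```python
import re

# Hypothetical prompt asking a LALM for perceptible acoustic cues plus a
# final verdict. The wording and the "Verdict:" convention are assumptions
# for illustration, not the paper's actual prompt.
PROMPT = (
    "Listen to the audio and describe perceptible acoustic cues "
    "(prosody, breathing, artifacts). Finish with 'Verdict: bona fide' "
    "or 'Verdict: spoof'."
)

def extract_label(cot_text: str) -> str:
    """Pull the final bona fide / spoof decision out of a reasoning trace."""
    match = re.search(r"verdict:\s*(bona fide|spoof)", cot_text, re.IGNORECASE)
    if match is None:
        raise ValueError("no verdict found in reasoning trace")
    return match.group(1).lower()

# Example trace a LALM might return for a synthetic sample.
trace = (
    "The utterance has unnaturally flat prosody and no audible breaths; "
    "high-frequency vocoder artifacts are present. Verdict: spoof"
)
print(extract_label(trace))  # → spoof
```

Keeping the label extraction as a simple regex over the trace makes the explanation itself the primary model output, with the hard decision recovered deterministically from it.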


Key Contributions

  • Human-annotated dataset of 41k reasoning traces covering bona fide and spoof speech samples for CoT training and evaluation
  • HIR-SDD framework combining hard-label classification and chain-of-thought supervised fine-tuning with LALM-based reasoning for interpretable speech deepfake detection
  • Integration of grounding and reinforcement learning strategies to improve both detection accuracy and quality of human-perceptible explanations
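The training recipe in the second bullet, joint hard-label classification and chain-of-thought supervised fine-tuning, can be sketched as a weighted sum of two cross-entropy terms. The `alpha` weighting and the per-token averaging below are illustrative assumptions, not the paper's exact objective:

```python
import math

def cross_entropy(probs, target_idx):
    """Negative log-likelihood of the target under a probability vector."""
    return -math.log(probs[target_idx])

def joint_loss(label_probs, target_label, token_probs, target_tokens, alpha=0.5):
    """Weighted sum of hard-label CE and token-level CoT SFT CE.

    label_probs:   model's bona fide / spoof distribution, e.g. [0.2, 0.8]
    token_probs:   per-step distributions over the reasoning-trace vocabulary
    alpha:         trade-off between detection accuracy and explanation
                   quality (an illustrative hyperparameter, not from the paper)
    """
    cls_loss = cross_entropy(label_probs, target_label)
    sft_loss = sum(
        cross_entropy(step, tok) for step, tok in zip(token_probs, target_tokens)
    ) / len(target_tokens)
    return alpha * cls_loss + (1 - alpha) * sft_loss

# Toy example: a 2-class detection head plus a 3-token reasoning trace.
loss = joint_loss(
    label_probs=[0.1, 0.9], target_label=1,
    token_probs=[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]],
    target_tokens=[0, 1, 2],
)
print(round(loss, 4))
```

The reinforcement learning stage in the third bullet would then further shape the trace-generating policy, for example by rewarding explanations grounded in perceptible cues, on top of this supervised objective.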

🛡️ Threat Analysis

Output Integrity Attack

Speech deepfake detection is a form of AI-generated content detection. The paper proposes a novel forensic detection architecture (HIR-SDD) that uses LALMs and chain-of-thought reasoning to identify synthetic or spoofed speech, rather than merely applying existing methods to a narrow domain.


Details

Domains
audio, nlp
Model Types
llm, transformer
Threat Tags
inference_time
Datasets
ASVspoof, ASVspoof 5, ADD, SingFake
Applications
speech deepfake detection, voice anti-spoofing, speaker verification