defense 2025

Generalizable Audio Spoofing Detection using Non-Semantic Representations

Arnab Das 1,2, Yassine El Kheir 1,2, Carlos Franzreb 1, Tim Herzig 1,3, Tim Polzehl 1,2, Sebastian Möller 1,3

0 citations · Proc. Interspeech 2025, 4553-4...

Published on arXiv

2509.00186

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Significantly outperforms state-of-the-art approaches on out-of-domain (In the Wild) real-world data while achieving comparable in-domain ASVspoof performance

TrillFake

Novel technique introduced


Rapid advancements in generative modeling have made synthetic audio easy to produce, leaving speech-based services vulnerable to spoofing attacks. Consequently, the need for robust countermeasures is greater than ever. Existing deepfake detection solutions are often criticized for lacking generalizability, and they fail drastically when applied to real-world data. This study proposes a novel method for generalizable spoofing detection that leverages non-semantic universal audio representations. Extensive experiments were performed to find suitable non-semantic features using the TRILL and TRILLsson models. The results indicate that the proposed method achieves comparable performance on the in-domain test set while significantly outperforming state-of-the-art approaches on out-of-domain test sets. Notably, it demonstrates superior generalization on public-domain data, surpassing methods based on hand-crafted features, semantic embeddings, and end-to-end architectures.
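
The general idea in the abstract, using a frozen audio representation as features for a spoofing classifier, can be sketched roughly as follows. This is only an illustration under stated assumptions: `embed_frames` is a hypothetical stand-in (a fixed random projection of framed audio), not the TRILL or TRILLsson API, and the mean pooling, logistic-regression classifier, and synthetic clips are placeholders rather than the authors' architecture or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a frozen non-semantic encoder (NOT the TRILL API):
# it frames the waveform and applies a fixed random projection.
FRAME, DIM, N_FRAMES = 160, 16, 50
_PROJ = rng.standard_normal((FRAME, DIM)) / np.sqrt(FRAME)

def embed_frames(wave):
    """Return a (num_frames, DIM) array of frame-level embeddings."""
    n = len(wave) // FRAME
    frames = wave[: n * FRAME].reshape(n, FRAME)
    return np.tanh(frames @ _PROJ)

def utterance_embedding(wave):
    """Mean-pool frame embeddings into one fixed-size vector per utterance."""
    return embed_frames(wave).mean(axis=0)

def train_logreg(X, y, lr=0.5, steps=500):
    """Gradient-descent logistic regression on top of the frozen embeddings."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def spoof_score(X, w, b):
    """Probability-like score; higher means more likely spoofed."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Synthetic toy clips (assumption, illustration only): "spoofed" clips carry a
# fixed frame-periodic artifact, "bona fide" clips are plain noise.
ARTIFACT = 0.5 * np.sin(2 * np.pi * 5 * np.arange(FRAME) / FRAME)

def make_clip(spoof):
    wave = 0.3 * rng.standard_normal(FRAME * N_FRAMES)
    if spoof:
        wave = wave + np.tile(ARTIFACT, N_FRAMES)
    return wave

y = np.array([0, 1] * 100, dtype=float)          # 0 = bona fide, 1 = spoof
X = np.stack([utterance_embedding(make_clip(bool(s))) for s in y])

w, b = train_logreg(X[:150], y[:150])            # train split
scores = spoof_score(X[150:], w, b)              # held-out split
accuracy = np.mean((scores > 0.5) == (y[150:] == 1))
```

The encoder stays fixed and only the small classifier is trained, so a real encoder could be swapped in behind `embed_frames` without touching the rest of the sketch.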


Key Contributions

  • Novel use of non-semantic universal audio representations (TRILL and TRILLsson) as features for audio spoofing/deepfake detection, motivated by the insight that discarding semantic content improves generalization
  • Demonstrates superior out-of-domain generalization over SOTA methods based on hand-crafted features, semantic SSL embeddings (XLS-R, WavLM, HuBERT), and end-to-end architectures (RawNet2, AASIST)
  • Cross-dataset evaluation on ASVspoof (in-domain) and on the noisy, public-domain In the Wild data (out-of-domain), showing competitive in-domain performance alongside significantly improved real-world robustness
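
A cross-dataset comparison of the kind described in the last contribution could be scored as in the sketch below. It assumes equal error rate (EER), a metric commonly reported for spoofing countermeasures; this summary does not state which metric the paper uses. The score distributions are synthetic placeholders chosen only to exercise the function, not results from the paper.

```python
import numpy as np

def compute_eer(scores, labels):
    """Approximate equal error rate.

    scores: higher means more likely spoofed.
    labels: 1 = spoof (positive class), 0 = bona fide.
    Sweeps the threshold over the sorted scores and returns the point where
    the miss rate and the false-alarm rate are closest (ties are ignored).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(-scores)                   # descending score order
    labels = labels[order]
    tp = np.cumsum(labels)                        # spoofs flagged so far
    fp = np.cumsum(1.0 - labels)                  # bona fide flagged so far
    fnr = 1.0 - tp / labels.sum()                 # miss rate
    fpr = fp / (1.0 - labels).sum()               # false-alarm rate
    idx = np.argmin(np.abs(fnr - fpr))
    return float((fnr[idx] + fpr[idx]) / 2.0)

# Synthetic detector scores (assumption, illustration only).
rng = np.random.default_rng(0)
n = 500
labels = np.concatenate([np.zeros(n), np.ones(n)])

# "In-domain": well-separated score distributions.
in_domain = np.concatenate([rng.normal(0.2, 0.1, n), rng.normal(0.8, 0.1, n)])
# "Out-of-domain": overlapping distributions, as when a detector generalizes poorly.
out_domain = np.concatenate([rng.normal(0.4, 0.2, n), rng.normal(0.6, 0.2, n)])

eer_in = compute_eer(in_domain, labels)
eer_out = compute_eer(out_domain, labels)
print(f"in-domain EER: {eer_in:.3f}, out-of-domain EER: {eer_out:.3f}")
```

Reporting both numbers from one trained model makes any generalization gap visible directly, which is the point of evaluating on an out-of-domain set in addition to the in-domain one.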

🛡️ Threat Analysis

Output Integrity Attack

Proposes a novel AI-generated audio detection method — specifically a new forensic approach leveraging non-semantic representations to detect synthetic/spoofed speech with improved cross-dataset generalization; explicitly falls under deepfake detection and AI-generated content detection.


Details

Domains
audio
Model Types
transformer
Threat Tags
inference_time
Datasets
ASVspoof, In the Wild (ItW), ADD challenge
Applications
audio deepfake detection, spoofing countermeasures for automatic speaker verification