defense 2025

Sparse deepfake detection promotes better disentanglement

Antoine Teissier , Marie Tahon , Nicolas Dugué , Aghilas Sini

0 citations · 21 references · arXiv

Published on arXiv

2510.05696

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves an EER of 23.36% on the ASVSpoof5 test set at 95% sparsity, with sparse representations providing measurably better disentanglement of attack factors in the latent space.
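
For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of a discrete approximation, not the official ASVSpoof5 scoring tool. The function name, the score lists, and the convention that higher scores mean "more likely bona fide" are all assumptions made for illustration.

```python
from typing import List, Tuple


def equal_error_rate(bonafide: List[float], spoof: List[float]) -> Tuple[float, float]:
    """Return (eer, threshold); higher scores are assumed to mean 'bona fide'."""
    best = (2.0, 0.0, 0.0)  # (|FAR - FRR|, eer estimate, threshold)
    # Try every observed score as a candidate decision threshold.
    for t in sorted(set(bonafide) | set(spoof)):
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide) / len(bonafide)
        # False acceptance: spoof trials scored at or above the threshold.
        far = sum(s >= t for s in spoof) / len(spoof)
        gap = abs(far - frr)
        if gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores, not taken from the paper.
bonafide_scores = [0.9, 0.8, 0.7, 0.3]
spoof_scores = [0.6, 0.4, 0.2, 0.1]
eer, threshold = equal_error_rate(bonafide_scores, spoof_scores)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

This sketch picks the threshold where the two error rates are closest and averages them. Production scoring tools typically interpolate between thresholds instead, so values on real data may differ slightly.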

TopK sparse activation on AASIST embeddings

Novel technique introduced


Due to the rapid progress of speech synthesis, deepfake detection has become a major concern in the speech processing community. Because it is a critical task, systems must not only be efficient and robust, but also provide interpretable explanations. Among the different approaches to explainability, we focus on the interpretation of latent representations. In this paper, we study the last embedding layer of AASIST, a deepfake detection architecture. We apply a TopK activation inspired by sparse autoencoders (SAEs) to this layer to obtain sparse representations, which are then used in the decision process. We demonstrate that sparse deepfake detection can improve detection performance, with an EER of 23.36% on the ASVSpoof5 test set at 95% sparsity. We then show that these representations provide better disentanglement, using completeness and modularity metrics based on mutual information. Notably, some attacks are directly encoded in the latent space.
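
The TopK activation mentioned in the abstract can be illustrated with a small, dependency-free sketch. It assumes the common convention of keeping the k largest values of a vector and zeroing the rest; whether the paper ranks by raw value or by magnitude, and how the layer is wired into AASIST, is not stated here. The names `topk_activation` and `sparsity` and the toy embedding are hypothetical.

```python
from typing import List


def topk_activation(embedding: List[float], k: int) -> List[float]:
    """Keep the k largest values of `embedding` and set all others to zero."""
    if not 0 < k <= len(embedding):
        raise ValueError("k must be between 1 and the embedding size")
    # Indices of the k largest activations (ties go to the lower index).
    order = sorted(range(len(embedding)), key=lambda i: embedding[i], reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(embedding)]


def sparsity(vector: List[float]) -> float:
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in vector if v == 0.0) / len(vector)


# Hypothetical 20-dimensional embedding: k=1 leaves 19 of 20 entries at zero.
embedding = [0.1 * i for i in range(20)]
sparse = topk_activation(embedding, k=1)
print(sparsity(sparse))
```

With 20 dimensions and k=1, the output is 95% sparse, which matches the sparsity level quoted above only by construction of this toy example. In a trained model this operation would usually be applied to a batched tensor inside the network rather than to a Python list.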


Key Contributions

  • First application of sparse constraints (TopK activation inspired by Sparse Autoencoders) to audio deepfake detection on the AASIST architecture
  • Empirical evidence that 95% sparsity achieves an EER of 23.36% on ASVSpoof5 while improving latent-space disentanglement
  • Analysis framework using completeness and modularity metrics (mutual information-based) showing attack types are encoded in sparse latent dimensions
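
As a rough illustration of the building block behind mutual-information-based disentanglement metrics, the sketch below estimates the mutual information between one latent dimension (active or inactive) and the attack label. It is not the paper's completeness or modularity metric, whose exact definitions are not reproduced in this summary. The function name, the attack labels `A01` and `A02`, and the sample data are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over observed pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Hypothetical data: attack label per utterance, and whether a dimension fired.
attack = ["A01", "A01", "A02", "A02", "A01", "A01", "A02", "A02"]
tracks_attack = [1, 1, 0, 0, 1, 1, 0, 0]   # fires only for A01
ignores_attack = [1, 0, 1, 0, 1, 0, 1, 0]  # fires regardless of the attack

mi_informative = mutual_information(tracks_attack, attack)
mi_uninformative = mutual_information(ignores_attack, attack)
print(mi_informative, mi_uninformative)
```

A dimension that fires for exactly one attack carries the full one bit of label information in this two-class toy case, while a dimension that fires independently of the attack carries none. Metrics such as modularity and completeness are generally built by aggregating scores of this kind across all dimensions and factors.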

🛡️ Threat Analysis

Output Integrity Attack

Proposes a detection architecture for AI-generated speech (audio deepfakes) that modifies AASIST with TopK sparse activations. This is a new forensic/detection technique rather than an application of existing methods to a new domain.


Details

Domains
audio
Model Types
gnn
Threat Tags
inference_time
Datasets
ASVSpoof5, MLS
Applications
audio deepfake detection, synthetic speech detection