XLSR-Kanformer: A KAN-Integrated Model for Synthetic Speech Detection

Recent advances in speech synthesis have enabled increasingly sophisticated spoofing attacks, posing significant challenges for automatic speaker verification systems. Systems based on self-supervised learning (SSL) models, particularly the XLSR-Conformer architecture, have demonstrated remarkable performance in synthetic speech detection, but there remains room for architectural improvement. In this paper, we propose a novel approach that replaces the traditional Multi-Layer Perceptron (MLP) in the XLSR-Conformer model with a Kolmogorov-Arnold Network (KAN), a powerful universal approximator based on the Kolmogorov-Arnold representation theorem. Our experimental results on ASVspoof2021 demonstrate that integrating KAN into the XLSR-Conformer model improves the Equal Error Rate (EER) by a relative 60.55% on the LA and DF sets, and achieves 0.70% EER on the 21LA set. The proposed replacement is also robust across various SSL architectures. These findings suggest that incorporating KAN into SSL-based models is a promising direction for advancing synthetic speech detection.
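The MLP-to-KAN replacement described above can be illustrated with a toy sketch. This is not the authors' implementation: the class name `KANLayer`, the helper `kan_mlp_replacement`, the layer sizes, and the use of Gaussian radial basis functions in place of the B-splines usually associated with KANs are all assumptions made to keep the example short and dependency-free. What it does show is the core idea: each input-output edge carries its own learnable univariate function, and each output node simply sums its incoming edges.

```python
import math
import random


def silu(x):
    """SiLU activation, used here as the fixed 'base' part of each edge function."""
    return x / (1.0 + math.exp(-x))


class KANLayer:
    """Toy KAN layer (hypothetical, illustrative only).

    Every (output j, input i) edge has its own learnable 1-D function
    phi_{j,i}; Gaussian RBFs stand in for B-splines (an assumption).
    """

    def __init__(self, in_dim, out_dim, num_basis=5, x_min=-2.0, x_max=2.0, seed=0):
        rng = random.Random(seed)
        self.in_dim, self.out_dim = in_dim, out_dim
        # Fixed, evenly spaced basis centers shared by all edges (needs num_basis >= 2).
        step = (x_max - x_min) / (num_basis - 1)
        self.centers = [x_min + k * step for k in range(num_basis)]
        self.width = step
        # Learnable parameters: one base weight and num_basis coefficients per edge.
        self.base_w = [[rng.gauss(0, 0.1) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.coef = [[[rng.gauss(0, 0.1) for _ in range(num_basis)]
                      for _ in range(in_dim)] for _ in range(out_dim)]

    def edge_fn(self, j, i, x):
        """phi_{j,i}(x) = w_base * silu(x) + sum_k c_k * basis_k(x)."""
        basis = [math.exp(-((x - c) / self.width) ** 2) for c in self.centers]
        learned = sum(c * b for c, b in zip(self.coef[j][i], basis))
        return self.base_w[j][i] * silu(x) + learned

    def forward(self, x):
        """y_j = sum_i phi_{j,i}(x_i); nodes only sum, with no extra activation."""
        return [sum(self.edge_fn(j, i, x[i]) for i in range(self.in_dim))
                for j in range(self.out_dim)]


def kan_mlp_replacement(x, up, down):
    """Two stacked KAN layers standing in for a Linear-activation-Linear MLP."""
    return down.forward(up.forward(x))


# Hypothetical sizes: a 4-dim feature expanded to 8 and projected back to 4.
up, down = KANLayer(4, 8, seed=1), KANLayer(8, 4, seed=2)
features = [0.1, -0.2, 0.3, 0.0]
output = kan_mlp_replacement(features, up, down)
```

In a real model the parameters would be trained by gradient descent inside a deep learning framework, and residual connections, normalization, and dropout around the block would be kept as in the original architecture; none of that is shown here.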

Key Contributions

Proposes XLSR-Kanformer: replaces MLP layers in the XLSR-Conformer with Kolmogorov-Arnold Networks (KANs) for improved feature learning from SSL representations
Achieves a 60.55% relative EER improvement on the ASVspoof2021 LA and DF sets, reaching 0.70% EER on 21LA (new state of the art on the DF set)
Demonstrates that KAN replacement is robust across multiple SSL backbone architectures
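For readers unfamiliar with the metric behind these numbers, the following is a minimal sketch of how an Equal Error Rate and a relative improvement can be computed. It uses a simple threshold sweep, not the official ASVspoof scoring tools, and the function names, scores, and baseline values are hypothetical illustrations rather than data from the paper. It assumes a higher score means "more likely bona fide".

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) from a simple sweep over observed scores.

    Assumes higher scores indicate bona fide speech. The EER is taken at
    the threshold where false acceptance and false rejection rates are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: bona fide utterance scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed utterance scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def relative_improvement(baseline_eer, new_eer):
    """Relative reduction in EER, in percent of the baseline value."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Hypothetical detector scores, for illustration only.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = compute_eer(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")

# Hypothetical EER values (not from the paper): 2.0% baseline, 1.0% new model.
print(f"Relative improvement = {relative_improvement(2.0, 1.0):.1f}%")
```

Note that a relative improvement is measured against the baseline EER, so it is a different quantity from the absolute EER values quoted alongside it.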

🛡️ Threat Analysis

Output Integrity Attack

Proposes a novel detection architecture (a KAN-augmented Conformer) for identifying AI-generated or synthetic speech. It directly addresses output integrity by determining whether audio is genuine or machine-generated. This is a novel architectural contribution to deepfake audio detection, not a simple application of existing methods to a new domain.

Details

Domains

audio

Model Types

transformer

Threat Tags

inference_time

Datasets

ASVspoof2021

Applications

2025 0 cit.
