I Guess That's Why They Call it the Blues: Causal Analysis for Audio Classifiers
David A. Kelly, Hana Chockler
Published on arXiv (2601.16675)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Modifying a single frequency out of ~240,000 changes an audio classifier's output 58% of the time; modifying five frequencies raises the success rate to 78%, and practically inaudible STFT-based perturbations succeed 62% of the time.
FreqReX
Novel technique introduced
It is well known that audio classifiers often rely on non-musically relevant features and spurious correlations, which makes them easy to manipulate or confuse into producing wrong classifications. While inducing a misclassification is not hard, until now the set of features that classifiers rely on was not well understood. In this paper we introduce a new method that uses causal reasoning to discover features of the frequency space that are sufficient and necessary for a given classification. We describe an implementation of this algorithm in the tool FreqReX and provide experimental results on a number of standard benchmark datasets. Our experiments show that causally sufficient and necessary subsets allow us to manipulate the outputs of the models in a variety of ways by changing the input only very slightly: a change to one out of 240,000 frequencies alters the classification 58% of the time, and the change can be so small that it is practically inaudible. These results show that causal analysis is useful for understanding the reasoning process of audio classifiers and can be used to manipulate their outputs.
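The paper's exact FreqReX algorithm is not reproduced here, but the core idea of the abstract — querying a black-box classifier while masking frequency bins to find a small subset that is sufficient for the original label — can be sketched as follows. The toy `classify` function and the greedy elimination order are illustrative assumptions, not the paper's method:

```python
import numpy as np

def classify(signal):
    # Stand-in for a black-box audio classifier (hypothetical):
    # label 1 if the dominant frequency lies in the upper half-band.
    spectrum = np.abs(np.fft.rfft(signal))
    return int(np.argmax(spectrum) > len(spectrum) // 2)

def sufficient_frequency_subset(signal, classifier):
    """Greedily shrink the set of frequency bins that, kept alone
    (all others zeroed), still yields the original label."""
    spectrum = np.fft.rfft(signal)
    label = classifier(signal)
    keep = set(range(len(spectrum)))
    # Try discarding bins in order of ascending magnitude.
    for i in np.argsort(np.abs(spectrum)):
        trial = keep - {int(i)}
        masked = np.zeros(len(spectrum), dtype=complex)
        idx = list(trial)
        masked[idx] = spectrum[idx]
        if classifier(np.fft.irfft(masked, n=len(signal))) == label:
            keep = trial  # bin i was not needed; drop it for good
    return keep

# Toy example: a signal dominated by a single 400-cycle tone.
rng = np.random.default_rng(0)
t = np.arange(1024) / 1024
sig = np.sin(2 * np.pi * 400 * t) + 0.01 * rng.standard_normal(1024)
subset = sufficient_frequency_subset(sig, classify)
print(sorted(subset))  # → [400]: one bin suffices for the label
```

A subset found this way is *sufficient* (keeping it preserves the label); the complementary check — whether zeroing it flips the label — tests *necessity*, and together they explain why perturbing a single causally necessary bin can change the output.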
Key Contributions
- FreqReX: a causal-reasoning framework that decomposes audio signals into sufficient, necessary, and complete frequency subsets with respect to a classifier's decision
- Demonstrates that minimal frequency perturbations (as few as 1 out of ~240,000) can change audio classifier output 58% of the time, with STFT variants yielding practically inaudible changes effective 62% of the time
- First application of formal actual-causality theory to audio signals, evaluated across 8 models on music genre and sung emotion classification datasets
🛡️ Threat Analysis
FreqReX generates adversarial perturbations at inference time: changing as few as one out of ~240,000 frequencies flips the classification 58% of the time, and STFT-based changes can be small enough to be practically inaudible. The causal-subset approach to finding minimal adversarial perturbations is a novel black-box attack technique against audio classifiers.
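The paper does not detail its STFT perturbation procedure here, but the mechanism behind "inaudible" attacks is easy to illustrate: a single time-frequency bin carries a vanishingly small fraction of the waveform's energy, so editing one bin and inverting leaves the audio almost unchanged. A minimal sketch using `scipy.signal.stft`/`istft` (the bin indices and perturbation size are arbitrary assumptions):

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16_000
audio = 0.5 * np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1 s, 440 Hz tone

# Forward STFT, nudge a single time-frequency bin, invert.
f, t, Z = stft(audio, fs=fs, nperseg=512)
Z_adv = Z.copy()
Z_adv[100, 10] += 0.01                      # hypothetical single-bin edit
_, audio_adv = istft(Z_adv, fs=fs, nperseg=512)

# Compare over the common length (istft output may be padded slightly).
n = min(len(audio), len(audio_adv))
noise = audio[:n] - audio_adv[:n]
snr_db = 10 * np.log10(np.sum(audio[:n] ** 2) / np.sum(noise ** 2))
print(f"perturbation SNR: {snr_db:.1f} dB")
```

The resulting signal-to-perturbation ratio is very high, which is why such a change can flip a classifier's output while remaining practically inaudible to a listener.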