attack · arXiv · Sep 25, 2025
Rostislav Makarov, Lea Schönherr, Timo Gerkmann · University of Hamburg · CISPA Helmholtz Center for Information Security
Proposes targeted white-box adversarial attacks on speech enhancement models that psychoacoustically hide perturbations in the input so that the enhanced output conveys a different semantic meaning
Input Manipulation Attack · audio · diffusion · cnn
Machine learning approaches for speech enhancement are becoming increasingly expressive, enabling ever more powerful modifications of input signals. In this paper, we demonstrate that this expressiveness introduces a vulnerability: advanced speech enhancement models can be susceptible to adversarial attacks. Specifically, we show that adversarial noise, carefully crafted and psychoacoustically masked by the original input, can be injected such that the enhanced speech output conveys an entirely different semantic meaning. We experimentally verify that contemporary predictive speech enhancement models can indeed be manipulated in this way. Furthermore, we highlight that diffusion models with stochastic samplers exhibit inherent robustness to such adversarial attacks by design.
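The attack the abstract describes amounts to gradient-based optimization of an additive perturbation under a psychoacoustic constraint. Below is a minimal, illustrative PyTorch sketch of that idea, not the paper's implementation: `ToyEnhancer` stands in for a pretrained white-box predictive enhancement model, `masking_threshold` is a crude spectral proxy for a proper psychoacoustic masking model, and the loss weighting is arbitrary; all names and values here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEnhancer(nn.Module):
    """Hypothetical stand-in for a pretrained predictive speech
    enhancement model; the attack only needs white-box gradients."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(16, 1, kernel_size=9, padding=4),
        )

    def forward(self, x):
        return self.net(x)

def masking_threshold(x, n_fft=512):
    # Crude proxy: a fraction of the input's own magnitude spectrogram.
    # The paper would use a proper psychoacoustic masking model instead.
    window = torch.hann_window(n_fft)
    spec = torch.stft(x.squeeze(1), n_fft, window=window, return_complex=True)
    return 0.1 * spec.abs()

def targeted_attack(model, x, target, steps=200, lr=1e-3, n_fft=512):
    """Optimize additive noise `delta` so that enhancing x + delta yields
    `target`, while penalizing spectral energy above the masking proxy."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    thr = masking_threshold(x, n_fft)
    window = torch.hann_window(n_fft)
    for _ in range(steps):
        opt.zero_grad()
        out = model(x + delta)
        attack_loss = F.mse_loss(out, target)            # steer the output
        d_spec = torch.stft(delta.squeeze(1), n_fft,
                            window=window, return_complex=True).abs()
        mask_loss = F.relu(d_spec - thr).mean()          # keep delta inaudible
        (attack_loss + 10.0 * mask_loss).backward()
        opt.step()
    return delta.detach()

model = ToyEnhancer().eval()
x = torch.randn(1, 1, 16000)       # 1 s of input audio at 16 kHz
target = torch.randn(1, 1, 16000)  # waveform carrying the target semantics
delta = targeted_attack(model, x, target)
adversarial_input = x + delta      # fed to the enhancer instead of x
```

Because a predictive model is a deterministic function of its input, this kind of optimization can reliably steer its output; a diffusion enhancer with a stochastic sampler re-randomizes its trajectory on every run, which is consistent with the inherent robustness the abstract attributes to such models.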