attack arXiv Aug 26, 2025 · Aug 2025
Ridwan Arefeen, Xiaoxiao Miao, Rong Tong et al. · Singapore Institute of Technology · Duke Kunshan University +1 more
Attacks voice anonymization systems by augmenting ASV training data via word-level segment rearrangement to recover speaker identity
Output Integrity Attack audio
Anonymization of voice seeks to conceal the identity of the speaker while maintaining the utility of speech data. However, residual speaker cues often persist, which pose privacy risks. We propose SegReConcat, a data augmentation method for attacker-side enhancement of automatic speaker verification systems. SegReConcat segments anonymized speech at the word level, rearranges segments using random or similarity-based strategies to disrupt long-term contextual cues, and concatenates them with the original utterance, allowing an attacker to learn source speaker traits from multiple perspectives. The proposed method has been evaluated in the VoicePrivacy Attacker Challenge 2024 framework across seven anonymization systems, SegReConcat improves de-anonymization on five out of seven systems.
traditional_ml Singapore Institute of Technology · Duke Kunshan University · NVIDIA
attack arXiv Mar 13, 2026 · 24d ago
Ridwan Arefeen, Xiaoxiao Miao, Rong Tong et al. · Singapore Institute of Technology · Duke Kunshan University +1 more
Dual-stream speaker re-identification attack on anonymized voice using SSL and spectral features with staged transfer learning
Input Manipulation Attack audio
Voice anonymization masks vocal traits while preserving linguistic content, which may still leak speaker-specific patterns. To assess and strengthen privacy evaluation, we propose a dual-stream attacker that fuses spectral and self-supervised learning features via parallel encoders with a three-stage training strategy. Stage I establishes foundational speaker-discriminative representations. Stage II leverages the shared identity-transformation characteristics of voice conversion and anonymization, exposing the model to diverse converted speech to build cross-system robustness. Stage III provides lightweight adaptation to target anonymized data. Results on the VoicePrivacy Attacker Challenge (VPAC) dataset demonstrate that Stage II is the primary driver of generalization, enabling strong attacking performance on unseen anonymization datasets. With Stage III, fine-tuning on only 10\% of the target anonymization dataset surpasses current state-of-the-art attackers in terms of EER.
cnn transformer Singapore Institute of Technology · Duke Kunshan University · NVIDIA