defense 2026

Privacy-Preserving End-to-End Full-Duplex Speech Dialogue Models

0 citations

Published on arXiv

2603.08179

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

Anon-W2F raises speaker verification EER more than 3.5× (11.2% → 41.0%), approaching the 50% random-chance ceiling, with first-response latency under 0.8 s

Anon-W2F / Anon-W2W (Stream-Voice-Anon)

Novel technique introduced

End-to-end full-duplex speech models feed user audio through an always-on LLM backbone, yet the speaker privacy implications of their hidden representations remain unexamined. Following the VoicePrivacy 2024 protocol with a lazy-informed attacker, we show that the hidden states of SALM-Duplex and Moshi leak substantial speaker identity across all transformer layers. Layer-wise and turn-wise analyses reveal that SALM-Duplex leaks more strongly in early layers while Moshi leaks uniformly, and that Linkability rises sharply within the first few turns. We propose two streaming anonymization setups using Stream-Voice-Anon: a waveform-level front-end (Anon-W2W) and a feature-domain replacement (Anon-W2F). Anon-W2F raises EER by over 3.5× relative to the discrete encoder baseline (11.2% to 41.0%), approaching the 50% random-chance ceiling, while Anon-W2W retains 78–93% of baseline sBERT across setups with sub-second response latency (FRL under 0.8 s).
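For readers unfamiliar with the metric, the following is a minimal sketch of how an equal error rate (EER) can be approximated from speaker-verification scores, and of the arithmetic behind the "over 3.5×" figure. It is not the paper's or the VoicePrivacy toolkit's evaluation code: the function name `equal_error_rate`, the toy score lists, and the simple threshold sweep are illustrative assumptions. A lower EER means the attacker separates speakers more easily; 50% corresponds to chance.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    target_scores:    similarity scores for same-speaker trials (hypothetical data)
    nontarget_scores: similarity scores for different-speaker trials (hypothetical data)
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy case 1: scores separate the speakers perfectly, so the attacker's EER is 0.
print(equal_error_rate([0.8, 0.9], [0.1, 0.2]))  # 0.0

# Toy case 2: both trial types have identical scores, so the EER sits at chance.
print(equal_error_rate([0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4]))  # 0.5

# Relative increase reported in the abstract (11.2% -> 41.0%).
relative_increase = 41.0 / 11.2
print(round(relative_increase, 2))  # 3.66
```

Under this reading, an anonymizer succeeds when it pushes the attacker's EER toward 0.5, which is why 41.0% is described as approaching the random-chance ceiling.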

Key Contributions

First empirical characterization of speaker identity leakage in E2E full-duplex LLM speech models (SALM-Duplex, Moshi), showing that leakage persists across all transformer layers, with the attacker reaching an EER as low as 6.4% for Moshi
Layer-wise and turn-wise leakage analysis revealing Linkability rises sharply within the first few dialogue turns, posing GDPR compliance risks
Two streaming anonymization setups at sub-second latency: the feature-domain Anon-W2F, which raises EER from 11.2% to 41.0%, and the waveform-level Anon-W2W, which retains 78–93% of baseline sBERT dialogue utility

🛡️ Threat Analysis

Details

Domains

audio, nlp

Model Types

llm, transformer

Threat Tags

grey_box, inference_time

Datasets

VoicePrivacy 2024

Applications

speech dialogue systems, voice assistants

Similar Papers

SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought

Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference

Privacy-Aware Decoding: Mitigating Privacy Leakage of Large Language Models in Retrieval-Augmented Generation

RAGFort: Dual-Path Defense Against Proprietary Knowledge Base Extraction in Retrieval-Augmented Generation

Private-RAG: Answering Multiple Queries with LLMs while Keeping Your Data Private

Towards Confidential and Efficient LLM Inference with Dual Privacy Protection

Chain-of-Sanitized-Thoughts: Plugging PII Leakage in CoT of Large Reasoning Models

You Can't Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors