ML Security Papers

Latest papers

2 papers

attack · arXiv · Feb 4, 2026

Amir Nuriyev, Gabriel Kulp · MBZUAI · RAND +1 more

Reconstructs user input text from mixture-of-experts (MoE) routing decisions alone, achieving 91.2% token recovery via a transformer decoder

Model Inversion Attack · Sensitive Information Disclosure · nlp

defense · arXiv · Nov 11, 2025

Shourya Batra, Pierce Tillman, Samarth Gaggar et al. · Independent · Algoverse +3 more

Activation steering defense that reduces sensitive user data leakage in LLM chain-of-thought reasoning traces at inference time

Sensitive Information Disclosure · nlp

4 citations · 1 influential