Latest papers

2 papers
benchmark · arXiv · Jan 21, 2026

Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models

Anmol Goel, Cornelius Emde, Sangdoo Yun et al. · Parameter Lab · TU Darmstadt +3 more

Benign fine-tuning silently breaks contextual privacy in LLMs, causing inappropriate data disclosure that standard safety benchmarks fail to detect

Transfer Learning Attack · Sensitive Information Disclosure · nlp
PDF
defense · arXiv · Oct 20, 2025

Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution

Asim Mohamed, Martin Gubri · African Institute for Mathematical Sciences · Parameter Lab

Defends LLM text watermarks against translation attacks in low-resource languages via back-translation detection (STEAM); a brief illustrative sketch follows this entry

Output Integrity Attack · nlp
PDF
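The back-translation idea named above can be sketched roughly as follows: translate the suspect text back into the language the watermark was embedded in, then run the detector on the back-translated copy as well. This is a minimal illustrative sketch, not the paper's STEAM implementation; `detect_watermark` and `translate` are hypothetical callables standing in for a real watermark detector and a machine-translation system.

```python
from typing import Callable


def detect_with_backtranslation(
    text: str,
    text_lang: str,
    detect_watermark: Callable[[str], float],   # hypothetical: returns a detection score
    translate: Callable[[str, str, str], str],  # hypothetical: (text, src_lang, tgt_lang) -> text
    watermark_lang: str = "en",
) -> float:
    """Score a text for a watermark both directly and after back-translating it
    into the language the watermark is assumed to have been embedded in.
    Translation tends to wash out token-level watermark signals, especially for
    low-resource languages, so scoring the back-translated copy can recover
    detectability; here we simply keep the stronger of the two scores."""
    direct_score = detect_watermark(text)
    if text_lang == watermark_lang:
        return direct_score
    back_translated = translate(text, text_lang, watermark_lang)
    return max(direct_score, detect_watermark(back_translated))
```

A real pipeline would plug in an actual watermark detector and translation model; how the two scores are combined and thresholded is a design choice the paper itself may make differently.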