Latest papers

2 papers
benchmark · arXiv · Jan 21, 2026

Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models

Anmol Goel, Cornelius Emde, Sangdoo Yun et al. · Parameter Lab · TU Darmstadt +3 more

Benign fine-tuning silently breaks contextual privacy in LLMs, causing inappropriate data disclosure that standard safety benchmarks fail to detect

Transfer Learning Attack · Sensitive Information Disclosure · nlp
PDF
defense · arXiv · Oct 20, 2025

Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution

Asim Mohamed, Martin Gubri · African Institute for Mathematical Sciences · Parameter Lab

Defends LLM text watermarks against translation attacks in low-resource languages via back-translation detection (STEAM); a brief illustrative sketch follows this entry

Output Integrity Attack · nlp
PDF
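The back-translation idea named above can be sketched roughly as follows: translate the suspect text back into the language the watermark was embedded in, then run the detector on the back-translated copy as well. This is a minimal illustrative sketch, not the paper's STEAM implementation; `detect_watermark` and `translate` are hypothetical callables standing in for a real watermark detector and a machine-translation system.

```python
from typing import Callable


def detect_with_backtranslation(
    text: str,
    text_lang: str,
    detect_watermark: Callable[[str], float],   # hypothetical: returns a detection score
    translate: Callable[[str, str, str], str],  # hypothetical: (text, src_lang, tgt_lang) -> text
    watermark_lang: str = "en",
) -> float:
    """Score a text for a watermark both directly and after back-translating it
    into the language the watermark is assumed to have been embedded in.
    Translation tends to wash out token-level watermark signals, especially for
    low-resource languages, so scoring the back-translated copy can recover
    detectability; here we simply keep the stronger of the two scores."""
    direct_score = detect_watermark(text)
    if text_lang == watermark_lang:
        return direct_score
    back_translated = translate(text, text_lang, watermark_lang)
    return max(direct_score, detect_watermark(back_translated))
```

A real pipeline would plug in an actual watermark detector and translation model; how the two scores are combined and thresholded is a design choice the paper itself may make differently.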