Latest papers

1 papers
attack arXiv Jan 30, 2026 · 9w ago

Hide and Seek in Embedding Space: Geometry-based Steganography and Detection in Large Language Models

Charles Westphal, Keivan Navaie, Fernando E. Rosas · University College London · ML Alignment Theory Scholars +4 more

Maliciously LoRA-fine-tuned LLMs covertly exfiltrate prompt secrets via geometry-based steganography, detected via linear probes on internal activations

Model Poisoning Sensitive Information Disclosure nlp
PDF