ML Security Papers

attack · arXiv · Jan 30, 2026

Hide and Seek in Embedding Space: Geometry-based Steganography and Detection in Large Language Models

Charles Westphal, Keivan Navaie, Fernando E. Rosas · University College London · ML Alignment Theory Scholars +4 more

Maliciously LoRA-fine-tuned LLMs covertly exfiltrate prompt secrets via geometry-based steganography, which can be detected via linear probes on internal activations

Model Poisoning · Sensitive Information Disclosure · nlp
