ML Security Papers

Latest papers

2 papers

attack · arXiv · Oct 24, 2025

Pavlos Ntais · University of Athens

Trains a compact LoRA-tuned Mistral-7B to auto-generate narrative jailbreaks, achieving an 81% attack success rate (ASR) against GPT-OSS-20B and 66.5% against GPT-4

Prompt Injection · nlp

1 citation · PDF

attack · arXiv · Oct 17, 2025

Giorgos Nikolaou, Tommaso Mencattini, Donato Crisostomi et al. · EPFL · Archimedes/Athena RC +3 more

Proves LLMs are injective and introduces SipIt to exactly reconstruct private input text from hidden activations

Model Inversion Attack · Sensitive Information Disclosure · nlp

15 citations · 3 influential · PDF