Latest papers

3 papers
defense arXiv Mar 3, 2026

Understanding and Mitigating Dataset Corruption in LLM Steering

Cullen Anderson, Narmeen Oozeer, Foad Namjoo et al. · University of Massachusetts Amherst · Martian AI +2 more

Analyzes adversarial data poisoning of LLM contrastive steering datasets and defends with robust mean estimation (see the sketch below)

Data Poisoning Attack · Training Data Poisoning · nlp
PDF
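The defense summarized above aggregates contrastive activation differences with a robust mean estimator instead of a plain average. A minimal sketch, assuming a simple trimmed-mean estimator over per-pair activation differences; the function name, trim fraction, and stand-in activations are illustrative, not the paper's exact method.

```python
# Illustrative trimmed-mean defense for contrastive steering vectors
# (assumption: not necessarily the paper's exact estimator).
import numpy as np

def trimmed_mean_steering_vector(pos_acts, neg_acts, trim_frac=0.1):
    """pos_acts, neg_acts: (n_pairs, d_model) hidden activations for the
    positive/negative prompt of each contrastive pair."""
    diffs = pos_acts - neg_acts                     # per-pair steering directions
    center = np.median(diffs, axis=0)               # robust initial center
    dists = np.linalg.norm(diffs - center, axis=1)  # distance of each pair from that center
    keep = np.argsort(dists)[: int(len(diffs) * (1 - trim_frac))]
    return diffs[keep].mean(axis=0)                 # average only the closest pairs

# Hypothetical usage with random stand-in activations:
rng = np.random.default_rng(0)
pos = rng.normal(size=(200, 4096))
neg = rng.normal(size=(200, 4096))
steering_vector = trimmed_mean_steering_vector(pos, neg)
```

Poisoned pairs tend to produce outlying difference vectors, so dropping the farthest pairs before averaging limits how far an attacker can shift the estimated steering direction.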
defense arXiv Nov 24, 2025

Prompt Fencing: A Cryptographic Approach to Establishing Security Boundaries in Large Language Model Prompts

Steven Peh · Thoughtworks

Cryptographic prompt signing defense reduces LLM prompt injection success rates from 86.7% to 0% across 300 attacks (see the sketch below)

Prompt Injection nlp
1 citation PDF
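Prompt fencing draws its security boundary by having the application cryptographically sign the prompt segments it trusts, so injected text cannot claim instruction status. A minimal sketch, assuming an HMAC over each segment and its role; the fence markup, function names, and key handling are illustrative and may differ from the paper's construction.

```python
# Illustrative HMAC-based prompt fence (assumption: the paper's exact scheme may differ).
import hmac
import hashlib
import secrets

KEY = secrets.token_bytes(32)  # hypothetical per-session signing key held by the application

def fence(segment: str, role: str) -> str:
    """Wrap a trusted segment with a signature binding its content and role."""
    tag = hmac.new(KEY, f"{role}:{segment}".encode(), hashlib.sha256).hexdigest()
    return f"<fence role={role} sig={tag}>{segment}</fence>"

def verify(segment: str, role: str, sig: str) -> bool:
    """Accept a segment as trusted only if its signature checks out."""
    expected = hmac.new(KEY, f"{role}:{segment}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)

# Untrusted retrieved text cannot forge a valid signature, so any instruction-like
# content it contains stays outside the trusted boundary.
trusted_segment = fence("Summarize the document below.", role="system")
```

Because only application-signed segments verify, a pre-processing step or the serving layer can refuse to treat unverified text as instructions.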
attack arXiv Sep 7, 2025

Beyond "I'm Sorry, I Can't": Dissecting Large Language Model Refusal

Nirmalendu Prakash, Yeo Wei Jie, Amir Abdullah et al. · Singapore University of Technology and Design · Nanyang Technological University +2 more

Ablates sparse autoencoder (SAE) latent features that mediate refusal in LLMs, producing mechanistically grounded jailbreaks via a three-stage pipeline (see the sketch below)

Prompt Injection nlp
PDF
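Refusal ablation of this kind typically removes a selected SAE latent's contribution from the model's residual stream. A minimal sketch, assuming a ReLU SAE and a single pre-selected latent index; the function signature and shapes are illustrative, and the paper's three-stage latent-selection pipeline is not reproduced here.

```python
# Illustrative ablation of one SAE latent from a residual-stream activation
# (assumption: ReLU SAE, latent index chosen by some upstream selection step).
import numpy as np

def ablate_sae_latent(resid, encoder_w, encoder_b, decoder_w, latent_idx):
    """resid: (d_model,) activation; encoder_w: (d_model, d_sae);
    encoder_b: (d_sae,); decoder_w: (d_sae, d_model)."""
    latents = np.maximum(resid @ encoder_w + encoder_b, 0.0)    # SAE latent activations
    contribution = latents[latent_idx] * decoder_w[latent_idx]  # this latent's share of the reconstruction
    return resid - contribution                                 # activation with the refusal latent removed
```

Running the forward pass with the patched activation in place of the original is what turns the ablation into a jailbreak when the chosen latent mediates refusal.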