Latest papers

2 papers
defense arXiv Feb 2, 2026

Monotonicity as an Architectural Bias for Robust Language Models

Patrick Cooper, Alireza Nadali, Ashutosh Trivedi et al. · University of Colorado Boulder

Enforces monotonicity in Transformer FFN layers to cut LLM adversarial attack success rates from 69% to 19% with minimal performance cost

Input Manipulation Attack · Prompt Injection · nlp
PDF
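The summary above says the defense enforces monotonicity in Transformer FFN layers. A minimal sketch of one standard way to get a monotone feed-forward block (not necessarily the paper's construction: function and variable names here are ours, and we assume monotonicity is obtained by constraining weights to be non-negative via softplus and using a monotone activation):

```python
import numpy as np

def softplus(z):
    # log(1 + e^z) > 0 everywhere, so softplus(w) is a non-negative weight matrix
    return np.log1p(np.exp(z))

def monotone_ffn(x, w1, b1, w2, b2):
    """Toy monotone feed-forward block (illustrative sketch only).

    Both linear maps use non-negative effective weights softplus(w), and
    ReLU is monotone, so the composition is monotone non-decreasing in
    every input coordinate.
    """
    h = np.maximum(0.0, x @ softplus(w1).T + b1)  # monotone hidden layer
    return h @ softplus(w2).T + b2                # monotone output layer
```

Under these assumptions, increasing any input coordinate can only increase (or leave unchanged) every output coordinate, which is the architectural bias the title refers to.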
defense arXiv Dec 24, 2025

Robustness Certificates for Neural Networks against Adversarial Attacks

Sara Taheri, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar et al. · LMU Munich · Technical University of Munich +1 more

Certifies neural network robustness against data poisoning and adversarial attacks using control-theoretic barrier certificates with PAC guarantees

Data Poisoning Attack · Input Manipulation Attack · vision
PDF
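The summary above mentions control-theoretic barrier certificates. A minimal sketch of the barrier idea only (the paper's actual construction and its PAC guarantees are not reproduced here; the function names and the trajectory-checking setup are our own illustration): a barrier function B certifies safety if it starts below the unsafe level and never increases along the dynamics, so the trajectory can never reach the unsafe set {x : B(x) >= 0}.

```python
def check_barrier(trajectory, barrier, unsafe_level=0.0, tol=1e-9):
    """Empirically check a barrier-certificate condition along one trajectory.

    Returns True iff barrier(x) stays strictly below unsafe_level at every
    state AND is non-increasing from step to step -- the discrete-time
    analogue of the barrier conditions used in control.
    """
    vals = [barrier(x) for x in trajectory]
    below = all(v < unsafe_level for v in vals)
    nonincreasing = all(b <= a + tol for a, b in zip(vals, vals[1:]))
    return below and nonincreasing
```

For example, a contracting trajectory x_t = 0.9**t passes the check with B(x) = |x| - 2, while a diverging one x_t = 1.1**t fails it.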