Latest papers

2 papers
defense arXiv Feb 2, 2026

Monotonicity as an Architectural Bias for Robust Language Models

Patrick Cooper, Alireza Nadali, Ashutosh Trivedi et al. · University of Colorado Boulder

Enforces monotonicity in Transformer FFN layers to cut LLM adversarial attack success rates from 69% to 19% with minimal performance cost

Input Manipulation Attack · Prompt Injection · nlp
PDF
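The summary above says the defense enforces monotonicity in Transformer FFN layers. A minimal sketch of one standard way to get a monotone feed-forward block (not necessarily the paper's construction: function and variable names here are ours, and we assume monotonicity is obtained by constraining weights to be non-negative via softplus and using a monotone activation):

```python
import numpy as np

def softplus(z):
    # log(1 + e^z) > 0 everywhere, so softplus(w) is a non-negative weight matrix
    return np.log1p(np.exp(z))

def monotone_ffn(x, w1, b1, w2, b2):
    """Toy monotone feed-forward block (illustrative sketch only).

    Both linear maps use non-negative effective weights softplus(w), and
    ReLU is monotone, so the composition is monotone non-decreasing in
    every input coordinate.
    """
    h = np.maximum(0.0, x @ softplus(w1).T + b1)  # monotone hidden layer
    return h @ softplus(w2).T + b2                # monotone output layer
```

Under these assumptions, increasing any input coordinate can only increase (or leave unchanged) every output coordinate, which is the architectural bias the title refers to.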
defense arXiv Dec 24, 2025

Robustness Certificates for Neural Networks against Adversarial Attacks

Sara Taheri, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar et al. · LMU Munich · Technical University of Munich +1 more

Certifies neural network robustness against data poisoning and adversarial attacks using control-theoretic barrier certificates with PAC guarantees

Data Poisoning Attack · Input Manipulation Attack · vision
PDF
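The summary above mentions control-theoretic barrier certificates. A minimal sketch of the barrier idea only (the paper's actual construction and its PAC guarantees are not reproduced here; the function names and the trajectory-checking setup are our own illustration): a barrier function B certifies safety if it starts below the unsafe level and never increases along the dynamics, so the trajectory can never reach the unsafe set {x : B(x) >= 0}.

```python
def check_barrier(trajectory, barrier, unsafe_level=0.0, tol=1e-9):
    """Empirically check a barrier-certificate condition along one trajectory.

    Returns True iff barrier(x) stays strictly below unsafe_level at every
    state AND is non-increasing from step to step -- the discrete-time
    analogue of the barrier conditions used in control.
    """
    vals = [barrier(x) for x in trajectory]
    below = all(v < unsafe_level for v in vals)
    nonincreasing = all(b <= a + tol for a, b in zip(vals, vals[1:]))
    return below and nonincreasing
```

For example, a contracting trajectory x_t = 0.9**t passes the check with B(x) = |x| - 2, while a diverging one x_t = 1.1**t fails it.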