Latest papers

3 papers
benchmark · arXiv · Apr 7, 2026

Swiss-Bench 003: Evaluating LLM Reliability and Adversarial Security for Swiss Regulatory Contexts

Fatih Uenal · University of Colorado Boulder

Benchmark evaluating LLM security across prompt injection, PII extraction, and system prompt leakage for Swiss regulatory compliance

Prompt Injection · Sensitive Information Disclosure · nlp
defense · arXiv · Feb 2, 2026

Monotonicity as an Architectural Bias for Robust Language Models

Patrick Cooper, Alireza Nadali, Ashutosh Trivedi et al. · University of Colorado Boulder

Enforces monotonicity in Transformer FFN layers to cut LLM adversarial attack success rates from 69% to 19% with minimal performance cost

Input Manipulation Attack · Prompt Injection · nlp
defense · arXiv · Dec 24, 2025

Robustness Certificates for Neural Networks against Adversarial Attacks

Sara Taheri, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar et al. · LMU Munich · Technical University of Munich +1 more

Certifies neural network robustness against data poisoning and adversarial attacks using control-theoretic barrier certificates with PAC guarantees

Data Poisoning Attack · Input Manipulation Attack · vision