Latest papers

3 papers
defense arXiv Feb 4, 2026

Cascading Robustness Verification: Toward Efficient Model-Agnostic Certification

Mohammadreza Maleki, Rushendra Sidibomma, Arman Adibi et al. · Toronto Metropolitan University · University of Minnesota Twin Cities +2 more

Cascading verifier framework certifies neural-network robustness against adversarial examples with a 90% runtime reduction over single-verifier baselines

Input Manipulation Attack vision
PDF
benchmark arXiv Oct 6, 2025

SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests

Punya Syon Pandey, Hai Son Le, Devansh Bhardwaj et al. · University of Toronto · Vector Institute +4 more

Benchmarks LLM vulnerability to sociopolitical harm requests across 585 prompts spanning 34 countries, revealing 97–98% attack success rates

Prompt Injection nlp
PDF Code
benchmark arXiv Sep 4, 2025

Evaluating the Robustness of Retrieval-Augmented Generation to Adversarial Evidence in the Health Domain

Shakiba Amirshahi, Amin Bigdeli, Charles L. A. Clarke et al. · University of Waterloo · Toronto Metropolitan University

Benchmarks RAG vulnerability to adversarial health-misinformation documents, finding that co-presented helpful evidence preserves response alignment

Input Manipulation Attack Prompt Injection nlp
PDF Code