Latest papers

3 papers
defense arXiv Feb 4, 2026

Cascading Robustness Verification: Toward Efficient Model-Agnostic Certification

Mohammadreza Maleki, Rushendra Sidibomma, Arman Adibi et al. · Toronto Metropolitan University · University of Minnesota Twin Cities +2 more

Cascading verifier framework certifies neural-network robustness against adversarial examples with a 90% runtime reduction over single-verifier baselines

Input Manipulation Attack vision
PDF
benchmark arXiv Oct 6, 2025

SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests

Punya Syon Pandey, Hai Son Le, Devansh Bhardwaj et al. · University of Toronto · Vector Institute +4 more

Benchmarks LLM vulnerability to sociopolitical harm requests across 585 prompts spanning 34 countries, revealing 97–98% attack success rates

Prompt Injection nlp
PDF Code
benchmark arXiv Sep 4, 2025

Evaluating the Robustness of Retrieval-Augmented Generation to Adversarial Evidence in the Health Domain

Shakiba Amirshahi, Amin Bigdeli, Charles L. A. Clarke et al. · University of Waterloo · Toronto Metropolitan University

Benchmarks RAG vulnerability to adversarial health-misinformation documents, finding that co-presented helpful evidence preserves response alignment

Input Manipulation Attack Prompt Injection nlp
PDF Code