ML Security Papers

Latest papers

7 papers

benchmark arXiv Apr 25, 2026

Evaluating Jailbreaking Vulnerabilities in LLMs Deployed as Assistants for Smart Grid Operations: A Benchmark Against NERC Standards

Taha Hammadia, Lucas Rea, Ahmad Mohammad Saber et al. · University of Toronto · Concordia University

Benchmarks three jailbreak attacks against LLMs used for smart grid compliance, finding a 33% overall success rate, with DeepInception the most effective

Prompt Injection nlp

PDF

defense arXiv Mar 30, 2026

FL-PBM: Pre-Training Backdoor Mitigation for Federated Learning

Osama Wehbi, Sarhad Arisdakessian, Omar Abdel Wahab et al. · Polytechnique Montréal · Khalifa University +2 more

Client-side defense that detects and blurs backdoored training data in federated learning using PCA and GMM clustering

Model Poisoning vision federated-learning

PDF
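
A minimal sketch of the detection idea summarized above, assuming each client's samples are already flattened into feature vectors: project them onto the first principal component, fit a two-component Gaussian mixture (GMM) to the projections, and flag the minority component as suspected backdoored data. The function names, the power-iteration PCA, the one-dimensional GMM, and the minority-component decision rule are illustrative assumptions, not FL-PBM's actual pipeline, and the blurring step is omitted.

```python
import math

# Illustrative sketch only: helper names and the decision rule are hypothetical,
# not taken from FL-PBM.


def top_principal_component(rows, iters=100):
    """Power iteration on the sample covariance; returns (mean, unit-length first PC)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [1.0] * d
    for _ in range(iters):
        # Multiply by the covariance implicitly: w = X^T (X v) / n.
        xv = [sum(c[j] * v[j] for j in range(d)) for c in centred]
        w = [sum(xv[i] * centred[i][j] for i in range(n)) / n for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, v):
    """Scalar score of each row along the principal component."""
    return [sum((r[j] - mean[j]) * v[j] for j in range(len(v))) for r in rows]


def fit_two_gaussians(xs, iters=50):
    """EM for a 1-D, two-component Gaussian mixture; returns (weights, responsibilities)."""
    n = len(xs)
    overall = sum(xs) / n
    var0 = max(sum((x - overall) ** 2 for x in xs) / n, 1e-6)
    mu, var, w = [min(xs), max(xs)], [var0, var0], [0.5, 0.5]
    resp = [[0.5, 0.5] for _ in xs]
    for _ in range(iters):
        # E-step in log space to avoid underflow for well-separated clusters.
        for i, x in enumerate(xs):
            logp = [math.log(w[k]) - 0.5 * math.log(2 * math.pi * var[k])
                    - (x - mu[k]) ** 2 / (2 * var[k]) for k in range(2)]
            m = max(logp)
            e = [math.exp(lp - m) for lp in logp]
            resp[i] = [ek / sum(e) for ek in e]
        # M-step: re-estimate weights, means and variances (variance floored).
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)
    return w, resp


def flag_suspicious(rows):
    """Indices assigned to the minority mixture component (hypothetical decision rule)."""
    mean, v = top_principal_component(rows)
    w, resp = fit_two_gaussians(project(rows, mean, v))
    minority = 0 if w[0] < w[1] else 1
    return [i for i, r in enumerate(resp) if r[minority] > 0.5]
```

For example, with 20 tightly clustered clean vectors followed by 4 vectors shifted far along every dimension, flag_suspicious returns the indices of the 4 shifted vectors.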

defense arXiv Mar 5, 2026

When Priors Backfire: On the Vulnerability of Unlearnable Examples to Pretraining

Zhihao Li, Gezheng Xu, Jiale Cai et al. · Western University · Concordia University +2 more

Proposes BAIT, a bi-level optimization that makes availability-poisoning data protection robust to fine-tuning of pretrained models

Data Poisoning Attack vision

PDF Code

attack arXiv Mar 4, 2026

Efficient Refusal Ablation in LLM through Optimal Transport

Geraldin Nanfack, Eugene Belilovsky, Elvis Dohmatob · Concordia University · Mila – Québec AI Institute

Optimal transport attack transforms harmful LLM activation distributions to match harmless ones, achieving 11% higher jailbreak success than refusal-direction ablation baselines.

Prompt Injection nlp

PDF
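
One way to picture the idea, under a simplifying assumption the summary does not state: if harmful and harmless activations are each approximated by a Gaussian with diagonal covariance, the optimal transport (Monge) map between the two is the per-coordinate affine map below. The helper names and the diagonal-Gaussian approximation are assumptions for illustration; the paper's estimator, layer choice, and the hooking of the map into the model are not reproduced here.

```python
import math

# Illustrative sketch: activations are plain lists of floats, and both sets are
# assumed (hypothetically) to be diagonal Gaussians.


def column_stats(rows):
    """Per-dimension mean and (population) standard deviation."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    std = [math.sqrt(sum((r[j] - mean[j]) ** 2 for r in rows) / n) for j in range(d)]
    return mean, std


def gaussian_ot_map(source, target, eps=1e-8):
    """Return T(x) = mu_t + (sigma_t / sigma_s) * (x - mu_s), applied coordinate-wise.

    For two Gaussians with diagonal covariances this affine map is the
    W2-optimal (Monge) transport from the source to the target distribution.
    """
    mu_s, sd_s = column_stats(source)
    mu_t, sd_t = column_stats(target)
    scale = [t / max(s, eps) for s, t in zip(sd_s, sd_t)]

    def transport(x):
        return [m_t + a * (xj - m_s) for xj, m_s, m_t, a in zip(x, mu_s, mu_t, scale)]

    return transport
```

Applied to the harmful activations, the returned transport function yields vectors whose per-dimension mean and standard deviation match the harmless set. Refusal-direction ablation, by contrast, only removes the component of each activation along a single direction.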

defense arXiv Dec 6, 2025

Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks

Saeid Jamshidi, Kawser Wazed Nafi, Arghavan Moradi Dakhel et al. · Polytechnique Montréal · Concordia University +1 more

Defends LLM tool use via MCP against tool-descriptor poisoning, shadowing, and rug-pull attacks using RSA signing and LLM-on-LLM vetting

Insecure Plugin Design Prompt Injection nlp

5 citations PDF

defense Neural computing & application... Oct 29, 2025

Fixed-point graph convolutional networks against adversarial attacks

Shakib Khan, A. Ben Hamza, Amr Youssef · Concordia University

Defends GNNs against adversarial graph perturbations via fixed-point iteration and spectral high-frequency attenuation filtering

Input Manipulation Attack Data Poisoning Attack graph

PDF
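
As a rough illustration of why a fixed-point iteration can act as a high-frequency attenuating filter, the sketch below runs a personalized-PageRank-style propagation Z ← (1 − α)X + αÂZ to convergence on a toy graph, where Â is the self-loop-augmented, symmetrically normalized adjacency. This generic propagation rule, the default α = 0.9, and the scalar node features are assumptions; it is not the paper's layer or its spectral filter.

```python
import math

# Illustrative sketch: a generic fixed-point graph propagation, not the paper's model.
# Node features are scalars (one channel) for brevity.


def normalized_adjacency(n, edges):
    """Symmetric normalisation D^{-1/2} (A + I) D^{-1/2} of an undirected graph."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def fixed_point_propagate(a_hat, x, alpha=0.9, tol=1e-10, max_iter=1000):
    """Iterate Z <- (1 - alpha) X + alpha * A_hat Z until the update stalls.

    The fixed point is Z* = (1 - alpha) (I - alpha A_hat)^{-1} X, a low-pass graph
    filter: the component of X along an eigenvector of A_hat with eigenvalue lam is
    scaled by (1 - alpha) / (1 - alpha * lam), which equals 1 at lam = 1 and shrinks
    as lam decreases (higher graph frequency).
    """
    n = len(x)
    z = list(x)
    for _ in range(max_iter):
        new = [(1 - alpha) * x[i] + alpha * sum(a_hat[i][j] * z[j] for j in range(n))
               for i in range(n)]
        done = max(abs(p - q) for p, q in zip(new, z)) < tol
        z = new
        if done:
            break
    return z
```

On the path graph 0-1-2, an alternating signal such as [1, -1, 1] comes out with a smaller norm, while the smoothest eigenvector of Â, [√2, √3, √2], is left unchanged.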

attack arXiv Aug 12, 2025

Constrained Black-Box Attacks Against Cooperative Multi-Agent Reinforcement Learning

Amine Andam, Jamal Bentahar, Mustapha Hedabou · Mohammed VI Polytechnic University · Khalifa University +1 more

Black-box observation perturbation attacks disrupt cooperative MARL via agent-view misalignment using only 1,000 samples

Input Manipulation Attack reinforcement-learning

PDF
