ML Security Papers

Latest papers

6 papers

defense · arXiv · Feb 17, 2026

From Tool Orchestration to Code Execution: A Study of MCP Design Choices

Yuval Felendler, Parth A. Gandhi, Idan Habler et al. · Ben Gurion University of the Negev

Analyzes security of LLM agent MCP code-execution plugins, identifies 16 attack classes including code injection, and proposes sandboxing defenses

Insecure Plugin Design · Excessive Agency · nlp

PDF · Code

defense · arXiv · Jan 27, 2026

GAVEL: Towards rule-based safety through activation monitoring

Shir Rozenfeld, Rahul Pankajakshan, Itay Zloczower et al. · Ben Gurion University of the Negev · Amrita Vishwa Vidyapeetham

Rule-based LLM safety framework using interpretable activation-level cognitive elements to detect harmful behaviors with high precision and auditability

Prompt Injection · nlp

PDF

defense · arXiv · Jan 15, 2026

AgentGuardian: Learning Access Control Policies to Govern AI Agent Behavior

Nadya Abaev, Denis Klimov, Gerard Levinov et al. · Ben Gurion University of the Negev

Defends AI agents from malicious inputs and unauthorized tool calls using learned ABAC policies and execution control flow graphs

Excessive Agency · Prompt Injection · nlp

3 citations · PDF

attack · arXiv · Dec 29, 2025

Breaking Audio Large Language Models by Attacking Only the Encoder: A Universal Targeted Latent-Space Audio Attack

Roee Ziv, Raz Lapid, Moshe Sipper · Ben Gurion University of the Negev · Deepkeep

Universal adversarial audio perturbations attack encoder latent space to hijack audio-LLM outputs without accessing the language model

Input Manipulation Attack · Prompt Injection · audio · nlp

PDF

attack · arXiv · Sep 26, 2025

Seeing Isn't Believing: Context-Aware Adversarial Patch Synthesis via Conditional GAN

Roie Kazoom, Alon Goldberg, Hodaya Cohen et al. · Ben Gurion University of the Negev

Conditional GAN with Grad-CAM-guided placement synthesizes targeted adversarial patches achieving 99%+ ASR on CNNs and ViTs in black-box settings

Input Manipulation Attack · vision

PDF

defense · arXiv · Aug 1, 2025

Unraveling Hidden Representations: A Multi-Modal Layer Analysis for Better Synthetic Content Forensics

Tom Or, Omri Azencot · Ben Gurion University of the Negev

Detects AI-generated images and audio deepfakes using intermediate-layer features of multi-modal models like CLIP-ViT and ImageBind

Output Integrity Attack · vision · audio · multimodal

PDF
