ML Security Papers

benchmark 2026

Blue Teaming Function-Calling Agents

Greta Dolcetti ¹, Giulio Zizzo ², Sergio Maffeis ³

¹ Ca’ Foscari University of Venice

² IBM Research

³ Imperial College London

0 citations · 8 references · arXiv

α

Published on arXiv

2601.09292

Prompt Injection

OWASP LLM Top 10 — LLM01

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Key Finding

All four open-source LLMs are unsafe by default against tool poisoning and prompt injection attacks, and none of the eight tested defenses achieves acceptable real-world performance due to high false-positive rates.

Renaming Tool Poisoning

Novel technique introduced

We present an experimental evaluation that assesses the robustness of four open source LLMs claiming function-calling capabilities against three different attacks, and we measure the effectiveness of eight different defences. Our results show how these models are not safe by default, and how the defences are not yet employable in real-world scenarios.

Key Contributions

Systematic empirical evaluation of three attacks (Direct Prompt Injection, Simple Tool Poisoning, Renaming Tool Poisoning) against four open-source function-calling LLMs using 172 BFCL samples
Introduction of Renaming Tool Poisoning, a novel attack exploiting tool implementation visibility in open-source settings, and a corresponding Tool Obfuscation defense
Quantitative assessment of eight defenses showing that all current mechanisms suffer from high false-positive rates and are not yet viable for real-world deployment

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

grey_boxinference_time

Datasets

Berkeley Function Calling Leaderboard (BFCL_v3_multiple)

Applications

function-calling agentsagentic ai systemsllm tool use

Read PDF arXiv DOI

Similar Papers

Model Context Protocol Threat Modeling and Analyzing Vulnerabilities to Prompt Injection with Tool Poisoning

Are AI-assisted Development Tools Immune to Prompt Injection?

Systematic Analysis of MCP Security

MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols

Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents

When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins

Quantifying Distributional Robustness of Agentic Tool-Selection

MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents