benchmark 2026

Blue Teaming Function-Calling Agents

Greta Dolcetti 1, Giulio Zizzo 2, Sergio Maffeis 3

0 citations · 8 references · arXiv

α

Published on arXiv

2601.09292

Prompt Injection

OWASP LLM Top 10 — LLM01

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Key Finding

All four open-source LLMs are unsafe by default against tool poisoning and prompt injection attacks, and none of the eight tested defenses achieves acceptable real-world performance due to high false-positive rates.

Renaming Tool Poisoning

Novel technique introduced


We present an experimental evaluation that assesses the robustness of four open source LLMs claiming function-calling capabilities against three different attacks, and we measure the effectiveness of eight different defences. Our results show how these models are not safe by default, and how the defences are not yet employable in real-world scenarios.


Key Contributions

  • Systematic empirical evaluation of three attacks (Direct Prompt Injection, Simple Tool Poisoning, Renaming Tool Poisoning) against four open-source function-calling LLMs using 172 BFCL samples
  • Introduction of Renaming Tool Poisoning, a novel attack exploiting tool implementation visibility in open-source settings, and a corresponding Tool Obfuscation defense
  • Quantitative assessment of eight defenses showing that all current mechanisms suffer from high false-positive rates and are not yet viable for real-world deployment

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
grey_boxinference_time
Datasets
Berkeley Function Calling Leaderboard (BFCL_v3_multiple)
Applications
function-calling agentsagentic ai systemsllm tool use