Blue Teaming Function-Calling Agents
Greta Dolcetti 1, Giulio Zizzo 2, Sergio Maffeis 3
Published on arXiv
2601.09292
Prompt Injection
OWASP LLM Top 10 — LLM01
Insecure Plugin Design
OWASP LLM Top 10 — LLM07
Key Finding
All four open-source LLMs are unsafe by default against tool poisoning and prompt injection attacks, and none of the eight tested defenses achieves acceptable real-world performance due to high false-positive rates.
Renaming Tool Poisoning
Novel technique introduced
We present an experimental evaluation that assesses the robustness of four open source LLMs claiming function-calling capabilities against three different attacks, and we measure the effectiveness of eight different defences. Our results show how these models are not safe by default, and how the defences are not yet employable in real-world scenarios.
Key Contributions
- Systematic empirical evaluation of three attacks (Direct Prompt Injection, Simple Tool Poisoning, Renaming Tool Poisoning) against four open-source function-calling LLMs using 172 BFCL samples
- Introduction of Renaming Tool Poisoning, a novel attack exploiting tool implementation visibility in open-source settings, and a corresponding Tool Obfuscation defense
- Quantitative assessment of eight defenses showing that all current mechanisms suffer from high false-positive rates and are not yet viable for real-world deployment