benchmark 2025

Quantifying Distributional Robustness of Agentic Tool-Selection

Jehyeok Yeon , Isha Chaudhary , Gagandeep Singh

3 citations · 40 references · arXiv


Published on arXiv · 2510.03992

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

The certified accuracy bound drops to near zero under adversarial tool injection (>60% average performance degradation), and falls below 20% after a single round of adaptive refinement targeting both the retrieval and selection stages.

ToolCert

Novel technique introduced


Large language models (LLMs) are increasingly deployed in agentic systems where they map user intents to relevant external tools to fulfill a task. A critical step in this process is tool selection, where a retriever first surfaces candidate tools from a larger pool, after which the LLM selects the most appropriate one. This pipeline presents an underexplored attack surface where errors in selection can lead to severe outcomes like unauthorized data access or denial of service, all without modifying the agent's model or code. While existing evaluations measure task performance in benign settings, they overlook the specific vulnerabilities of the tool-selection mechanism under adversarial conditions. To address this gap, we introduce ToolCert, the first statistical framework that formally certifies tool-selection robustness. ToolCert models tool selection as a Bernoulli success process and evaluates it against a strong, adaptive attacker who introduces adversarial tools with misleading metadata that are iteratively refined based on the agent's previous choices. By sampling these adversarial interactions, ToolCert produces a high-confidence lower bound on accuracy, formally quantifying the agent's worst-case performance. Our evaluation with ToolCert uncovers severe fragility: under attacks injecting deceptive tools or saturating retrieval, the certified accuracy bound drops to near zero, an average performance drop of over 60% compared to non-adversarial settings. For attacks targeting the retrieval and selection stages, the certified accuracy bound plummets to less than 20% after just a single round of adversarial adaptation. ToolCert thus reveals previously unexamined security threats inherent to tool selection and provides a principled method to quantify an agent's robustness to such threats, a necessary step for the safe deployment of agentic systems.


Key Contributions

  • ToolCert: the first statistical framework that formally certifies LLM agent tool-selection robustness by modeling each agent interaction as a Bernoulli trial and computing a high-confidence lower bound on accuracy
  • Adaptive attacker model using iterative Markov-process refinement that generates adversarial tools conditioned on the agent's prior selection failures
  • Empirical demonstration of severe fragility in SOTA LLMs under tool injection and retrieval saturation attacks, with certified accuracy collapsing to near zero (>60% average drop)
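At its core, the certificate described above is a one-sided high-confidence lower bound on a Bernoulli success rate: each sampled adversarial interaction is one trial, and a trial succeeds if the agent still selects the correct tool. As a minimal sketch of how such a bound can be computed (using the standard Clopper-Pearson exact construction; the paper's precise estimator may differ), assuming only observed counts `k` successes out of `n` trials:

```python
from math import comb

def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def clopper_pearson_lower(k: int, n: int, alpha: float = 0.05) -> float:
    """Exact one-sided (1 - alpha) lower confidence bound on the
    Bernoulli success probability, given k successes in n trials."""
    if k == 0:
        return 0.0
    # binom_sf(k, n, p) increases monotonically in p, so binary-search
    # for the p at which observing >= k successes has probability alpha.
    lo, hi = 0.0, 1.0
    for _ in range(100):  # ~2^-100 precision, far below float resolution
        mid = (lo + hi) / 2
        if binom_sf(k, n, mid) < alpha:
            lo = mid  # mid is too small to explain k successes at level alpha
        else:
            hi = mid
    return lo
```

For example, `clopper_pearson_lower(18, 100)` certifies (at 95% confidence) a worst-case accuracy strictly below the 18% empirical rate, which is the sense in which a certified bound is more conservative than raw adversarial accuracy.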

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time · black_box
Datasets
API-Bank · T-Eval
Applications
llm agent tool selection · agentic ai systems · function calling pipelines