Attack · 2026

ToolFlood: Beyond Selection -- Hiding Valid Tools from LLM Agents via Semantic Covering

Hussein Jawad 1, Nicolas J-B Brunel 1,2,3

Published on arXiv (arXiv:2603.13950)

Input Manipulation Attack (OWASP ML Top 10: ML01)

Insecure Plugin Design (OWASP LLM Top 10: LLM07)

Model Denial of Service (OWASP LLM Top 10: LLM04)

Key Finding

Achieves a 95% attack success rate with only a 1% injection rate on ToolBench by semantically covering queries and saturating top-k retrieval

ToolFlood

Novel technique introduced


Large Language Model (LLM) agents increasingly use external tools for complex tasks and rely on embedding-based retrieval to select a small top-k subset of tools for reasoning. While prior work has examined attacks on tool selection, the robustness of the retrieval stage itself remains underexplored as these systems scale. This paper introduces ToolFlood, a retrieval-layer attack on tool-augmented LLM agents. Rather than altering which tool is chosen after retrieval, ToolFlood overwhelms retrieval itself by injecting a few attacker-controlled tools whose metadata is carefully placed to exploit the geometry of the embedding space. These tools semantically span many user queries, dominate the top-k results, and push all benign tools out of the agent's context. ToolFlood uses a two-phase adversarial tool generation strategy. It first samples subsets of target queries and uses an LLM to iteratively generate diverse tool names and descriptions. It then runs an iterative greedy selection that chooses tools maximizing coverage of the remaining queries in embedding space under a cosine-distance threshold, stopping when all queries are covered or a budget is reached. We provide a theoretical analysis of retrieval saturation and show on standard benchmarks that ToolFlood achieves up to a 95% attack success rate with a low injection rate (1% on ToolBench). The code will be made publicly available at the following link: https://github.com/as1-prog/ToolFlood
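The second phase described above, greedy coverage maximization under a cosine-distance threshold, can be sketched as follows. This is a minimal illustration with toy embeddings; the function names, the plain-list vector representation, and the stopping conditions are assumptions for exposition, not the paper's actual implementation (which the authors say will be released at the linked repository):

```python
import math

def cosine_distance(u, v):
    # 1 - cosine similarity between two equal-length vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def greedy_cover(query_embs, tool_embs, threshold, budget):
    """Greedily pick tool embeddings that cover the most queries.

    A candidate tool 'covers' a query when their cosine distance is
    at most `threshold`. Selection stops when every query is covered
    or `budget` tools have been chosen, mirroring the two stopping
    conditions described in the abstract.
    """
    uncovered = set(range(len(query_embs)))
    selected = []
    while uncovered and len(selected) < budget:
        best_tool, best_cov = None, set()
        for t, t_emb in enumerate(tool_embs):
            if t in selected:
                continue
            # Queries this candidate would newly cover
            cov = {q for q in uncovered
                   if cosine_distance(query_embs[q], t_emb) <= threshold}
            if len(cov) > len(best_cov):
                best_tool, best_cov = t, cov
        if best_tool is None:  # no remaining candidate covers anything new
            break
        selected.append(best_tool)
        uncovered -= best_cov
    return selected, uncovered
```

On a toy instance with two axis-aligned queries and one diagonal query, a single diagonal tool embedding can cover all three at a loose enough threshold, which is exactly the "few tools span many queries" effect the attack relies on.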


Key Contributions

  • First retrieval-layer attack that achieves top-k domination by exploiting embedding space geometry to hide all benign tools
  • Two-phase adversarial tool generation: LLM-based diverse metadata generation followed by greedy coverage maximization in embedding space
  • Theoretical analysis of retrieval saturation dynamics and empirical demonstration of 95% attack success with only 1% tool injection rate
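The saturation criterion behind the reported attack success rate can be illustrated with a short sketch: the attack succeeds on a query when the injected tools occupy every top-k retrieval slot, so no benign tool reaches the agent's context. The function names and the benign-first indexing convention here are illustrative assumptions, not the paper's code:

```python
import math

def cosine_sim(u, v):
    # Cosine similarity between two equal-length vectors
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u))
                  * math.sqrt(sum(b * b for b in v)))

def attack_success_rate(query_embs, benign_embs, adv_embs, k):
    """Fraction of queries whose top-k retrieved tools are all adversarial.

    Tools are indexed benign-first, so any index >= len(benign_embs)
    refers to an injected (adversarial) tool.
    """
    all_embs = benign_embs + adv_embs
    n_benign = len(benign_embs)
    hits = 0
    for q in query_embs:
        ranked = sorted(range(len(all_embs)),
                        key=lambda i: cosine_sim(q, all_embs[i]),
                        reverse=True)
        if all(i >= n_benign for i in ranked[:k]):
            hits += 1  # every benign tool pushed out of the top-k
    return hits / len(query_embs)
```

Note the all-or-nothing criterion: retrieving even one benign tool in the top-k counts as a failure for that query, which is a stricter bar than merely ranking an adversarial tool first.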

🛡️ Threat Analysis

Input Manipulation Attack

ToolFlood manipulates the embedding space geometry to craft adversarial tool metadata that evades retrieval-based filtering and dominates top-k results. This is an inference-time evasion attack exploiting the semantic retrieval mechanism.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time, targeted
Datasets
ToolBench
Applications
llm agent tool retrieval, tool-augmented agents, api discovery systems