benchmark 2025

Agent Tools Orchestration Leaks More: Dataset, Benchmark, and Mitigation

Yuxuan Qiao ¹,², Dongqin Liu ¹,², Hongchang Yang ¹,², Wei Zhou ¹,², Songlin Hu ¹,²

¹ Chinese Academy of Sciences

² University of Chinese Academy of Sciences

Published on arXiv: 2512.16310

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Across 8 representative LLMs, the average Risk Leakage Rate in multi-tool orchestration scenarios is 90.24%, with no model exceeding an H-Score of 0.3; PEP reduces RLR to 46.58% and improves H-Score to 0.624.

Privacy Enhancement Principle (PEP)

Novel technique introduced

Driven by Large Language Models, the single-agent, multi-tool architecture has become a popular paradigm for autonomous agents due to its simplicity and effectiveness. However, this architecture also introduces a new and severe privacy risk, which we term Tools Orchestration Privacy Risk (TOP-R), where an agent, to achieve a benign user goal, autonomously aggregates information fragments across multiple tools and leverages its reasoning capabilities to synthesize unexpected sensitive information. We provide the first systematic study of this risk. First, we establish a formal framework, attributing the risk's root cause to the agent's misaligned objective function: an overoptimization for helpfulness while neglecting privacy awareness. Second, we construct TOP-Bench, comprising paired leakage and benign scenarios, to comprehensively evaluate this risk. To quantify the trade-off between safety and robustness, we introduce the H-Score as a holistic metric. The evaluation results reveal that TOP-R is a severe risk: the average Risk Leakage Rate (RLR) of eight representative models reaches 90.24%, while the average H-Score is merely 0.167, with no model exceeding 0.3. Finally, we propose the Privacy Enhancement Principle (PEP) method, which effectively mitigates TOP-R, reducing the Risk Leakage Rate to 46.58% and significantly improving the H-Score to 0.624. Our work reveals both a new class of risk and inherent structural limitations in current agent architectures, while also offering feasible mitigation strategies.
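The two headline metrics can be illustrated with a small sketch. The abstract does not give the exact formulas, so everything below is an assumption: RLR is taken to be the fraction of leakage scenarios in which the agent disclosed the sensitive inference, robustness is taken to be the fraction of benign scenarios the agent still completed, and the H-Score is modeled as the harmonic mean of safety (1 − RLR) and robustness. The names `ScenarioResult`, `risk_leakage_rate`, `benign_success_rate`, and `h_score` are hypothetical and may not match the paper's definitions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ScenarioResult:
    """Outcome of one evaluated scenario (hypothetical record layout)."""
    is_leakage_scenario: bool  # True for a leakage scenario, False for its benign pair
    leaked: bool               # agent synthesized and disclosed sensitive information
    task_completed: bool       # agent fulfilled the user's benign request


def risk_leakage_rate(results: Iterable[ScenarioResult]) -> float:
    """Assumed RLR: share of leakage scenarios where a leak occurred."""
    leakage = [r for r in results if r.is_leakage_scenario]
    if not leakage:
        raise ValueError("no leakage scenarios in results")
    return sum(r.leaked for r in leakage) / len(leakage)


def benign_success_rate(results: Iterable[ScenarioResult]) -> float:
    """Assumed robustness: share of benign scenarios that were completed."""
    benign = [r for r in results if not r.is_leakage_scenario]
    if not benign:
        raise ValueError("no benign scenarios in results")
    return sum(r.task_completed for r in benign) / len(benign)


def h_score(safety: float, robustness: float) -> float:
    """Assumed H-Score: harmonic mean of safety and robustness, both in [0, 1]."""
    if safety + robustness == 0:
        return 0.0
    return 2 * safety * robustness / (safety + robustness)


# Toy data: 4 leakage scenarios (3 leak) and 4 benign scenarios (3 completed).
results = [
    ScenarioResult(True, True, True),
    ScenarioResult(True, True, True),
    ScenarioResult(True, True, False),
    ScenarioResult(True, False, True),
    ScenarioResult(False, False, True),
    ScenarioResult(False, False, True),
    ScenarioResult(False, False, True),
    ScenarioResult(False, False, False),
]

rlr = risk_leakage_rate(results)
robustness = benign_success_rate(results)
score = h_score(1 - rlr, robustness)
print(f"RLR={rlr:.2f} robustness={robustness:.2f} H={score:.3f}")
```

Under this assumed form, a model that leaks in most leakage scenarios gets a low score even if it completes every benign task, which is consistent with a high average RLR appearing alongside a low average H-Score.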

Key Contributions

Formal framework characterizing Tools Orchestration Privacy Risk (TOP-R) as stemming from a misaligned objective function in single-agent, multi-tool LLM architectures
TOP-Bench: a paired dataset of leakage and benign scenarios, with the H-Score metric to quantify the safety–robustness trade-off; 8 state-of-the-art models reach an average RLR of 90.24% and an average H-Score of only 0.167
Privacy Enhancement Principle (PEP) mitigation method that reduces the Risk Leakage Rate from 90.24% to 46.58% and raises the average H-Score to 0.624
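The size of the reported improvement can be worked out directly from the figures above. The short calculation below uses only the numbers stated in this summary (RLR 90.24% to 46.58%, average H-Score 0.167 to 0.624); the helper name `reduction` is hypothetical.

```python
def reduction(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop, relative drop as a fraction of `before`)."""
    absolute = before - after
    return absolute, absolute / before


# RLR figures are percentages taken from the summary above.
abs_pp, rel = reduction(90.24, 46.58)
# H-Score figures are the reported averages before and after PEP.
h_ratio = 0.624 / 0.167

print(f"RLR drop: {abs_pp:.2f} percentage points ({rel * 100:.1f}% relative)")
print(f"H-Score improvement factor: {h_ratio:.2f}x")
```

In other words, PEP cuts leakage by 43.66 percentage points, roughly a 48% relative reduction, while the average H-Score rises by a factor of about 3.7. Close to half of the leakage scenarios still leak after mitigation.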

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_time

Datasets

TOP-Bench

Applications

llm agents, multi-tool ai assistants, autonomous agents

Similar Papers

AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems

The Trust Paradox in LLM-Based Multi-Agent Systems: When Collaboration Becomes a Security Vulnerability

Beyond Data Privacy: New Privacy Risks for Large Language Models

The Sum Leaks More Than Its Parts: Compositional Privacy Risks and Mitigations in Multi-Agent Collaboration

Language Models Identify Ambiguities and Exploit Loopholes

Agentic Misalignment: How LLMs Could Be Insider Threats

NEST: Nascent Encoded Steganographic Thoughts

The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning