ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations
Amr Gomaa 1, Ahmed Salem 2, Sahar Abdelnabi 2,3,4,5
Published on arXiv
2511.05359
Prompt Injection
OWASP LLM Top 10 — LLM01
Sensitive Information Disclosure
OWASP LLM Top 10 — LLM06
Key Finding
Privacy attacks succeed in up to 88% of cases and security attacks in up to 60% across 7 SOTA LLMs, with higher-capability models showing greater information leakage despite better task completion.
ConVerse
Novel technique introduced
As language models evolve into autonomous agents that act and communicate on behalf of users, ensuring safety in multi-agent ecosystems becomes a central challenge. Interactions between personal assistants and external service providers expose a core tension between utility and protection: effective collaboration requires information sharing, yet every exchange creates new attack surfaces. We introduce ConVerse, a dynamic benchmark for evaluating privacy and security risks in agent-agent interactions. ConVerse spans three practical domains (travel, real estate, insurance) with 12 user personas and over 864 contextually grounded attacks (611 privacy, 253 security). Unlike prior single-agent settings, it models autonomous, multi-turn agent-to-agent conversations where malicious requests are embedded within plausible discourse. Privacy is tested through a three-tier taxonomy assessing abstraction quality, while security attacks target tool use and preference manipulation. Evaluating seven state-of-the-art models reveals persistent vulnerabilities; privacy attacks succeed in up to 88% of cases and security breaches in up to 60%, with stronger models leaking more. By unifying privacy and security within interactive multi-agent contexts, ConVerse reframes safety as an emergent property of communication.
Key Contributions
- ConVerse benchmark with 864 contextually grounded attacks (611 privacy, 253 security) across 3 domains and 12 user personas for multi-agent LLM evaluation
- Three-tier privacy taxonomy assessing abstraction quality (unrelated, related-but-private, related-and-useful) rather than binary filtering
- Empirical evaluation of 7 SOTA LLMs revealing privacy attack success rates of 37–88% and security breach rates of 2–60%, with stronger models leaking more