attack 2025

Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security Assessment

Yuchong Xie ¹, Mingyu Luo ^1,2, Zesen Liu ¹, Zhixiang Zhang ¹, Kaikai Zhang ¹, Yu Liu ^1,2, Zongjie Li ¹, Ping Chen ², Shuai Wang ¹, Dongdong She ¹

¹ The Hong Kong University of Science and Technology

² Fudan University

0 citations

Published on arXiv

2509.05755

Prompt Injection

OWASP LLM Top 10 — LLM01

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Key Finding

Achieves remote code execution on all 25 tested agent-LLM pairs and system prompt leakage on every agent using Claude or Grok backends across six real-world coding agents

TIP (Two-Channel Prompt Injection) + ToolLeak

Novel technique introduced

Coding agents powered by large language models are becoming central modules of modern IDEs, helping users perform complex tasks by invoking tools. While powerful, tool invocation opens a substantial attack surface. Prior work has demonstrated attacks against general-purpose and domain-specific agents, but none have focused on the security risks of tool invocation in coding agents. To fill this gap, we conduct the first systematic red-teaming of six popular real-world coding agents: Cursor, Claude Code, Copilot, Windsurf, Cline, and Trae. Our red-teaming proceeds in two phases. In Phase 1, we perform prompt leakage reconnaissance to recover system prompts. We discover a general vulnerability, ToolLeak, which allows malicious prompt exfiltration through benign argument retrieval during tool invocation. In Phase 2, we hijack the agent's tool-invocation behavior using a novel two-channel prompt injection in the tool description and return values, achieving remote code execution (RCE). We adaptively construct payloads using security information leaked in Phase 1. In emulation across five backends, our method outperforms baselines on Claude-Sonnet-4, Claude-Sonnet-4.5, Grok-4, and GPT-5. On real agents, our approach succeeds on 19 of 25 agent-LLM pairs, achieving leakage on every agent using Claude and Grok backends. For tool-invocation hijacking, we obtain RCE on every tested agent-LLM pair, with our two-channel method delivering the highest success rate. We provide case studies on Cursor and Claude Code, analyze security guardrails of external and built-in tools, and conclude with practical defense recommendations.

Key Contributions

ToolLeak: a general vulnerability enabling system prompt exfiltration through benign tool argument retrieval, succeeding on every agent using Claude and Grok backends
Two-channel prompt injection (TIP) that injects malicious payloads via both tool descriptions and return values, achieving RCE on all 25 tested agent-LLM pairs
First systematic security evaluation of six real-world coding agents (Cursor, Claude Code, Copilot, Windsurf, Cline, Trae) with adaptive payload construction using leaked system prompt information

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_boxinference_time

Applications

coding agentsai-powered idesllm tool-use systems

Read PDF arXiv Code

Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security Assessment

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Mind Your Server: A Systematic Study of Parasitic Toolchain Attacks on the MCP Ecosystem

An Empirical Study on the Security Vulnerabilities of GPTs

IntentMiner: Intent Inversion Attack via Tool Call Analysis in the Model Context Protocol

Silent Egress: When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace

Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs

OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage

Bypassing Prompt Guards in Production with Controlled-Release Prompting

Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools