attack 2026

SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement

0 citations · 21 references · arXiv (Cornell University)

Published on arXiv

2602.14211

Prompt Injection

OWASP LLM Top 10 — LLM01

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Key Finding

SkillJect consistently achieves high attack success rates across diverse coding-agent settings and real-world software engineering tasks.

SkillJect

Novel technique introduced

Agent skills are becoming a core abstraction in coding agents, packaging long-form instructions and auxiliary scripts to extend tool-augmented behaviors. This abstraction introduces an under-measured attack surface: skill-based prompt injection, where poisoned skills can steer agents away from user intent and safety policies. In practice, naive injections often fail because the malicious intent is too explicit or drifts too far from the original skill, leading agents to ignore or refuse them; existing attacks are also largely hand-crafted. We propose the first automated framework for stealthy prompt injection tailored to agent skills. The framework forms a closed loop with three agents: an Attack Agent that synthesizes injection skills under explicit stealth constraints, a Code Agent that executes tasks using the injected skills in a realistic tool environment, and an Evaluate Agent that logs action traces (e.g., tool calls and file operations) and verifies whether targeted malicious behaviors occurred. We also propose a malicious payload hiding strategy that conceals adversarial operations in auxiliary scripts while injecting optimized inducement prompts to trigger tool execution. Extensive experiments across diverse coding-agent settings and real-world software engineering tasks show that our method consistently achieves high attack success rates under realistic settings.
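The closed loop described above can be pictured as a small control-flow skeleton. The sketch below is only an illustration of propose, execute, evaluate, and feed back; it is not the authors' implementation. All names (`refine_loop`, `Action`, `Verdict`, and the `toy_*` stand-ins) are hypothetical, and the stand-ins generate no real skill text: a revision counter plays the role of the candidate, and a harmless "marker" action plays the role of the targeted behavior.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    kind: str    # e.g. "tool_call" or "file_op" (assumed labels)
    detail: str  # free-form description of what happened


@dataclass
class Verdict:
    success: bool  # did the targeted behavior appear in the trace?
    feedback: str  # hint passed back to the proposer for the next round


def refine_loop(
    propose: Callable[[Optional[str], int], str],
    execute: Callable[[str], List[Action]],
    evaluate: Callable[[List[Action]], Verdict],
    max_rounds: int = 5,
) -> Tuple[Optional[int], List[Verdict]]:
    """Run propose -> execute -> evaluate until success or budget exhaustion.

    Returns (number of rounds used, or None on failure; per-round verdicts).
    """
    feedback: Optional[str] = None
    history: List[Verdict] = []
    for round_idx in range(max_rounds):
        candidate = propose(feedback, round_idx)  # role of the Attack Agent
        trace = execute(candidate)                # role of the Code Agent
        verdict = evaluate(trace)                 # role of the Evaluate Agent
        history.append(verdict)
        if verdict.success:
            return round_idx + 1, history
        feedback = verdict.feedback
    return None, history


# Toy stand-ins (hypothetical): no text generation, no real tools.
def toy_propose(feedback: Optional[str], round_idx: int) -> str:
    return f"variant-{round_idx}"


def toy_execute(candidate: str) -> List[Action]:
    trace = [Action("tool_call", "read_skill")]
    if candidate.endswith("2"):  # pretend only the third variant triggers
        trace.append(Action("tool_call", "marker"))
    return trace


def toy_evaluate(trace: List[Action]) -> Verdict:
    hit = any(a.detail == "marker" for a in trace)
    return Verdict(hit, "ok" if hit else "marker action not observed")
```

Calling `refine_loop(toy_propose, toy_execute, toy_evaluate)` walks through three rounds before the toy evaluator reports success. In a real system each callable would wrap an LLM-backed agent and a sandboxed tool environment, which this sketch deliberately leaves out.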

Key Contributions

First automated framework (SkillJect) for generating stealthy skill-based prompt injections targeting coding agents, using a closed loop of Attack Agent, Code Agent, and Evaluate Agent
Malicious payload hiding strategy that conceals adversarial operations in auxiliary scripts while injecting optimized inducement prompts to trigger tool execution
Trace-driven closed-loop refinement using action logs (tool calls, file operations) to iteratively improve injection stealth and attack success rates
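To make the trace-driven part concrete, here is a minimal sketch of how logged actions might be checked for one kind of targeted behavior. The summary does not specify a trace format, so the event dictionaries, the `type` and `path` keys, and the "file operation outside the workspace" criterion are all assumptions chosen for illustration. The same check is equally useful to a defender auditing agent logs.

```python
import posixpath
from typing import Dict, List


def outside_workspace(path: str, workspace: str) -> bool:
    """Return True if `path` resolves to a location outside `workspace`.

    Purely lexical: "." and ".." are normalized, symlinks are not followed.
    Assumes POSIX-style paths and a workspace other than "/".
    """
    root = posixpath.normpath(workspace)
    # join() keeps absolute paths as-is and anchors relative ones at root.
    full = posixpath.normpath(posixpath.join(root, path))
    return not (full == root or full.startswith(root + "/"))


def flag_trace(trace: List[Dict[str, str]], workspace: str) -> List[Dict[str, str]]:
    """Collect file-operation events that touch paths outside the workspace."""
    return [
        event
        for event in trace
        if event.get("type") == "file_op"
        and outside_workspace(event.get("path", ""), workspace)
    ]


# Hypothetical trace in an assumed log format.
example_trace = [
    {"type": "tool_call", "name": "run_tests"},
    {"type": "file_op", "path": "src/app.py"},
    {"type": "file_op", "path": "../../etc/hosts"},
    {"type": "file_op", "path": "/tmp/x"},
]
flagged = flag_trace(example_trace, "/work/repo")
```

Here `flagged` contains the two events whose paths leave `/work/repo`. An evaluator built along these lines could return such findings as feedback for the next refinement round. A production check would also need to resolve symlinks and cover tool calls that modify files indirectly.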

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_box, inference_time, targeted, digital

Applications

coding agents, software engineering automation


Similar Papers

When Skills Lie: Hidden-Comment Injection in LLM Agents

QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents

Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE

AdapTools: Adaptive Tool-based Indirect Prompt Injection Attacks on Agentic LLMs

Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools

MCP-ITP: An Automated Framework for Implicit Tool Poisoning in MCP

Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections

Invisible Threats from Model Context Protocol: Generating Stealthy Injection Payload via Tree-based Adaptive Search