attack 2026

Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

Zihan Wang, Rui Zhang, Yu Liu, Chi Liu, Qingchuan Zhao, Hongwei Li, Guowen Xu

University of Electronic Science and Technology of China

0 citations

Published on arXiv

2604.21829

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Agent skills can be extracted with only 3 interactions across commercial platforms, posing a serious copyright risk despite proposed defenses
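Why defenses do not remove the risk comes down to simple arithmetic. The sketch below rests on two assumptions: each automated attempt succeeds independently, and it does so with some per-attempt probability p that remains after defenses are applied. The value of p and the function names are illustrative and do not come from the paper. Under these assumptions, the chance of at least one success in k attempts is 1 - (1 - p)^k.

```python
def prob_at_least_one_success(p: float, attempts: int) -> float:
    """P(at least one success) over independent attempts, each succeeding with probability p.

    Illustrative model only: assumes attempts are independent and share the same p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - p) ** attempts


def attempts_needed(p: float, target: float) -> int:
    """Smallest k such that 1 - (1 - p)**k >= target (requires 0 < p <= 1, 0 <= target < 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    k = 0
    # Add one attempt at a time until the cumulative success probability reaches the target.
    while prob_at_least_one_success(p, k) < target:
        k += 1
    return k
```

Take a hypothetical defense that stops 95% of attempts, so p = 0.05. Then `attempts_needed(0.05, 0.95)` returns 59. Under this model, a few dozen cheap, automated retries give an adversary a 95% chance of at least one successful extraction, and one success is all it takes.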

Automated Skill Stealing Framework

Novel technique introduced

LLM agents increasingly rely on skills to encapsulate reusable capabilities via progressively disclosed instructions. High-quality skills inject expert knowledge into general-purpose models, improving performance on specialized tasks. This quality and ease of dissemination drive the emergence of a skill economy: free skill marketplaces already report 90,368 published skills, while paid marketplaces report more than 2,000 listings and over $100,000 in creator earnings. Yet this growing marketplace also creates a new attack surface, as adversaries can interact with public-facing agents to extract hidden proprietary skill content. We present the first empirical study of black-box skill stealing against LLM agent systems. To study this threat, we first derive an attack taxonomy from prior prompt-stealing methods and build an automated stealing-prompt generation agent. This agent starts from model-generated seed prompts, expands them through scenario rationalization and structure injection, and enforces diversity via embedding filtering. This process yields a reproducible pipeline for evaluating agent systems. We evaluate these attacks across 3 commercial agent architectures and 5 LLMs. Our results show that agent skills can be extracted with only 3 interactions, posing a serious copyright risk. To mitigate this threat, we design defenses across three stages of the agent pipeline: input, inference, and output. Although these defenses achieve strong results, the attack remains inexpensive and readily automatable, allowing an adversary to launch repeated attempts with different variants; only one successful attempt is sufficient to compromise the protected skill. Overall, our findings suggest that these copyright risks are largely overlooked across proprietary agent ecosystems. We therefore advocate for more robust defense strategies that provide stronger protection guarantees.
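The abstract says that diversity in the generated prompt pool is enforced through embedding filtering, but it does not give the exact procedure. The sketch below shows one common form of this step, greedy near-duplicate removal by cosine similarity. It assumes that each candidate prompt has already been embedded by some unspecified encoder. The function names and the 0.9 threshold are hypothetical choices, not the paper's settings.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def diversity_filter(embeddings: List[Sequence[float]], threshold: float = 0.9) -> List[int]:
    """Return indices of candidates kept after greedy near-duplicate removal.

    Candidates are visited in order. Candidate i is kept only if its cosine
    similarity to every previously kept candidate is below `threshold`
    (hypothetical default), so later near-duplicates are dropped.
    """
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept
```

As an example, `diversity_filter([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])` returns `[0, 2]`. The second vector has cosine similarity of about 0.995 with the first, so it is dropped as a near-duplicate. Because the greedy pass depends on input order, earlier candidates take priority. The threshold controls how different two kept items must be.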

Key Contributions

First empirical study of black-box skill stealing attacks against LLM agent systems
Automated attack framework using seed prompt generation, scenario rationalization, structure injection, and embedding-based diversity filtering
Evaluation across 3 commercial agent architectures and 5 LLMs, showing that skills can be extracted with only 3 interactions
Defense mechanisms across input/inference/output pipeline stages, with analysis showing attacks remain practical despite defenses
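The summary does not describe how each defense stage works. The sketch below illustrates one plausible kind of output-stage check and is not the paper's defense. Before a response is returned, it measures what fraction of the protected skill's word n-grams the response reproduces verbatim, and it withholds the response above a cutoff. The function names, n = 5, and the 0.3 cutoff are all hypothetical. A verbatim-overlap check like this can be bypassed by variants that request paraphrased or restructured output, which is in line with the point that cheap, repeated attempts keep the attack practical.

```python
from typing import Set, Tuple


def word_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Lower-cased, whitespace-tokenized word n-grams of `text` (empty if fewer than n words)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def leaks_skill(response: str, skill_text: str, n: int = 5, threshold: float = 0.3) -> bool:
    """True if the response contains at least `threshold` of the skill's word n-grams."""
    skill_grams = word_ngrams(skill_text, n)
    if not skill_grams:
        return False  # skill too short to fingerprint with n-grams of this size
    overlap = len(skill_grams & word_ngrams(response, n))
    return overlap / len(skill_grams) >= threshold


def guard_output(response: str, skill_text: str) -> str:
    """Hypothetical output-stage filter: withhold responses that reproduce the skill."""
    if leaks_skill(response, skill_text):
        return "[response withheld: it reproduces protected skill content]"
    return response
```

The check needs only the skill text and the candidate response, so it can run after generation regardless of which model produced the answer.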

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_box, inference_time

Applications

llm agents, commercial ai assistants, agent skill marketplaces


Similar Papers

Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs

Silent Egress: When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace

CLIOPATRA: Extracting Private Information from LLM Insights

Tricking LLM-Based NPCs into Spilling Secrets

Whispers of Wealth: Red-Teaming Google's Agent Payments Protocol via Prompt Injection

OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage

Bypassing Prompt Guards in Production with Controlled-Release Prompting

EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System