Defense · 2025

You Can't Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors

Bochuan Cao 1, Changjiang Li 2, Yuanpu Cao 1, Yameng Ge 1, Ting Wang 3, Jinghui Chen 1

5 citations · 1 influential · 47 references · CCS


Published on arXiv: 2509.21884

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

SysVec effectively prevents system prompt extraction attacks while preserving functional integrity and improving instruction-following, validated against GPT-4o and Claude 3.5 Sonnet baselines.

SysVec

Novel technique introduced


Large language models (LLMs) have been widely adopted across various applications, leveraging customized system prompts for diverse tasks. Facing potential system prompt leakage risks, model developers have implemented strategies to prevent leakage, primarily by disabling LLMs from repeating their context when encountering known attack patterns. However, these defenses remain vulnerable to new and unforeseen prompt-leaking techniques. In this paper, we first introduce a simple yet effective prompt leaking attack to reveal such risks. Our attack is capable of extracting system prompts from various LLM-based applications, even from state-of-the-art LLMs such as GPT-4o or Claude 3.5 Sonnet. Our findings further inspire us to search for a fundamental solution to the problem: keeping the system prompt out of the context entirely. To this end, we propose SysVec, a novel method that encodes system prompts as internal representation vectors rather than raw text. By doing so, SysVec minimizes the risk of unauthorized disclosure while preserving the LLM's core language capabilities. Remarkably, this approach not only enhances security but also improves the model's general instruction-following abilities. Experimental results demonstrate that SysVec effectively mitigates prompt leakage attacks, preserves the LLM's functional integrity, and helps alleviate the forgetting issue in long-context scenarios.


Key Contributions

  • A simple yet effective prompt leaking attack capable of extracting system prompts from state-of-the-art LLMs including GPT-4o and Claude 3.5 Sonnet
  • SysVec: encodes system prompts as internal representation vectors rather than raw text in context, fundamentally eliminating the leakage surface
  • Empirical evidence that SysVec preserves LLM instruction-following ability and alleviates long-context forgetting as a secondary benefit
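The core idea — replacing system prompt text with an internal representation vector so there is no prompt left in context to extract — can be illustrated with a toy sketch. This is not the paper's implementation; the embedding function, hidden dimension, vector derivation (a simple mean), and the additive injection step are all simplifications assumed for illustration:

```python
import numpy as np

D = 8  # hidden dimension (assumption for this sketch)

def embed(tokens):
    """Stand-in token embedding: deterministic pseudo-random vectors."""
    return np.stack([
        np.random.default_rng(hash(t) % 2**32).normal(size=D)
        for t in tokens
    ])

def forward(hidden, sys_vec=None):
    """One toy 'layer': optionally steer every position with a system vector."""
    if sys_vec is not None:
        # Inject system behavior directly into hidden states --
        # no system prompt tokens exist in the context.
        hidden = hidden + sys_vec
    return np.tanh(hidden)

# Baseline: system prompt present as text, so it sits in the context
# and can be extracted by a prompt-leaking attack.
context_with_prompt = ["SYSTEM:", "be", "terse", "USER:", "hi"]

# SysVec-style: derive a vector offline from the system prompt (here a
# crude mean of its token embeddings), then run only the user tokens.
sys_vec = embed(["SYSTEM:", "be", "terse"]).mean(axis=0)
user_only = ["USER:", "hi"]

out = forward(embed(user_only), sys_vec=sys_vec)
assert "SYSTEM:" not in user_only  # no prompt text left to leak
print(out.shape)  # prints (2, 8)
```

In the real method the vector would be optimized against the model's internals rather than averaged from embeddings; the sketch only shows why moving the prompt out of the token context removes the extraction surface.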

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time
Applications
llm-based applications, chatbots, llm apis with custom system prompts