attack 2026

Spore: Efficient and Training-Free Privacy Extraction Attack on LLMs via Inference-Time Hybrid Probing

0 citations

Published on arXiv

2604.23711

Model Inversion Attack

OWASP ML Top 10 — ML03

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

Achieves >90% attack success rate on GPT-5.4 when extracting PII from agent memory with a single query, reducing the detection positive rate by 88% compared to existing methods

Spore

Novel technique introduced

With the wide adoption of personal AI assistants such as OpenClaw, privacy leakage from users' interaction contexts with large language model (LLM) agents has become a critical issue. Existing privacy attacks against LLMs primarily target training data, while research on inference-time contextual privacy risks in LLM agent memory remains limited. Moreover, prior methods often incur high attack costs, requiring multiple queries or relying on white-box assumptions, which limits their practicality in real-world deployments. To address these issues, we propose Spore, a training-free privacy extraction attack targeting LLM agent memory. Spore is compatible with both black-box and gray-box settings. In the black-box setting, Spore uses a single query to efficiently extract a small candidate set from which the original private information can be recovered. In the gray-box setting, Spore allows the attacker to leverage multi-ranked tokens for more accurate and faster privacy extraction. We provide an information-theoretic analysis of Spore and show that it achieves high query efficiency with substantial per-query information leakage. Experiments on multiple frontier LLMs show that Spore achieves a higher attack success rate than existing state-of-the-art (SOTA) schemes. It also maintains a low attack cost and remains stable across different model parameter settings. We further evaluate the robustness of Spore against existing defense mechanisms. Our results show that Spore consistently bypasses both detection and strong safety alignment, demonstrating resilient performance across diverse defensive settings and highlighting a real-world safety threat.

Key Contributions

Training-free single-query privacy extraction attack compatible with both black-box and gray-box settings
Information-theoretic analysis showing high query efficiency with substantial per-query information leakage
Demonstrates >90% attack success rate on GPT-5.4 while bypassing detection and safety alignment defenses
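
To give a rough sense of what "per-query information leakage" can mean, here is a minimal back-of-the-envelope sketch. It is not the paper's analysis: it assumes a uniform prior over a hypothetical space of possible secrets and a query that narrows that space to a smaller candidate set. The function names and the example numbers are illustrative assumptions.

```python
import math


def leaked_bits(prior_size: int, candidate_size: int) -> float:
    """Bits gained when one query narrows a uniform prior over
    `prior_size` possible secrets down to `candidate_size` candidates.

    Assumption (not from the paper): the secret is uniformly distributed
    and remains uniform over the surviving candidates.
    """
    if not 1 <= candidate_size <= prior_size:
        raise ValueError("need 1 <= candidate_size <= prior_size")
    return math.log2(prior_size / candidate_size)


def queries_needed(prior_size: int, bits_per_query: float) -> int:
    """Lower bound on the number of queries needed to pin down one secret
    when each query leaks at most `bits_per_query` bits."""
    if prior_size < 1 or bits_per_query <= 0:
        raise ValueError("need prior_size >= 1 and bits_per_query > 0")
    return math.ceil(math.log2(prior_size) / bits_per_query)


# Hypothetical example: 1024 equally likely secrets, one query leaves 4.
bits = leaked_bits(1024, 4)
print(f"{bits:.1f} bits leaked by one query")
print(queries_needed(1024, bits), "queries needed at that rate")
```

Under these assumptions, an attack that leaves only a handful of candidates after one query has leaked most of the bits needed to identify the secret, which is the intuition behind describing a single-query attack as query-efficient.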

🛡️ Threat Analysis

Model Inversion Attack

Primary contribution is extracting private user data (PII) from LLM agent contextual memory during inference — this is a data reconstruction/extraction attack against information stored in the model's context.

Details

Domains

nlp

Model Types

llm

Threat Tags

black_box, grey_box, inference_time

Applications

llm agents, personal ai assistants, conversational ai

Similar Papers

ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying

RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline

Expert Selections In MoE Models Reveal (Almost) As Much As Text

Zero2Text: Zero-Training Cross-Domain Inversion Attacks on Textual Embeddings

REBEL: Hidden Knowledge Recovery via Evolutionary-Based Evaluation Loop

Extracting alignment data in open models

Extracting Training Dialogue Data from Large Language Model based Task Bots

Prior Aware Memorization: An Efficient Metric for Distinguishing Memorization from Generalization in Large Language Models