
RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline

André V. Duarte 1,2, Xuying Li 3, Bin Zeng, Arlindo L. Oliveira 2, Lei Li 1, Zhuo Li 3



Published on arXiv: 2510.25941

Model Inversion Attack

OWASP ML Top 10 — ML03

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

RECAP raises GPT-4.1's average ROUGE-L for copyrighted-text extraction from 0.38 to 0.47 (a ~24% relative increase) over single-iteration baselines, evaluated across 30+ full books.

RECAP

Novel technique introduced


If we cannot inspect the training data of a large language model (LLM), how can we ever know what it has seen? We believe the most compelling evidence arises when the model itself freely reproduces the target content. As such, we propose RECAP, an agentic pipeline designed to elicit and verify memorized training data from LLM outputs. At the heart of RECAP is a feedback-driven loop, where an initial extraction attempt is evaluated by a secondary language model, which compares the output against a reference passage and identifies discrepancies. These are then translated into minimal correction hints, which are fed back into the target model to guide subsequent generations. In addition, to address alignment-induced refusals, RECAP includes a jailbreaking module that detects and overcomes such barriers. We evaluate RECAP on EchoTrace, a new benchmark spanning over 30 full books, and the results show that RECAP leads to substantial gains over single-iteration approaches. For instance, with GPT-4.1, the average ROUGE-L score for copyrighted text extraction improved from 0.38 to 0.47, a nearly 24% relative increase.
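The ROUGE-L scores reported above measure overlap between the model's output and the reference passage via their longest common subsequence (LCS). A minimal self-contained sketch of the metric (word-level tokenization, F1 variant; not the authors' exact evaluation code):

```python
def lcs_len(a: list, b: list) -> int:
    # Dynamic-programming longest common subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l(candidate: str, reference: str) -> float:
    # ROUGE-L F1: harmonic mean of LCS-based precision and recall.
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_len(cand, ref)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(cand), lcs / len(ref)
    return 2 * p * r / (p + r)
```

For example, a candidate that reproduces the reference exactly scores 1.0, while extra or missing words lower precision or recall respectively.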


Key Contributions

  • RECAP: a feedback-driven agentic pipeline that iteratively extracts memorized training data from LLMs by using a secondary model to identify discrepancies and generate minimal correction hints
  • A jailbreaking module integrated into the pipeline to overcome alignment-induced refusals when extracting copyrighted content
  • EchoTrace: a new benchmark of 30+ full copyrighted books for evaluating training data extraction from LLMs
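The feedback-driven loop described in the first contribution can be sketched as follows. This is a hypothetical skeleton, not the authors' implementation: `generate`, `critique`, and the hint format are placeholder callables standing in for the target LLM, the secondary judge model, and RECAP's discrepancy-to-hint translation.

```python
from typing import Callable

def recap_loop(
    generate: Callable[[str], str],       # target LLM: prompt -> extraction attempt
    critique: Callable[[str, str], str],  # judge LLM: (output, reference) -> hint, "" if no discrepancies
    reference: str,                       # ground-truth passage, used only for verification
    prompt: str,
    max_iters: int = 5,
) -> str:
    """Sketch of a RECAP-style loop: generate, compare against the reference
    with a secondary model, and feed minimal correction hints back."""
    attempt = generate(prompt)
    for _ in range(max_iters):
        hint = critique(attempt, reference)
        if not hint:  # judge found no remaining discrepancies
            break
        prompt = f"{prompt}\nHint: {hint}"  # append a minimal correction hint
        attempt = generate(prompt)
    return attempt
```

In the paper's setting, a separate jailbreaking module would wrap `generate` to detect and work around alignment-induced refusals before the judge scores the output.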

🛡️ Threat Analysis

Model Inversion Attack

The paper's primary contribution is a method to reconstruct training data (copyrighted books) from LLM outputs — exactly 'LLM memorization extraction (adversary extracts training data verbatim from an LLM)' as described in ML03.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time
Datasets
EchoTrace
Applications
llm training data extraction, copyright memorization detection