
The Conundrum of Trustworthy Research on Attacking Personally Identifiable Information Removal Techniques

Sebastian Ochs, Ivan Habernal


Published on arXiv: 2603.08207

Threat Classification

  • Model Inversion Attack (OWASP ML Top 10 — ML03)
  • Sensitive Information Disclosure (OWASP LLM Top 10 — LLM06)

Key Finding

Reported success rates of LLM-based PII reconstruction attacks are likely overestimated because evaluations fail to control for LLM pretraining memorization and data leakage from publicly available source corpora.


Removing personally identifiable information (PII) from texts is necessary to comply with various data protection regulations and to enable data sharing without compromising privacy. However, recent work shows that documents sanitized by PII removal techniques remain vulnerable to reconstruction attacks. We suspect, however, that the reported success of these attacks is largely overestimated. We critically analyze the evaluation of existing attacks and find that data leakage and data contamination are not properly mitigated, leaving unaddressed the question of whether PII removal techniques truly protect privacy in real-world scenarios. We investigate possible data sources and attack setups that avoid data leakage and conclude that only truly private data allows an objective evaluation of vulnerabilities in PII removal techniques. However, access to private data is heavily restricted - and for good reasons - which also means that the public research community cannot address this problem in a transparent, reproducible, and trustworthy manner.


Key Contributions

  • Identifies fundamental methodological flaws in existing PII reconstruction attack evaluations, specifically data leakage and data contamination from LLM pretraining memorization
  • Argues that only truly private data (unavailable to LLM pretraining) can enable objective evaluation of PII removal vulnerabilities
  • Presents experiments on low-contamination data (Czech court announcements and YouTube vlogs) to validate the critique

🛡️ Threat Analysis

Model Inversion Attack

The paper's central topic is PII reconstruction attacks, in which adversaries use LLMs to reverse-engineer private information from sanitized documents. It specifically identifies LLM pretraining memorization of the original, pre-sanitization documents as a key confound inflating reported attack success: an apparent reconstruction may in fact be a training data extraction, with the model recalling PII it memorized rather than inferring it from the sanitized text.
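The confound can be made concrete with a simple evaluation harness. The sketch below (not from the paper; all names and stub "models" are hypothetical) compares an attack's success on sanitized documents against a no-context baseline that probes pure memorization, and counts only the margin above that baseline as genuine reconstruction:

```python
# Minimal sketch of a memorization-controlled evaluation for PII
# reconstruction attacks, illustrating the paper's critique: raw attack
# success conflates true reconstruction with LLM pretraining memorization.
# `attack` and `baseline` are hypothetical stand-ins for LLM queries.

def controlled_success_rate(records, attack, baseline):
    """records: dicts with keys 'sanitized' (redacted text), 'pii'
    (ground-truth value), and 'hint' (context given WITHOUT the document,
    to probe memorization alone). Returns (raw, memorized, adjusted)."""
    n = len(records)
    raw = sum(attack(r["sanitized"]) == r["pii"] for r in records)
    memo = sum(baseline(r["hint"]) == r["pii"] for r in records)
    # Only success beyond the no-context baseline is attributable
    # to information leaking through the sanitized document itself.
    return raw / n, memo / n, max(0.0, (raw - memo) / n)

# Toy demo: the attack "recovers" both names, but the no-context baseline
# already produces one of them, so only half the successes are genuine.
records = [
    {"sanitized": "[NAME] was sentenced in Brno.",
     "pii": "Jan Novak", "hint": "Brno court case"},
    {"sanitized": "[NAME] posted a vlog from Prague.",
     "pii": "Eva Svoboda", "hint": "Prague vlog"},
]
attack = lambda text: {"[NAME] was sentenced in Brno.": "Jan Novak",
                       "[NAME] posted a vlog from Prague.": "Eva Svoboda"}[text]
baseline = lambda hint: "Jan Novak" if hint == "Brno court case" else "unknown"
print(controlled_success_rate(records, attack, baseline))  # (1.0, 0.5, 0.5)
```

Existing evaluations, the paper argues, effectively report only the first number; on corpora present in LLM pretraining data the baseline term can be large, which is why the authors turn to low-contamination sources such as Czech court announcements and YouTube vlogs.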


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time, black_box
Datasets
Czech court announcements, YouTube vlog transcripts
Applications
text anonymization, PII removal, document de-identification