The Conundrum of Trustworthy Research on Attacking Personally Identifiable Information Removal Techniques
Sebastian Ochs 1,2, Ivan Habernal 1,3,4
1 Trustworthy Human Language Technologies
2 Technical University of Darmstadt
Published on arXiv (2603.08207)
- Model Inversion Attack (OWASP ML Top 10 — ML03)
- Sensitive Information Disclosure (OWASP LLM Top 10 — LLM06)
Key Finding
Reported success rates of LLM-based PII reconstruction attacks are likely overestimated because evaluations fail to control for LLM pretraining memorization and data leakage from publicly available source corpora.
Removing personally identifiable information (PII) from text is necessary to comply with various data protection regulations and to enable data sharing without compromising privacy. However, recent work shows that documents sanitized by PII removal techniques are vulnerable to reconstruction attacks. Yet, we suspect that the reported success of these attacks is largely overestimated. We critically analyze the evaluation of existing attacks and find that data leakage and data contamination are not properly mitigated, leaving unaddressed the question of whether PII removal techniques truly protect privacy in real-world scenarios. We investigate possible data sources and attack setups that avoid data leakage, and conclude that only truly private data allows an objective evaluation of vulnerabilities in PII removal techniques. However, access to private data is heavily restricted, and for good reason, which also means that the public research community cannot address this problem in a transparent, reproducible, and trustworthy manner.
Key Contributions
- Identifies fundamental methodological flaws in existing PII reconstruction attack evaluations, specifically data leakage and data contamination from LLM pretraining memorization
- Argues that only truly private data (unavailable to LLM pretraining) can enable objective evaluation of PII removal vulnerabilities
- Presents experiments on low-contamination data (Czech court announcements and YouTube vlogs) to validate the critique
🛡️ Threat Analysis
The paper's central topic is PII reconstruction attacks — adversaries using LLMs to reverse-engineer private information from sanitized documents. The paper specifically identifies LLM memorization of original private training data as a key confound inflating attack success, which is a form of training data extraction/reconstruction attack.
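The confound can be illustrated with a simple control: run the same attack twice, once with the sanitized document as context and once with no document at all, and credit only the PII that the LLM could not produce from memorization alone. The sketch below is a minimal illustration of this idea, not the paper's actual evaluation protocol; all function and variable names are hypothetical.

```python
def adjusted_attack_success(recovered_with_doc, recovered_no_doc, true_pii):
    """Separate genuine reconstruction from pretraining memorization.

    recovered_with_doc: PII the attack LLM produced given the sanitized document.
    recovered_no_doc:   PII the same LLM produced with NO document (memorization baseline).
    true_pii:           ground-truth PII removed from the original document.

    Returns (raw_rate, adjusted_rate): the naive success rate, and the rate
    after discounting items the model already "knew" without the document.
    """
    true = set(true_pii)
    hits = set(recovered_with_doc) & true       # everything the attack recovered
    memorized = set(recovered_no_doc) & true    # recoverable from memorization alone
    raw = len(hits) / len(true)
    adjusted = len(hits - memorized) / len(true)
    return raw, adjusted


# Hypothetical example: the attack recovers 2 of 3 PII items, but one of
# them is also produced without the document, so only 1 counts as a
# genuine reconstruction from the sanitized text.
raw, adjusted = adjusted_attack_success(
    recovered_with_doc=["Alice Kovar", "Brno"],
    recovered_no_doc=["Alice Kovar"],
    true_pii=["Alice Kovar", "1987-04-12", "Brno"],
)
```

A reported success rate corresponds to `raw` here; the paper's argument is that without the no-document baseline (or, more fundamentally, without data the LLM could not have seen in pretraining), `raw` and `adjusted` cannot be told apart.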