Latest papers

4 papers
attack arXiv Mar 10, 2026 · 27d ago

CLIOPATRA: Extracting Private Information from LLM Insights

Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro, Peter Kairouz · arXiv · University College London +1 more

Attacks Anthropic's Clio LLM analytics platform by injecting crafted chats to extract private medical history of target users, bypassing layered privacy protections

Sensitive Information Disclosure Prompt Injection nlp
PDF Code
defense arXiv Feb 4, 2026 · 8w ago

Trust The Typical

Debargha Ganguly, Sreehari Sankar, Biyao Zhang et al. · Case Western Reserve University · University of Pittsburgh +2 more

Defends LLMs against jailbreaks via OOD detection on safe prompts, reducing false positives by 40x over specialized safety models

Prompt Injection nlp
1 citations PDF
attack arXiv Oct 7, 2025 · Oct 2025

(Token-Level) InfoRMIA: Stronger Membership Inference and Memorization Assessment for LLMs

Jiashu Tao, Reza Shokri · National University of Singapore · Google Research

Proposes stronger information-theoretic MIA for LLMs, extending to token-level localization of memorized training data

Membership Inference Attack Sensitive Information Disclosure nlp
PDF
defense arXiv Sep 29, 2025 · Sep 2025

Incentive-Aligned Multi-Source LLM Summaries

Yanchen Jiang, Zhe Feng, Aranyak Mehta · Harvard University · Google Research

Defends LLM summarization pipelines against indirect prompt injection by scoring sources via peer prediction before synthesis

Prompt Injection nlp
PDF