ML Security Papers

Latest papers

4 papers

attack arXiv Mar 10, 2026 · 27d ago

CLIOPATRA: Extracting Private Information from LLM Insights

Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro, Peter Kairouz · arXiv · University College London +1 more

Attacks Anthropic's Clio LLM analytics platform by injecting crafted chats to extract private medical history of target users, bypassing layered privacy protections

Sensitive Information Disclosure Prompt Injection nlp

PDF Code

defense arXiv Feb 4, 2026 · 8w ago

Trust The Typical

Debargha Ganguly, Sreehari Sankar, Biyao Zhang et al. · Case Western Reserve University · University of Pittsburgh +2 more

Defends LLMs against jailbreaks via OOD detection on safe prompts, reducing false positives by 40x over specialized safety models

Prompt Injection nlp

1 citations PDF

attack arXiv Oct 7, 2025 · Oct 2025

(Token-Level) InfoRMIA: Stronger Membership Inference and Memorization Assessment for LLMs

Jiashu Tao, Reza Shokri · National University of Singapore · Google Research

Proposes stronger information-theoretic MIA for LLMs, extending to token-level localization of memorized training data

Membership Inference Attack Sensitive Information Disclosure nlp

PDF

defense arXiv Sep 29, 2025 · Sep 2025

Incentive-Aligned Multi-Source LLM Summaries

Yanchen Jiang, Zhe Feng, Aranyak Mehta · Harvard University · Google Research

Defends LLM summarization pipelines against indirect prompt injection by scoring sources via peer prediction before synthesis

Prompt Injection nlp

PDF

Latest papers

CLIOPATRA: Extracting Private Information from LLM Insights

Trust The Typical

(Token-Level) InfoRMIA: Stronger Membership Inference and Memorization Assessment for LLMs

Incentive-Aligned Multi-Source LLM Summaries

Filters

Time Period

Paper Type

OWASP ML Top 10

OWASP LLM Top 10

Institution

Venue