ML Security Papers

LLM06 Sensitive Information Disclosure

LLMs leaking training data, PII, and prompts

233 papers

Paper types

defense 91

attack 69

benchmark 56

survey 12

tool 5

Domains

nlp 229

multimodal 19

vision 13

federated-learning 7

generative 5

graph 4

audio 2

tabular 1

Co-occurring categories

Other OWASP categories that appear on the same papers

ML03 Model Inversion Attack

LLM01 Prompt Injection

ML04 Membership Inference Attack

LLM07 Insecure Plugin Design

LLM08 Excessive Agency

ML02 Data Poisoning Attack

ML05 Model Theft

ML10 Model Poisoning

LLM03 Training Data Poisoning

ML06 AI Supply Chain Attacks

ML01 Input Manipulation Attack

ML09 Output Integrity Attack

LS06 Red-Team Agents

LS03 Reconnaissance & OSINT

ML07 Transfer Learning Attack

Top cited papers

Language Models are Injective and Hence Invertible

Eliciting Secret Knowledge from Language Models

Mitigating the OWASP Top 10 For Large Language Models Applications using Intelligent Agents

Hubble: a Model Suite to Advance the Study of LLM Memorization

Extracting books from production language models

ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations

You Can't Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors

SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought

Confidential LLM Inference: Performance and Cost Across CPU and GPU TEEs

Extracting alignment data in open models
