survey 2025

A Survey on Data Security in Large Language Models

0 citations

Published on arXiv

2508.02312

Data Poisoning Attack

OWASP ML Top 10 — ML02

Prompt Injection

OWASP LLM Top 10 — LLM01

Training Data Poisoning

OWASP LLM Top 10 — LLM03

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

Survey identifies data poisoning, prompt injection, and PII leakage as the three dominant data-security threat categories for LLMs and maps current defenses to each threat class.

Large Language Models (LLMs), now a foundation in advancing natural language processing, power applications such as text generation, machine translation, and conversational systems. Despite their transformative potential, these models inherently rely on massive amounts of training data, often collected from diverse and uncurated sources, which exposes them to serious data security risks. Harmful or malicious data can compromise model behavior, leading to issues such as toxic output, hallucinations, and vulnerabilities to threats such as prompt injection or data poisoning. As LLMs continue to be integrated into critical real-world systems, understanding and addressing these data-centric security risks is imperative to safeguard user trust and system reliability. This survey offers a comprehensive overview of the main data security risks facing LLMs and reviews current defense strategies, including adversarial training, RLHF, and data augmentation. Additionally, we categorize and analyze relevant datasets used for assessing robustness and security across different domains, providing guidance for future research. Finally, we highlight key research directions that focus on secure model updates, explainability-driven defenses, and effective governance frameworks, aiming to promote the safe and responsible development of LLM technology. This work aims to inform researchers, practitioners, and policymakers, driving progress toward data security in LLMs.

Key Contributions

Comprehensive taxonomy of data-centric security risks in LLMs covering poisoning, prompt injection, hallucinations, and PII leakage
Review and categorization of defense strategies including adversarial training, RLHF, and data augmentation across the LLM lifecycle
Catalog of datasets used for robustness and security evaluation, plus identification of future research directions around explainability-driven defenses and governance frameworks

🛡️ Threat Analysis

Data Poisoning Attack

Data poisoning is presented as a core threat in this 'data security' survey, covering injection of malicious samples into LLM training corpora and defenses such as data sanitization and adversarial training.

Details

Domains

nlp

Model Types

llmtransformer

Threat Tags

training_timeinference_time

Applications

text generationmachine translationconversational systemsquestion answering

Read PDF arXiv

A Survey on Data Security in Large Language Models

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks

Scam2Prompt: A Scalable Framework for Auditing Malicious Scam Endpoints in Production LLMs

Chasing Shadows: Pitfalls in LLM Security Research

IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs

Provably Secure Retrieval-Augmented Generation

Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning Vulnerabilities

ProGRank: Probe-Gradient Reranking to Defend Dense-Retriever RAG from Corpus Poisoning

Machine Unlearning Meets Adversarial Robustness via Constrained Interventions on LLMs