Chasing Shadows: Pitfalls in LLM Security Research
Jonathan Evertz 1,2, Niklas Risse 1,3, Nicolai Neuer 4, Andreas Müller 2, Philipp Normann 5, Gaetano Sapia 6, Srishti Gupta, David Pape 1, Soumya Shaw 1, Devansh Srivastav 1, Christian Wressnegger 4, Erwin Quiring 5, Thorsten Eisenhofer 1, Daniel Arp 5, Lea Schönherr 1
1 CISPA Helmholtz Center for Information Security
2 Max Planck Institute for Security and Privacy
4 Karlsruhe Institute of Technology
5 TU Wien
Published on arXiv (2512.09549)
Data Poisoning Attack (OWASP ML Top 10 — ML02)
Prompt Injection (OWASP LLM Top 10 — LLM01)
Key Finding
Every surveyed paper contains at least one methodological pitfall, yet only 15.7% of identified pitfalls are explicitly acknowledged by the authors
Large language models (LLMs) are increasingly prevalent in security research. Their unique characteristics, however, introduce challenges that undermine established paradigms of reproducibility, rigor, and evaluation. Prior work has identified common pitfalls in traditional machine learning research, but these studies predate the advent of LLMs. In this paper, we identify nine common pitfalls that have become (more) relevant with the emergence of LLMs and that can compromise the validity of research involving them. These pitfalls span the entire computation process, from data collection, pre-training, and fine-tuning to prompting and evaluation. We assess the prevalence of these pitfalls across all 72 peer-reviewed papers published at leading Security and Software Engineering venues between 2023 and 2024. We find that every paper contains at least one pitfall, and each pitfall appears in multiple papers. Yet only 15.7% of the identified pitfalls were explicitly discussed, suggesting that the majority remain unrecognized. To understand their practical impact, we conduct four empirical case studies showing how individual pitfalls can mislead evaluation, inflate performance, or impair reproducibility. Based on our findings, we offer actionable guidelines to support the community in future work.
Key Contributions
- Identifies nine LLM-specific methodological pitfalls spanning the full pipeline: data collection, pre-training, fine-tuning, prompt engineering, and evaluation
- Empirically assesses pitfall prevalence across 72 peer-reviewed papers at leading Security and Software Engineering venues (2023–2024), finding every paper contains at least one
- Conducts four empirical case studies demonstrating how individual pitfalls (e.g., proxy fallacy, prompt sensitivity) concretely inflate performance or impair reproducibility
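The prompt-sensitivity pitfall named above can be made concrete with a minimal sketch: evaluate the same labeled task under several semantically equivalent prompt templates and report the spread in accuracy, not just a single number. The templates, samples, and the `toy_model` stand-in below are hypothetical illustrations, not the paper's actual experimental setup.

```python
# Minimal sketch: measuring prompt sensitivity across paraphrased templates.
from statistics import mean, stdev

# Three semantically equivalent phrasings of the same classification task.
TEMPLATES = [
    "Is the following code vulnerable? {code}",
    "Does this snippet contain a security flaw? {code}",
    "Review this code for vulnerabilities: {code}",
]

def toy_model(prompt: str) -> bool:
    # Stand-in for an LLM call: flags code containing strcpy, but only when
    # the prompt does not begin with "Review" -- mimicking a model whose
    # behavior is brittle under paraphrase.
    return "strcpy" in prompt and not prompt.startswith("Review")

def prompt_sensitivity(samples: list[tuple[str, bool]]) -> tuple[float, float]:
    """Return (mean accuracy, std. dev. of accuracy) across prompt templates."""
    accs = []
    for tpl in TEMPLATES:
        correct = sum(
            toy_model(tpl.format(code=code)) == label for code, label in samples
        )
        accs.append(correct / len(samples))
    return mean(accs), stdev(accs)

samples = [("strcpy(buf, user_input);", True), ("int x = 0;", False)]
avg, spread = prompt_sensitivity(samples)
```

A nonzero spread signals that a single-prompt evaluation would over- or understate model performance; reporting the mean and deviation across templates hedges against that.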
🛡️ Threat Analysis
Data poisoning (P4) is one of the nine catalogued pitfalls: many papers in the surveyed corpus fail to account for or evaluate training-data contamination, undermining the validity of ML02-related security claims.
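One common mitigation for training-data contamination is an n-gram overlap check between the training corpus and the evaluation set. The sketch below is a hypothetical illustration of that idea; the n-gram length and whitespace tokenization are illustrative choices, not taken from the paper.

```python
# Minimal sketch: flag evaluation documents that share any n-gram with the
# training corpus, as a coarse train/eval contamination check.
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    # Whitespace tokenization is a simplification; real pipelines would use
    # the model's own tokenizer.
    toks = text.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contamination_rate(train_docs: list[str], eval_docs: list[str],
                       n: int = 8) -> float:
    """Fraction of eval documents sharing at least one n-gram with training data."""
    train_grams: set[tuple[str, ...]] = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    flagged = sum(1 for doc in eval_docs if ngrams(doc, n) & train_grams)
    return flagged / len(eval_docs)
```

A nonzero rate does not prove memorization, but reporting it alongside results lets readers judge how much of the claimed performance could stem from overlap rather than generalization.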