Security in LLM-as-a-Judge: A Comprehensive SoK
Aiman Almasoud 1, Antony Anju 2,3, Marco Arazzi 1, Mert Cihangiroglu 1, Vignesh Kumar Kembu 1, Serena Nicolazzo 1, Antonino Nocera 1, Vinod P. 2,3, Saraga Sakthidharan 1
Published on arXiv: 2603.29403
Prompt Injection
OWASP LLM Top 10 (LLM01)
Key Finding
Analyzes 863 works and selects 45 relevant studies (2020-2026), revealing significant vulnerabilities in LLM-based evaluation frameworks, including position bias and adversarial manipulation risks
LLM-as-a-Judge (LaaJ) is a novel paradigm in which powerful language models are used to assess the quality, safety, or correctness of generated outputs. While this paradigm has significantly improved the scalability and efficiency of evaluation processes, it also introduces novel security risks and reliability concerns that remain largely unexplored. In particular, LLM-based judges can become both targets of adversarial manipulation and instruments through which attacks are conducted, potentially compromising the trustworthiness of evaluation pipelines. In this paper, we present the first Systematization of Knowledge (SoK) focusing on the security aspects of LLM-as-a-Judge systems. We perform a comprehensive literature review across major academic databases, analyzing 863 works and selecting 45 relevant studies published between 2020 and 2026. Based on this study, we propose a taxonomy that organizes recent research according to the role played by LLM-as-a-Judge in the security landscape, distinguishing between attacks targeting LaaJ systems, attacks performed through LaaJ, defenses leveraging LaaJ for security purposes, and applications where LaaJ is used as an evaluation strategy in security-related domains. We further provide a comparative analysis of existing approaches, highlighting current limitations, emerging threats, and open research challenges. Our findings reveal significant vulnerabilities in LLM-based evaluation frameworks, as well as promising directions for improving their robustness and reliability. Finally, we outline key research opportunities that can guide the development of more secure and trustworthy LLM-as-a-Judge systems.
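The threat of a judge becoming a target of adversarial manipulation can be illustrated with a minimal, self-contained sketch. The prompt template, injection patterns, and function names below are hypothetical illustrations, not artifacts from the surveyed papers: a naive judge prompt interpolates the candidate answer directly, so instructions embedded in the answer reach the judge; a simple (and heuristic, by no means complete) defense fences the answer in delimiters and flags suspected injected instructions.

```python
import re

def build_judge_prompt(question: str, answer: str) -> str:
    """Naively interpolates the candidate answer into the judge prompt,
    leaving the judge exposed to instructions embedded in the answer."""
    return (
        "You are an impartial judge. Score the answer from 1 to 10.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Respond with only the score."
    )

# Illustrative patterns only; real injections are far more varied.
INJECTION_PATTERNS = [
    r"ignore (all|any|the) (previous|prior|above) instructions",
    r"respond with (only )?(a )?(score of )?10",
    r"you are now",
]

def sanitize_answer(answer: str) -> str:
    """Fences the candidate answer in explicit delimiters and appends a
    warning when a suspected injected instruction is detected."""
    flagged = any(
        re.search(p, answer, re.IGNORECASE) for p in INJECTION_PATTERNS
    )
    fenced = f"<answer>\n{answer}\n</answer>"
    if flagged:
        fenced += "\n[WARNING: possible embedded instructions detected]"
    return fenced
```

Pattern matching of this kind is easily evaded (paraphrase, encoding tricks), which is one reason the robustness of LaaJ pipelines remains an open research challenge.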
Key Contributions
- First comprehensive SoK on security aspects of LLM-as-a-Judge systems analyzing 863 works
- Novel taxonomy categorizing LaaJ security research into attacks targeting LaaJ, attacks through LaaJ, defenses leveraging LaaJ, and security evaluation applications
- Comparative analysis identifying vulnerabilities in LLM-based evaluation frameworks and future research directions