Security in LLM-as-a-Judge: A Comprehensive SoK
Aiman Almasoud 1, Antony Anju 2,3, Marco Arazzi 1, Mert Cihangiroglu 1, Vignesh Kumar Kembu 1, Serena Nicolazzo 1, Antonino Nocera 1, Vinod P. 2,3, Saraga Sakthidharan 1
Published on arXiv: 2603.29403
Prompt Injection
OWASP LLM Top 10 (LLM01)
Key Finding
Analyzes 863 works and selects 45 relevant studies (2020-2026), revealing significant vulnerabilities in LLM-based evaluation frameworks, including position bias and adversarial manipulation risks
LLM-as-a-Judge (LaaJ) is a novel paradigm in which powerful language models are used to assess the quality, safety, or correctness of generated outputs. While this paradigm has significantly improved the scalability and efficiency of evaluation processes, it also introduces novel security risks and reliability concerns that remain largely unexplored. In particular, LLM-based judges can become both targets of adversarial manipulation and instruments through which attacks are conducted, potentially compromising the trustworthiness of evaluation pipelines. In this paper, we present the first Systematization of Knowledge (SoK) focusing on the security aspects of LLM-as-a-Judge systems. We perform a comprehensive literature review across major academic databases, analyzing 863 works and selecting 45 relevant studies published between 2020 and 2026. Based on this study, we propose a taxonomy that organizes recent research according to the role played by LLM-as-a-Judge in the security landscape, distinguishing between attacks targeting LaaJ systems, attacks performed through LaaJ, defenses leveraging LaaJ for security purposes, and applications where LaaJ is used as an evaluation strategy in security-related domains. We further provide a comparative analysis of existing approaches, highlighting current limitations, emerging threats, and open research challenges. Our findings reveal significant vulnerabilities in LLM-based evaluation frameworks, as well as promising directions for improving their robustness and reliability. Finally, we outline key research opportunities that can guide the development of more secure and trustworthy LLM-as-a-Judge systems.
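The threat of a judge becoming a target of adversarial manipulation can be illustrated with a minimal, self-contained sketch. The prompt template, injection patterns, and function names below are hypothetical illustrations, not artifacts from the surveyed papers: a naive judge prompt interpolates the candidate answer directly, so instructions embedded in the answer reach the judge; a simple (and heuristic, by no means complete) defense fences the answer in delimiters and flags suspected injected instructions.

```python
import re

def build_judge_prompt(question: str, answer: str) -> str:
    """Naively interpolates the candidate answer into the judge prompt,
    leaving the judge exposed to instructions embedded in the answer."""
    return (
        "You are an impartial judge. Score the answer from 1 to 10.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Respond with only the score."
    )

# Illustrative patterns only; real injections are far more varied.
INJECTION_PATTERNS = [
    r"ignore (all|any|the) (previous|prior|above) instructions",
    r"respond with (only )?(a )?(score of )?10",
    r"you are now",
]

def sanitize_answer(answer: str) -> str:
    """Fences the candidate answer in explicit delimiters and appends a
    warning when a suspected injected instruction is detected."""
    flagged = any(
        re.search(p, answer, re.IGNORECASE) for p in INJECTION_PATTERNS
    )
    fenced = f"<answer>\n{answer}\n</answer>"
    if flagged:
        fenced += "\n[WARNING: possible embedded instructions detected]"
    return fenced
```

Pattern matching of this kind is easily evaded (paraphrase, encoding tricks), which is one reason the robustness of LaaJ pipelines remains an open research challenge.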
Key Contributions
- First comprehensive SoK on security aspects of LLM-as-a-Judge systems analyzing 863 works
- Novel taxonomy categorizing LaaJ security research into attacks targeting LaaJ, attacks through LaaJ, defenses leveraging LaaJ, and security evaluation applications
- Comparative analysis identifying vulnerabilities in LLM-based evaluation frameworks and future research directions