Security in LLM-as-a-Judge: A Comprehensive SoK
Abstract: LLM-as-a-Judge (LaaJ) is a novel paradigm in which powerful language models are used to assess the quality, safety, or correctness of generated outputs. While this paradigm has significantly improved the scalability and efficiency of evaluation processes, it also introduces novel security risks and reliability concerns that remain largely unexplored. In particular, LLM-based judges can become both targets of adversarial manipulation and instruments through which attacks are conducted, potentially compromising the trustworthiness of evaluation pipelines. In this paper, we present the first Systematization of Knowledge (SoK) focusing on the security aspects of LLM-as-a-Judge systems. We perform a comprehensive literature review across major academic databases, analyzing 863 works and selecting 45 relevant studies published between 2020 and 2026. Based on this study, we propose a taxonomy that organizes recent research according to the role played by LLM-as-a-Judge in the security landscape, distinguishing between attacks targeting LaaJ systems, attacks performed through LaaJ, defenses leveraging LaaJ for security purposes, and applications where LaaJ is used as an evaluation strategy in security-related domains. We further provide a comparative analysis of existing approaches, highlighting current limitations, emerging threats, and open research challenges. Our findings reveal significant vulnerabilities in LLM-based evaluation frameworks, as well as promising directions for improving their robustness and reliability. Finally, we outline key research opportunities that can guide the development of more secure and trustworthy LLM-as-a-Judge systems.
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.29403 [cs.CR]
(or arXiv:2603.29403v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2603.29403
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Serena Nicolazzo Dr
[v1] Tue, 31 Mar 2026 08:05:54 UTC (104 KB)