Research Papers research paper arxiv ai artificial-intelligence

Courtroom-Style Multi-Agent Debate with Progressive RAG and Role-Switching for Controversial Claim Verification

arXivby [Submitted on 30 Mar 2026]March 31, 20262 min read1 views

arXiv:2603.28488v1 Announce Type: cross Abstract: Large language models (LLMs) remain unreliable for high-stakes claim verification due to hallucinations and shallow reasoning. While retrieval-augmented generation (RAG) and multi-agent debate (MAD) address this, they are limited by one-pass retrieval and unstructured debate dynamics. We propose a courtroom-style multi-agent framework, PROClaim, that reformulates verification as a structured, adversarial deliberation. Our approach integrates specialized roles (e.g., Plaintiff, Defense, Judge) with Progressive RAG (P-RAG) to dynamically expand a — Masnun Nuha Chowdhury, Nusrat Jahan Beg, Umme Hunny Khan, Syed Rifat Raiyan, Md Kamrul Hasan, Hasan Mahmud

View PDF HTML (experimental)

Abstract:Large language models (LLMs) remain unreliable for high-stakes claim verification due to hallucinations and shallow reasoning. While retrieval-augmented generation (RAG) and multi-agent debate (MAD) address this, they are limited by one-pass retrieval and unstructured debate dynamics. We propose a courtroom-style multi-agent framework, PROClaim, that reformulates verification as a structured, adversarial deliberation. Our approach integrates specialized roles (e.g., Plaintiff, Defense, Judge) with Progressive RAG (P-RAG) to dynamically expand and refine the evidence pool during the debate. Furthermore, we employ evidence negotiation, self-reflection, and heterogeneous multi-judge aggregation to enforce calibration, robustness, and diversity. In zero-shot evaluations on the Check-COVID benchmark, PROClaim achieves 81.7% accuracy, outperforming standard multi-agent debate by 10.0 percentage points, with P-RAG driving the primary performance gains (+7.5 pp). We ultimately demonstrate that structural deliberation and model heterogeneity effectively mitigate systematic biases, providing a robust foundation for reliable claim verification. Our code and data are publicly available at this https URL.

Comments: Under review, 7 figures, 13 tables

Subjects:

Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Cite as: arXiv:2603.28488 [cs.CL]

(or arXiv:2603.28488v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.28488

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Syed Rifat Raiyan [view email] [v1] Mon, 30 Mar 2026 14:23:15 UTC (1,128 KB)

Original source

arXiv

https://arxiv.org/abs/2603.28488

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Research PapersLive

First time NeurIPS. How different is it from low-ranked conferences? [D]

I'm a PhD student and already published papers in A/B ranked paper (10+). My field of work never allowed me to work on something really exciting and a core A* conference. But finally after years I think I have work worthy of some discussion at the top venue. I'm referring to papers (my field and top papers) from previous editions and I notice that there's a big difference on how people write, how they put their message on table and also it is too theoretical sometimes. Are there any golden rules people follow who frequently get into these conferences? Should I be soft while making novelty claims? Also those who moved from submitting to niche-conferences to NeurIPS/ICML/CVPR, did you change your approach? My field is imaging in healthcare. submitted by /u/ade17_in [link] [comments]

Reddit r/MachineLearning

1mabout 1 hour ago

Frontier ResearchLive

AI can describe human experiences but lacks experience in an actual ‘body.’ UCLA researchers say understanding this ‘body gap’ may matter for safety - UCLA Health

AI can describe human experiences but lacks experience in an actual ‘body.’ UCLA researchers say understanding this ‘body gap’ may matter for safety UCLA Health