
TR-ICRL: Test-Time Rethinking for In-Context Reinforcement Learning

arXiv cs.CL, by Wenxuan Jiang, Yuxin Zuo, Zijian Zhang, Xuecheng Wu, Zining Fan, Wenxuan Liu, Li Chen, Xiaoyu Li, Xuezhi Cao, Xiaolong Jin, Ninghao Liu. April 2, 2026
Authors: Wenxuan Jiang, Yuxin Zuo, Zijian Zhang, Xuecheng Wu, Zining Fan, Wenxuan Liu, Li Chen, Xiaoyu Li, Xuezhi Cao, Xiaolong Jin, Ninghao Liu

Abstract: In-Context Reinforcement Learning (ICRL) enables Large Language Models (LLMs) to learn online from external rewards directly within the context window. However, a central challenge in ICRL is reward estimation, as models typically lack access to ground-truth labels during inference. To address this limitation, we propose Test-Time Rethinking for In-Context Reinforcement Learning (TR-ICRL), a novel ICRL framework designed for both reasoning and knowledge-intensive tasks. TR-ICRL first retrieves the instances most relevant to a given query from an unlabeled evaluation set. During each ICRL iteration, the LLM generates a set of candidate answers for every retrieved instance. A pseudo-label is then derived from this set through majority voting. This label serves as a proxy for producing reward messages and formative feedback, guiding the LLM through iterative refinement. Finally, this synthesized contextual information is combined with the original query to form a comprehensive prompt, and the answer is determined through a final round of majority voting. TR-ICRL is evaluated on mainstream reasoning and knowledge-intensive tasks, where it demonstrates significant performance gains. Remarkably, TR-ICRL improves Qwen2.5-7B by 21.23% on average on MedQA and even 137.59% on AIME2024. Extensive ablation studies and analyses further validate the effectiveness and robustness of our approach. Our code is available at this https URL.
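The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the word-overlap retriever, the wording of the reward and feedback messages, the prompt layout, the `Sampler` type, and the `toy_sampler` stand-in for an LLM are all hypothetical and chosen only to keep the example self-contained.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical sampler type: (prompt, n) -> n sampled answers from an LLM.
Sampler = Callable[[str, int], List[str]]


def majority_vote(answers: Sequence[str]) -> str:
    """Most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one candidate answer")
    return Counter(answers).most_common(1)[0][0]


def retrieve(query: str, pool: Sequence[str], k: int) -> List[str]:
    """Toy retriever (assumption): rank unlabeled instances by word overlap."""
    query_words = set(query.lower().split())

    def overlap(instance: str) -> int:
        return len(query_words & set(instance.lower().split()))

    return sorted(pool, key=overlap, reverse=True)[:k]


def tr_icrl(query: str, pool: Sequence[str], sample: Sampler,
            k: int = 2, iterations: int = 2, n: int = 5) -> str:
    """Build pseudo-labeled feedback in context, then vote on the query."""
    neighbors = retrieve(query, pool, k)
    context: List[str] = []
    for _ in range(iterations):
        for instance in neighbors:
            prompt = "\n".join(context + [f"Question: {instance}"])
            candidates = sample(prompt, n)
            pseudo_label = majority_vote(candidates)
            # Reward each distinct attempt against the pseudo-label,
            # since no ground truth is available at test time.
            for answer in dict.fromkeys(candidates):
                reward = 1 if answer == pseudo_label else 0
                feedback = ("consistent with the majority" if reward
                            else "reconsider this answer")
                context.append(
                    f"Question: {instance}\nAttempt: {answer}\n"
                    f"Reward: {reward} ({feedback})"
                )
    # Final round: the accumulated feedback plus the original query.
    final_prompt = "\n".join(context + [f"Question: {query}"])
    return majority_vote(sample(final_prompt, n))


def toy_sampler(prompt: str, n: int) -> List[str]:
    """Deterministic stand-in for an LLM, used only to run the sketch."""
    return ["5" if i % 3 == 0 else "4" for i in range(n)]


pool = ["what is 1 + 1", "capital of France"]
print(tr_icrl("what is 2 + 2", pool, toy_sampler))  # prints 4
```

In practice `toy_sampler` would be replaced by a function that samples `n` completions from a real model, and the retriever by an embedding-based search; the abstract does not specify either component.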

Comments: 14 pages, 7 figures

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2604.00438 [cs.CL] (or arXiv:2604.00438v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2604.00438

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Wenxuan Jiang. [v1] Wed, 1 Apr 2026 03:34:05 UTC (210 KB)
