
MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation

arXiv · March 30, 2026 · 10 min read

arXiv:2512.16145v2 · Announce Type: replace-cross

Authors: Pengyu Wang, Shuchang Ye, Usman Naseem, Jinman Kim


Abstract: Medical report generation aims to automatically produce radiology-style reports from medical images, supporting efficient and accurate clinical decision-making. However, existing approaches predominantly rely on token-level likelihood training, which favors local lexical matching and leaves clinical correctness under-specified in the training objective. This behavior can be attributed to token-level likelihood optimization, which rewards surface-form agreement and therefore fails to directly encode constraints on medically accurate findings. To address this objective mismatch, we introduce a semantic-driven reinforcement learning (SRL) framework for medical report generation, named MRG-R1, which directly optimizes report-level clinical correctness rather than token-level likelihood. The key module is a clinically grounded report-level reward function, which reinforces semantic agreement in clinically relevant findings between generated and reference reports, thereby providing learning signals that explicitly constrain medical correctness beyond surface linguistic alignment. Our evaluations show that the proposed framework improves the accuracy and coverage of clinically relevant findings in generated reports, and that MRG-R1 achieves state-of-the-art clinical efficacy on the IU X-Ray and MIMIC-CXR benchmark datasets.
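To make the idea of a report-level reward concrete, here is a minimal sketch in Python. It is not the authors' reward function, which the abstract does not specify. It assumes a toy keyword vocabulary (`FINDINGS`), a crude negation rule (`NEGATION_CUES`), and an F1 score over the sets of positive findings; all of these names and choices are hypothetical and stand in for whatever clinical labeler and agreement measure a real system would use.

```python
import re

# Hypothetical finding vocabulary (an assumption for illustration only).
# A real system would use a proper clinical labeler instead of keywords.
FINDINGS = {
    "cardiomegaly": ["cardiomegaly"],
    "pleural_effusion": ["pleural effusion"],
    "pneumothorax": ["pneumothorax"],
    "consolidation": ["consolidation"],
    "edema": ["edema"],
}

# Crude negation cues (also an assumption): a finding mentioned after one
# of these cues in the same sentence is treated as absent.
NEGATION_CUES = (" no ", " without ", " negative for ")


def extract_findings(report):
    """Return the set of finding labels asserted as present in a report."""
    positives = set()
    for raw in re.split(r"[.;\n]+", report.lower()):
        sentence = f" {raw.strip()} "
        cue_positions = [sentence.find(c) for c in NEGATION_CUES if c in sentence]
        first_cue = min(cue_positions) if cue_positions else None
        for label, phrases in FINDINGS.items():
            for phrase in phrases:
                pos = sentence.find(phrase)
                if pos == -1:
                    continue
                # Keep the finding only if it precedes any negation cue.
                if first_cue is None or pos < first_cue:
                    positives.add(label)
    return positives


def finding_f1_reward(generated, reference):
    """Report-level reward: F1 between positive findings of two reports."""
    gen = extract_findings(generated)
    ref = extract_findings(reference)
    if not gen and not ref:
        return 1.0  # both reports agree that nothing is present
    true_positives = len(gen & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(gen)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "Mild cardiomegaly. Small left pleural effusion. No pneumothorax."
# Different wording, same clinical content.
paraphrase = ("The heart is mildly enlarged, consistent with cardiomegaly. "
              "Small pleural effusion on the left. No pneumothorax.")
# Nearly the same words, different clinical content.
lookalike = "Mild cardiomegaly. No pleural effusion. Small left pneumothorax."

print(finding_f1_reward(paraphrase, reference))  # 1.0
print(finding_f1_reward(lookalike, reference))   # 0.5
```

The two example reports illustrate the mismatch the abstract describes: the lookalike report shares most of its tokens with the reference yet scores lower, while the paraphrase shares fewer tokens and receives the full reward. In a reinforcement learning setup, a scalar of this kind would be computed per sampled report and used as the learning signal in place of token-level likelihood.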

Comments: 10 pages

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2512.16145 [cs.CL] (or arXiv:2512.16145v2 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2512.16145

arXiv-issued DOI via DataCite

Submission history

From: Pengyu Wang
[v1] Thu, 18 Dec 2025 03:57:55 UTC (3,874 KB)
[v2] Fri, 27 Mar 2026 07:21:26 UTC (5,039 KB)

Original source

arXiv

https://arxiv.org/abs/2512.16145

