Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning

Submitted on 26 Nov 2025 (v1), last revised 27 Mar 2026 (this version, v2)

Authors: Zhenchao Tang, Fang Wang, Haohuai He, Jiale Zhou, Tianxu Lv, Jun Zhu, Shouzhi Chen, Minghao Yang, Yu Wang, Jiayang Wu, Yidong Song, Yaokun Li, Jiehui Huang, Dawei Huang, Zhi Song, Jianhua Yao

Abstract: Aligning Large Language Models (LLMs) with biomedical knowledge requires understanding both concepts and causal mechanisms in scientific reports. Supervised Fine-Tuning (SFT) often fails to capture these logical structures, while Reinforcement Learning (RL) is limited by sparse reward signals. We propose Balanced Fine-Tuning (BFT), a dual-scale post-training method that stabilizes training via confidence-weighted token-level optimization and adaptively emphasizes knowledge-dense hard samples using minimum group confidence. Experiments on medical and biological reasoning benchmarks show that BFT consistently outperforms SFT and achieves competitive or superior performance to specialized systems such as GeneAgent. Beyond improving generative accuracy, BFT enhances the fidelity of LLM-generated biomedical entity descriptions, such that their embeddings produced by standard encoders outperform those from domain-specific biological foundation models. This enables a single post-trained LLM to support both reasoning generation and representation-based biological analysis. Overall, BFT provides a concise and effective framework for aligning LLMs with biomedical knowledge while bridging generative and representational capabilities.
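The abstract names two mechanisms (confidence-weighted token-level optimization and sample emphasis via minimum group confidence) but does not give their formulas. The sketch below is only one possible reading, not the paper's formulation. It assumes that "confidence" means the probability the model assigns to the reference token, that a "group" is a contiguous window of tokens, and that harder samples get weight `1 - min_group_confidence`. All function names and the `group_size` parameter are hypothetical.

```python
import math

def token_confidences(target_logprobs):
    """Probability the model assigns to each reference token (assumed meaning of 'confidence')."""
    return [math.exp(lp) for lp in target_logprobs]

def confidence_weighted_nll(target_logprobs):
    """Mean token NLL with each term scaled by that token's confidence.

    Assumption: low-confidence tokens are down-weighted so that a few very
    unlikely tokens do not dominate the update.
    """
    if not target_logprobs:
        raise ValueError("sample has no tokens")
    conf = token_confidences(target_logprobs)
    return sum(c * -lp for c, lp in zip(conf, target_logprobs)) / len(target_logprobs)

def min_group_confidence(target_logprobs, group_size=4):
    """Lowest mean confidence over contiguous token windows (hypothetical grouping)."""
    if not target_logprobs:
        raise ValueError("sample has no tokens")
    conf = token_confidences(target_logprobs)
    groups = [conf[i:i + group_size] for i in range(0, len(conf), group_size)]
    return min(sum(g) / len(g) for g in groups)

def sample_weight(target_logprobs, group_size=4):
    """Assumed emphasis rule: the weaker the worst group, the larger the weight."""
    return 1.0 - min_group_confidence(target_logprobs, group_size)

def batch_loss(batch, group_size=4):
    """Sample-weighted average of the token-level losses over a batch."""
    weights = [sample_weight(s, group_size) for s in batch]
    losses = [confidence_weighted_nll(s) for s in batch]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / total

# Toy per-token log-probabilities of the reference answer under the model.
easy = [math.log(0.9)] * 8
hard = [math.log(0.9)] * 4 + [math.log(0.2)] * 4
print(sample_weight(easy), sample_weight(hard))
```

In this toy setup the sample containing a low-confidence span receives the larger weight, which matches the abstract's stated goal of emphasizing hard samples; the actual weighting used in the paper may differ.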

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2511.21075 [cs.LG]

(or arXiv:2511.21075v2 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2511.21075

arXiv-issued DOI via DataCite

Submission history

From: Zhenchao Tang
[v1] Wed, 26 Nov 2025 05:34:26 UTC (5,636 KB)
[v2] Fri, 27 Mar 2026 03:36:42 UTC (4,721 KB)

Original source

arXiv

https://arxiv.org/abs/2511.21075
