UniRank: End-to-End Domain-Specific Reranking of Hybrid Text-Image Candidates
Abstract: Reranking is a critical component in many information retrieval pipelines. Despite remarkable progress in text-only settings, multimodal reranking remains challenging, particularly when the candidate set contains hybrid text and image items. A key difficulty is the modality gap: a text reranker is intrinsically closer to text candidates than to image candidates, leading to biased and suboptimal cross-modal ranking. Vision-language models (VLMs) mitigate this gap through strong cross-modal alignment and have recently been adopted to build multimodal rerankers. However, most VLM-based rerankers encode all candidates as images, and treating text as images introduces substantial computational overhead. Meanwhile, existing open-source multimodal rerankers are typically trained on general-domain data and often underperform in domain-specific scenarios. To address these limitations, we propose UniRank, a VLM-based reranking framework that natively scores and orders hybrid text-image candidates without any modality conversion. Building on this hybrid scoring interface, UniRank provides an end-to-end domain adaptation pipeline that includes: (1) an instruction-tuning stage that learns calibrated cross-modal relevance scoring by mapping label-token likelihoods to a unified scalar score; and (2) a hard-negative-driven preference alignment stage that constructs in-domain pairwise preferences and performs query-level policy optimization through reinforcement learning from human feedback (RLHF). Extensive experiments on scientific literature retrieval and design patent search demonstrate that UniRank consistently outperforms state-of-the-art baselines, improving Recall@1 by 8.9% and 7.3%, respectively.
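The first stage described above, mapping label-token likelihoods to a unified scalar score, follows a pattern common to generative rerankers. The Python sketch below illustrates the idea under stated assumptions: it uses a small text-only causal LM as a stand-in for the VLM backbone, and the prompt template and the "yes"/"no" label tokens are illustrative choices, not UniRank's actual instruction format. In the hybrid setting, image candidates would be passed through the VLM's vision processor rather than this text-only path.

# Illustrative sketch of label-token relevance scoring; not the authors' code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # stand-in backbone (assumption)
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def relevance_score(query: str, candidate: str) -> float:
    """Map the likelihoods of two label tokens to one scalar score in [0, 1]."""
    # Hypothetical prompt template.
    prompt = (
        f"Query: {query}\nCandidate: {candidate}\n"
        "Is the candidate relevant to the query? Answer yes or no.\nAnswer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    # Label-token ids; some tokenizers need the leading-space variants.
    yes_id = tokenizer.encode(" yes", add_special_tokens=False)[0]
    no_id = tokenizer.encode(" no", add_special_tokens=False)[0]
    # Softmax over just the two label tokens maps their likelihoods to one scalar.
    pair = torch.stack([next_token_logits[yes_id], next_token_logits[no_id]])
    return torch.softmax(pair, dim=0)[0].item()

# Rank candidates by the unified score.
docs = ["A survey of multimodal reranking.", "A recipe for sourdough bread."]
ranked = sorted(docs, key=lambda d: relevance_score("multimodal reranking", d),
                reverse=True)

The second stage constructs in-domain pairwise preferences from hard negatives and optimizes the policy at the query level. The abstract says only that this is done via RLHF; a DPO-style pairwise objective is one plausible instantiation, sketched below with hypothetical inputs (per-pair label-token log-probabilities from the current policy and a frozen reference policy).

# DPO-style pairwise preference loss; the exact objective is an assumption.
import torch
import torch.nn.functional as F

def pairwise_preference_loss(pos_logp: torch.Tensor, neg_logp: torch.Tensor,
                             ref_pos_logp: torch.Tensor, ref_neg_logp: torch.Tensor,
                             beta: float = 0.1) -> torch.Tensor:
    """Bradley-Terry loss on (relevant, hard-negative) score pairs,
    regularized against a frozen reference policy."""
    margin = beta * ((pos_logp - ref_pos_logp) - (neg_logp - ref_neg_logp))
    return -F.logsigmoid(margin).mean()

# Example with made-up log-probabilities for one query's preference pairs.
loss = pairwise_preference_loss(torch.tensor([-1.2, -0.8]), torch.tensor([-2.5, -1.9]),
                                torch.tensor([-1.4, -0.9]), torch.tensor([-2.3, -2.0]))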
Subjects:
Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.29897 [cs.IR]
(or arXiv:2603.29897v1 [cs.IR] for this version)
https://doi.org/10.48550/arXiv.2603.29897
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Yupei Yang [v1] Sun, 8 Feb 2026 12:39:19 UTC (1,152 KB)