UniRank: End-to-End Domain-Specific Reranking of Hybrid Text-Image Candidates
Abstract: Reranking is a critical component in many information retrieval pipelines. Despite remarkable progress in text-only settings, multimodal reranking remains challenging, particularly when the candidate set contains hybrid text and image items. A key difficulty is the modality gap: a text reranker is intrinsically closer to text candidates than to image candidates, leading to biased and suboptimal cross-modal ranking. Vision-language models (VLMs) mitigate this gap through strong cross-modal alignment and have recently been adopted to build multimodal rerankers. However, most VLM-based rerankers encode all candidates as images, and treating text as images introduces substantial computational overhead. Meanwhile, existing open-source multimodal rerankers are typically trained on general-domain data and often underperform in domain-specific scenarios. To address these limitations, we propose UniRank, a VLM-based reranking framework that natively scores and orders hybrid text-image candidates without any modality conversion. Building on this hybrid scoring interface, UniRank provides an end-to-end domain adaptation pipeline that includes: (1) an instruction-tuning stage that learns calibrated cross-modal relevance scoring by mapping label-token likelihoods to a unified scalar score; and (2) a hard-negative-driven preference alignment stage that constructs in-domain pairwise preferences and performs query-level policy optimization through reinforcement learning from human feedback (RLHF). Extensive experiments on scientific literature retrieval and design patent search demonstrate that UniRank consistently outperforms state-of-the-art baselines, improving Recall@1 by 8.9% and 7.3%, respectively.
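The first stage described above, mapping label-token likelihoods to a unified scalar score, follows a pattern common to generative rerankers. The Python sketch below illustrates the idea under stated assumptions: it uses a small text-only causal LM as a stand-in for the VLM backbone, and the prompt template and the "yes"/"no" label tokens are illustrative choices, not UniRank's actual instruction format. In the hybrid setting, image candidates would be passed through the VLM's vision processor rather than this text-only path.

# Illustrative sketch of label-token relevance scoring; not the authors' code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # stand-in backbone (assumption)
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def relevance_score(query: str, candidate: str) -> float:
    """Map the likelihoods of two label tokens to one scalar score in [0, 1]."""
    # Hypothetical prompt template.
    prompt = (
        f"Query: {query}\nCandidate: {candidate}\n"
        "Is the candidate relevant to the query? Answer yes or no.\nAnswer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    # Label-token ids; some tokenizers need the leading-space variants.
    yes_id = tokenizer.encode(" yes", add_special_tokens=False)[0]
    no_id = tokenizer.encode(" no", add_special_tokens=False)[0]
    # Softmax over just the two label tokens maps their likelihoods to one scalar.
    pair = torch.stack([next_token_logits[yes_id], next_token_logits[no_id]])
    return torch.softmax(pair, dim=0)[0].item()

# Rank candidates by the unified score.
docs = ["A survey of multimodal reranking.", "A recipe for sourdough bread."]
ranked = sorted(docs, key=lambda d: relevance_score("multimodal reranking", d),
                reverse=True)

The second stage constructs in-domain pairwise preferences from hard negatives and optimizes the policy at the query level. The abstract says only that this is done via RLHF; a DPO-style pairwise objective is one plausible instantiation, sketched below with hypothetical inputs (per-pair label-token log-probabilities from the current policy and a frozen reference policy).

# DPO-style pairwise preference loss; the exact objective is an assumption.
import torch
import torch.nn.functional as F

def pairwise_preference_loss(pos_logp: torch.Tensor, neg_logp: torch.Tensor,
                             ref_pos_logp: torch.Tensor, ref_neg_logp: torch.Tensor,
                             beta: float = 0.1) -> torch.Tensor:
    """Bradley-Terry loss on (relevant, hard-negative) score pairs,
    regularized against a frozen reference policy."""
    margin = beta * ((pos_logp - ref_pos_logp) - (neg_logp - ref_neg_logp))
    return -F.logsigmoid(margin).mean()

# Example with made-up log-probabilities for one query's preference pairs.
loss = pairwise_preference_loss(torch.tensor([-1.2, -0.8]), torch.tensor([-2.5, -1.9]),
                                torch.tensor([-1.4, -0.9]), torch.tensor([-2.3, -2.0]))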
Subjects:
Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.29897 [cs.IR]
(or arXiv:2603.29897v1 [cs.IR] for this version)
https://doi.org/10.48550/arXiv.2603.29897
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Yupei Yang [v1] Sun, 8 Feb 2026 12:39:19 UTC (1,152 KB)