UniRank: End-to-End Domain-Specific Reranking of Hybrid Text-Image Candidates

arXiv cs.IR | by Yupei Yang, Lin Yang, Wanxi Deng, Lin Qu, Shikui Tu, Lei Xu | April 1, 2026

Abstract: Reranking is a critical component in many information retrieval pipelines. Despite remarkable progress in text-only settings, multimodal reranking remains challenging, particularly when the candidate set contains hybrid text and image items. A key difficulty is the modality gap: a text reranker is intrinsically closer to text candidates than to image candidates, leading to biased and suboptimal cross-modal ranking. Vision-language models (VLMs) mitigate this gap through strong cross-modal alignment and have recently been adopted to build multimodal rerankers. However, most VLM-based rerankers encode all candidates as images, and treating text as images introduces substantial computational overhead. Meanwhile, existing open-source multimodal rerankers are typically trained on general-domain data and often underperform in domain-specific scenarios. To address these limitations, we propose UniRank, a VLM-based reranking framework that natively scores and orders hybrid text-image candidates without any modality conversion. Building on this hybrid scoring interface, UniRank provides an end-to-end domain adaptation pipeline that includes: (1) an instruction-tuning stage that learns calibrated cross-modal relevance scoring by mapping label-token likelihoods to a unified scalar score; and (2) a hard-negative-driven preference alignment stage that constructs in-domain pairwise preferences and performs query-level policy optimization through reinforcement learning from human feedback (RLHF). Extensive experiments on scientific literature retrieval and design patent search demonstrate that UniRank consistently outperforms state-of-the-art baselines, improving Recall@1 by 8.9% and 7.3%, respectively.
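
The two pipeline stages named in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula or the pair-construction rule, so everything below is an assumption made for illustration: the label tokens "yes" and "no", the softmax over their logits, the candidate IDs, and the helper names `relevance_score` and `hard_negative_pairs` are all hypothetical, and the logits are made-up numbers standing in for what a VLM would emit.

```python
import math
from typing import Dict, List, Set, Tuple


def relevance_score(label_logits: Dict[str, float], positive: str = "yes") -> float:
    """Map label-token logits to one scalar: the softmax mass on the positive label.

    Assumption: a two-label ("yes"/"no") scheme; the paper may use a different one.
    """
    peak = max(label_logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in label_logits.items()}
    return exps[positive] / sum(exps.values())


def hard_negative_pairs(
    scores: Dict[str, float], relevant: Set[str], k: int = 2
) -> List[Tuple[str, str]]:
    """Pair each relevant candidate with the k highest-scoring non-relevant ones.

    Assumption: "hard" means "scored highly by the current model despite being wrong".
    """
    negatives = sorted(
        (cid for cid in scores if cid not in relevant),
        key=lambda cid: scores[cid],
        reverse=True,
    )[:k]
    return [(pos, neg) for pos in sorted(relevant) if pos in scores for neg in negatives]


# Hypothetical logits for a hybrid candidate set (text and image items scored alike).
candidates = {
    "text:abstract_17": {"yes": 2.0, "no": 0.0},
    "image:figure_3": {"yes": 0.0, "no": 0.0},
    "image:figure_9": {"yes": 1.0, "no": 0.0},
    "text:abstract_42": {"yes": -1.0, "no": 1.0},
}

scores = {cid: relevance_score(logits) for cid, logits in candidates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

# Suppose the labeled ground truth for this query is the image "image:figure_3".
pairs = hard_negative_pairs(scores, relevant={"image:figure_3"}, k=2)

print(ranking)
print(pairs)
```

Because text and image candidates receive scores on the same scale, a single sort orders the hybrid list, and the (preferred, rejected) pairs could then feed a preference-optimization step. How UniRank actually performs that optimization is not described in the abstract and is not sketched here.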

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.29897 [cs.IR]

(or arXiv:2603.29897v1 [cs.IR] for this version)

https://doi.org/10.48550/arXiv.2603.29897

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yupei Yang
[v1] Sun, 8 Feb 2026 12:39:19 UTC (1,152 KB)
