Live
Black Hat USADark ReadingBlack Hat AsiaAI BusinessGrammarly’s sloppelganger sagaThe Verge AILinkedIn is secretly scanning your browser for 6,000 extensions, and you weren’t toldThe Next Web AIHigh-Risk Authors — Malicious Accounts — 2026-04-05Dev.to AIAutomating Your Playtest Triage with AIDev.to AIEcosystem Health Index — 2026-04-05Dev.to AIAudit Coverage Report — 2026-04-05Dev.to AIThreat Deep Dive — Attack Categories — 2026-04-05Dev.to AIFastest Growing Skills — Download Surge — 2026-04-05Dev.to AINewly Discovered Skills This Week — 2026-04-05Dev.to AISkill Category Distribution — 2026-04-05Dev.to AIRising Authors — Clean Track Records — 2026-04-05Dev.to AII Made My AI CEO Keep a Public Diary. Here's What 42 Sessions of $0 Revenue Looks Like.Dev.to AIBlack Hat USADark ReadingBlack Hat AsiaAI BusinessGrammarly’s sloppelganger sagaThe Verge AILinkedIn is secretly scanning your browser for 6,000 extensions, and you weren’t toldThe Next Web AIHigh-Risk Authors — Malicious Accounts — 2026-04-05Dev.to AIAutomating Your Playtest Triage with AIDev.to AIEcosystem Health Index — 2026-04-05Dev.to AIAudit Coverage Report — 2026-04-05Dev.to AIThreat Deep Dive — Attack Categories — 2026-04-05Dev.to AIFastest Growing Skills — Download Surge — 2026-04-05Dev.to AINewly Discovered Skills This Week — 2026-04-05Dev.to AISkill Category Distribution — 2026-04-05Dev.to AIRising Authors — Clean Track Records — 2026-04-05Dev.to AII Made My AI CEO Keep a Public Diary. Here's What 42 Sessions of $0 Revenue Looks Like.Dev.to AI
AI NEWS HUBbyEIGENVECTOREigenvector

Nemotron ColEmbed V2: Top-Performing Late Interaction Embedding Models for Visual Document Retrieval

arXiv cs.IRby [Submitted on 3 Feb 2026 (v1), last revised 1 Apr 2026 (this version, v2)]April 2, 20262 min read1 views
Source Quiz

arXiv:2602.03992v2 Announce Type: replace Abstract: Retrieval-Augmented Generation (RAG) systems have been popular for generative applications, powering language models by injecting external knowledge. Companies have been trying to leverage their large catalog of documents (e.g. PDFs, presentation slides) in such RAG pipelines, whose first step is the retrieval component. Dense retrieval has been a popular approach, where embedding models are used to generate a dense representation of the user query that is closer to relevant content embeddings. More recently, VLM-based embedding models have become popular for visual document retrieval, as they preserve visual information and simplify the indexing pipeline compared to OCR text extraction. Motivated by the growing demand for visual document

Authors:Gabriel de Souza P. Moreira, Ronay Ak, Mengyao Xu, Oliver Holworthy, Benedikt Schifferer, Zhiding Yu, Yauhen Babakhin, Radek Osmulski, Jiarui Cai, Ryan Chesler, Bo Liu, Even Oldridge

View PDF HTML (experimental)

Abstract:Retrieval-Augmented Generation (RAG) systems have been popular for generative applications, powering language models by injecting external knowledge. Companies have been trying to leverage their large catalog of documents (e.g. PDFs, presentation slides) in such RAG pipelines, whose first step is the retrieval component. Dense retrieval has been a popular approach, where embedding models are used to generate a dense representation of the user query that is closer to relevant content embeddings. More recently, VLM-based embedding models have become popular for visual document retrieval, as they preserve visual information and simplify the indexing pipeline compared to OCR text extraction. Motivated by the growing demand for visual document retrieval, we introduce Nemotron ColEmbed V2, a family of models that achieve state-of-the-art performance on the ViDoRe benchmarks. We release three variants - with 3B, 4B, and 8B parameters - based on pre-trained VLMs: NVIDIA Eagle 2 with Llama 3.2 3B backbone, Qwen3-VL-4B-Instruct and Qwen3-VL-8B-Instruct, respectively. The 8B model ranks first on the ViDoRe V3 leaderboard as of February 03, 2026, achieving an average NDCG@10 of 63.42. We describe the main techniques used across data processing, training, and post-training - such as cluster-based sampling, hard-negative mining, bidirectional attention, late interaction, and model merging - that helped us build our top-performing models. We also discuss compute and storage engineering challenges posed by the late interaction mechanism and present experiments on how to balance accuracy and storage with lower dimension embeddings.

Comments: Proceedings of the 1st Late Interaction Workshop (LIR) @ ECIR 2026, April 02, 2026

Subjects:

Information Retrieval (cs.IR)

Cite as: arXiv:2602.03992 [cs.IR]

(or arXiv:2602.03992v2 [cs.IR] for this version)

https://doi.org/10.48550/arXiv.2602.03992

arXiv-issued DOI via DataCite

Submission history

From: Gabriel De Souza Pereira Moreira [view email] [v1] Tue, 3 Feb 2026 20:26:44 UTC (1,629 KB) [v2] Wed, 1 Apr 2026 13:16:45 UTC (1,630 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Nemotron Co…llamamodellanguage mo…benchmarktrainingreleasearXiv cs.IR

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 160 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!