Revisiting Human-in-the-Loop Object Retrieval with Pre-Trained Vision Transformers

arXiv cs.IR, by Kawtar Zaher, Olivier Buisson, Alexis Joly, April 2, 2026
Abstract: Building on existing approaches, we revisit Human-in-the-Loop Object Retrieval, a task that consists of iteratively retrieving images containing objects of a class of interest, specified by a user-provided query. Starting from a large unlabeled image collection, the aim is to rapidly identify diverse instances of an object category, relying solely on the initial query and the user's Relevance Feedback, with no prior labels. The retrieval process is formulated as a binary classification task in which the system continuously learns, through iterative user interaction, to distinguish images that are relevant to the query from those that are not. This interaction is guided by an Active Learning loop: at each iteration, the system selects informative samples for user annotation, thereby refining the retrieval performance. The task is particularly challenging in multi-object datasets, where the object of interest may occupy only a small region of the image within a complex, cluttered scene. Unlike object-centered settings, where global descriptors often suffice, multi-object images require more adapted, localized descriptors. In this work, we formulate and revisit the Human-in-the-Loop Object Retrieval task by leveraging pre-trained ViT representations and addressing key design questions, including which object instances to consider in an image, what form the annotations should take, how Active Selection should be applied, and which representation strategies best capture the object's features. We compare several representation strategies across multi-object datasets, highlighting trade-offs between capturing the global context and focusing on fine-grained local object details. Our results offer practical insights for the design of effective interactive retrieval pipelines based on Active Learning for object class retrieval.
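The loop described in the abstract (binary relevance classifier, user feedback, active selection of samples to annotate) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 2-D toy vectors stand in for pre-trained ViT descriptors, the classifier is a small logistic regression, active selection is plain uncertainty sampling, and `make_toy_collection`, `retrieval_loop`, and `ask_user` are hypothetical names. The abstract does not say which classifier or selection criterion the authors use.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, epochs=200, lr=0.1):
    """Fit a small logistic-regression classifier by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errs = [sigmoid(dot(w, x) + b) - t for x, t in zip(X, y)]
        w = [wj - lr * sum(e * x[j] for e, x in zip(errs, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b

def relevance(model, x):
    """Predicted probability that x is relevant to the query."""
    w, b = model
    return sigmoid(dot(w, x) + b)

def fit(features, labeled):
    idx = list(labeled)
    return train_logreg([features[i] for i in idx], [labeled[i] for i in idx])

def retrieval_loop(features, query, ask_user, rounds=5, batch=4):
    """Toy human-in-the-loop retrieval; ask_user(i) returns 1 (relevant) or 0."""
    everything = range(len(features))
    # Round 0: no labels yet, so show the user the images closest to the query.
    first = sorted(everything, key=lambda i: -dot(features[i], query))[:batch]
    labeled = {i: ask_user(i) for i in first}
    for _ in range(rounds):
        model = fit(features, labeled)
        pool = [i for i in everything if i not in labeled]
        # Assumed selection rule: uncertainty sampling (scores nearest 0.5).
        pool.sort(key=lambda i: abs(relevance(model, features[i]) - 0.5))
        for i in pool[:batch]:
            labeled[i] = ask_user(i)  # relevance feedback from the user
    model = fit(features, labeled)
    ranking = sorted(everything, key=lambda i: -relevance(model, features[i]))
    return labeled, ranking

def make_toy_collection(seed=0):
    """Hypothetical data: 20 relevant and 40 non-relevant 2-D 'descriptors'."""
    rng = random.Random(seed)
    def around(c, k):
        return [(c + rng.uniform(-0.5, 0.5), c + rng.uniform(-0.5, 0.5))
                for _ in range(k)]
    return around(2.0, 20) + around(-2.0, 40), [1] * 20 + [0] * 40

features, truth = make_toy_collection()
# The simulated user simply looks up the ground-truth label.
labeled, ranking = retrieval_loop(features, (2.0, 2.0), lambda i: truth[i])
print(len(labeled), "images annotated; first results:", ranking[:5])
```

In a real pipeline, `ask_user` would be an annotation interface and `features` would hold image or region descriptors; the choice between global and localized descriptors is the trade-off the abstract says the paper studies, and this sketch does not model it.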

Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)

Cite as: arXiv:2604.00809 [cs.CV]

(or arXiv:2604.00809v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2604.00809

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Kawtar Zaher
[v1] Wed, 1 Apr 2026 12:18:17 UTC (1,310 KB)
