Revisiting Human-in-the-Loop Object Retrieval with Pre-Trained Vision Transformers
arXiv:2604.00809v1 Announce Type: cross
Abstract: Building on existing approaches, we revisit Human-in-the-Loop Object Retrieval, a task that consists of iteratively retrieving images containing objects of a class of interest, specified by a user-provided query. Starting from a large unlabeled image collection, the aim is to rapidly identify diverse instances of an object category relying solely on the initial query and the user's Relevance Feedback, with no prior labels. The retrieval process is formulated as a binary classification task, where the system continuously learns to distinguish images relevant to the query from those that are not, through iterative user interaction. This interaction is guided by an Active Learning loop: at each iteration, the system selects informative samples for user annotation, thereby refining retrieval performance. This task is particularly challenging in multi-object datasets, where the object of interest may occupy only a small region of the image within a complex, cluttered scene. Unlike object-centered settings where global descriptors often suffice, multi-object images require more adapted, localized descriptors. In this work, we formulate and revisit the Human-in-the-Loop Object Retrieval task by leveraging pre-trained ViT representations, and we address key design questions: which object instances to consider in an image, what form the annotations should take, how Active Selection should be applied, and which representation strategies best capture the object's features. We compare several representation strategies across multi-object datasets, highlighting trade-offs between capturing global context and focusing on fine-grained local object details. Our results offer practical insights for the design of effective interactive retrieval pipelines based on Active Learning for object class retrieval.
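The loop the abstract describes (query-seeded binary relevance, uncertainty-based Active Selection, iterative user feedback) can be sketched in miniature. This is not the paper's implementation: the cosine-similarity scorer, the uncertainty criterion, and the `oracle` callable standing in for the human annotator are all illustrative assumptions; the paper operates on pre-trained ViT embeddings, which are simulated here by plain feature vectors.

```python
import math

def cosine(a, b):
    # Cosine similarity between two feature vectors (stand-ins for ViT embeddings).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieval_loop(embeddings, query_vec, oracle, rounds=3, batch=2):
    """Toy Human-in-the-Loop retrieval: refine a relevant/non-relevant split
    via simulated Relevance Feedback. `oracle(i) -> bool` plays the user."""
    positives, negatives = [query_vec], []
    labeled = set()

    def score(i):
        # Relevance = mean similarity to positives minus mean similarity to negatives.
        pos = sum(cosine(embeddings[i], p) for p in positives) / len(positives)
        neg = (sum(cosine(embeddings[i], n) for n in negatives) / len(negatives)
               if negatives else 0.0)
        return pos - neg

    for _ in range(rounds):
        unlabeled = [i for i in range(len(embeddings)) if i not in labeled]
        # Active Selection: query the user on the most uncertain samples
        # (relevance score closest to the decision boundary at 0).
        picks = sorted(unlabeled, key=lambda i: abs(score(i)))[:batch]
        for i in picks:
            labeled.add(i)
            (positives if oracle(i) else negatives).append(embeddings[i])

    # Final ranking of the whole collection by the learned relevance score.
    return sorted(range(len(embeddings)), key=score, reverse=True)
```

Under these assumptions, items similar to the query and to user-confirmed positives rise to the top of the ranking, while items resembling rejected samples sink, which is the behavior the Active Learning formulation aims for.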
Subjects:
Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
Cite as: arXiv:2604.00809 [cs.CV]
(or arXiv:2604.00809v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2604.00809
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Kawtar Zaher [view email] [v1] Wed, 1 Apr 2026 12:18:17 UTC (1,310 KB)