
Revisiting Human-in-the-Loop Object Retrieval with Pre-Trained Vision Transformers

arXiv cs.IR | by Kawtar Zaher, Olivier Buisson, Alexis Joly | April 2, 2026

arXiv:2604.00809v1 Announce Type: cross


Abstract: Building on existing approaches, we revisit Human-in-the-Loop Object Retrieval, a task that consists of iteratively retrieving images containing objects of a class of interest, specified by a user-provided query. Starting from a large unlabeled image collection, the aim is to rapidly identify diverse instances of an object category relying solely on the initial query and the user's Relevance Feedback, with no prior labels. The retrieval process is formulated as a binary classification task, where the system continuously learns, through iterative user interaction, to distinguish images that are relevant to the query from those that are not. This interaction is guided by an Active Learning loop: at each iteration, the system selects informative samples for user annotation, thereby refining the retrieval performance. This task is particularly challenging in multi-object datasets, where the object of interest may occupy only a small region of the image within a complex, cluttered scene. Unlike object-centered settings where global descriptors often suffice, multi-object images require better-adapted, localized descriptors. In this work, we formulate and revisit the Human-in-the-Loop Object Retrieval task by leveraging pre-trained ViT representations and addressing key design questions, including which object instances to consider in an image, what form the annotations should take, how Active Selection should be applied, and which representation strategies best capture the object's features. We compare several representation strategies across multi-object datasets, highlighting trade-offs between capturing the global context and focusing on fine-grained local object details. Our results offer practical insights for the design of effective interactive retrieval pipelines based on Active Learning for object class retrieval.
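The loop described in the abstract (query, relevance feedback, binary classifier, active selection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the 2-D toy vectors stand in for pre-trained ViT descriptors, the classifier is a hand-rolled logistic regression, the selection rule is plain uncertainty sampling, and `retrieval_loop`, `fit_logistic`, and the simulated `oracle` are hypothetical names introduced here.

```python
import math
import random

def logit(w, b, x):
    """Raw classifier score for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, epochs=200, lr=0.5):
    """Train a tiny logistic-regression relevance classifier with SGD."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            err = sigmoid(logit(w, b, x)) - t
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def retrieval_loop(features, query_idx, oracle, rounds=5, batch=4):
    """Iteratively ask the user (oracle) to label the most uncertain images."""
    labels = {query_idx: 1}                 # the query is the only initial label
    w, b = list(features[query_idx]), 0.0   # start from similarity to the query
    for _ in range(rounds):
        unlabeled = [i for i in range(len(features)) if i not in labels]
        if not unlabeled:
            break
        # Uncertainty sampling (an assumption): probabilities closest to 0.5.
        unlabeled.sort(key=lambda i: abs(sigmoid(logit(w, b, features[i])) - 0.5))
        for i in unlabeled[:batch]:
            labels[i] = oracle(i)           # relevance feedback: 1 or 0
        idx = list(labels)
        w, b = fit_logistic([features[i] for i in idx], [labels[i] for i in idx])
    # Final ranking of the whole collection by classifier score.
    ranking = sorted(range(len(features)), key=lambda i: -logit(w, b, features[i]))
    return ranking, labels

# Toy collection: indices 0..19 are "relevant", 20..39 are not.
random.seed(0)
def jitter(center):
    return [center + random.uniform(-0.4, 0.4) for _ in range(2)]
features = [jitter(1.0) for _ in range(20)] + [jitter(-1.0) for _ in range(20)]

ranking, labels = retrieval_loop(features, query_idx=0,
                                 oracle=lambda i: 1 if i < 20 else 0)
print(sorted(ranking[:20]))
```

In a real setting the toy vectors would be replaced by image or region descriptors, and the choice between global and localized descriptors is exactly the trade-off the abstract discusses; the sketch does not model it.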

Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)

Cite as: arXiv:2604.00809 [cs.CV]

(or arXiv:2604.00809v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2604.00809

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Kawtar Zaher. [v1] Wed, 1 Apr 2026 12:18:17 UTC (1,310 KB)

Original source

arXiv cs.IR

https://arxiv.org/abs/2604.00809



