
Reducing Oracle Feedback with Vision-Language Embeddings for Preference-Based RL

arXiv, March 31, 2026

Authors: Udita Ghosh, Dripta S. Raychaudhuri, Jiachen Li, Konstantinos Karydis, Amit Roy-Chowdhury


Abstract: Preference-based reinforcement learning can learn effective reward functions from comparisons, but its scalability is constrained by the high cost of oracle feedback. Lightweight vision-language embedding (VLE) models provide a cheaper alternative, but their noisy outputs limit their effectiveness as standalone reward generators. To address this challenge, we propose ROVED, a hybrid framework that combines VLE-based supervision with targeted oracle feedback. Our method uses the VLE to generate segment-level preferences and defers to an oracle only for samples with high uncertainty, identified through a filtering mechanism. In addition, we introduce a parameter-efficient fine-tuning method that adapts the VLE using the collected oracle feedback, so that the model improves over time. The framework thereby retains the scalability of embeddings and the accuracy of oracles while avoiding the inefficiencies of each. Across multiple robotic manipulation tasks, ROVED matches or surpasses prior preference-based methods while reducing oracle queries by up to 80%. Remarkably, the adapted VLE generalizes across tasks, yielding cumulative annotation savings of up to 90%, highlighting the practicality of combining scalable embeddings with precise oracle supervision for preference-based RL.
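The deferral idea in the abstract (use the VLE's preference when it is confident, ask the oracle otherwise) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the filtering mechanism, so the sketch assumes that uncertainty is measured by how close a Bradley-Terry preference probability is to 0.5. The names `preference_prob`, `route_pairs`, `threshold`, and the example scores are all hypothetical.

```python
import math
from typing import Callable, List, Tuple


def preference_prob(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that segment A is preferred over segment B.

    The scores stand in for VLE-derived segment scores (an assumption).
    """
    return 1.0 / (1.0 + math.exp(score_b - score_a))


def route_pairs(
    pairs: List[Tuple[float, float]],
    oracle: Callable[[int], int],
    threshold: float = 0.2,
) -> Tuple[List[int], int]:
    """Label each segment pair, deferring to the oracle when the VLE is unsure.

    A pair is treated as uncertain when its preference probability lies
    within `threshold` of 0.5 (an assumed filtering rule, not the paper's).
    Returns the labels (1 = A preferred, 0 = B preferred) and the number
    of oracle queries spent.
    """
    labels: List[int] = []
    queries = 0
    for idx, (score_a, score_b) in enumerate(pairs):
        p = preference_prob(score_a, score_b)
        if abs(p - 0.5) < threshold:
            # Uncertain pair: pay for an oracle label.
            labels.append(oracle(idx))
            queries += 1
        else:
            # Confident pair: accept the VLE preference.
            labels.append(1 if p > 0.5 else 0)
    return labels, queries


# Hypothetical VLE scores for four segment pairs and the oracle's labels.
vle_scores = [(2.0, -1.0), (0.1, 0.0), (-3.0, 1.5), (0.4, 0.6)]
true_labels = [1, 1, 0, 0]

labels, n_queries = route_pairs(vle_scores, oracle=lambda i: true_labels[i])
print(labels, n_queries)
```

In this toy run only the two near-tied pairs reach the oracle. The oracle labels collected this way are what the abstract describes being used to adapt the VLE, a step this sketch does not cover.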

Comments: Accepted at ICRA 2026. Project page: this https URL

Subjects: Machine Learning (cs.LG)

Cite as: arXiv:2603.28053 [cs.LG]

(or arXiv:2603.28053v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.28053

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Udita Ghosh
[v1] Mon, 30 Mar 2026 05:33:55 UTC (2,801 KB)
