
Streamlined Open-Vocabulary Human-Object Interaction Detection

arXiv, March 31, 2026

Authors: Chang Sun, Dongliang Liao, Changxing Ding


Abstract: Open-vocabulary human-object interaction (HOI) detection aims to localize and recognize all human-object interactions in an image, including those unseen during training. Existing approaches usually rely on the collaboration between a conventional HOI detector and a Vision-Language Model (VLM) to recognize unseen HOI categories. However, feature fusion in this paradigm is challenging due to significant gaps in cross-model representations. To address this issue, we introduce SL-HOI, a StreamLined open-vocabulary HOI detection framework based solely on the powerful DINOv3 model. Our design leverages the complementary strengths of DINOv3's components: its backbone for fine-grained localization and its text-aligned vision head for open-vocabulary interaction classification. Moreover, to facilitate smooth cross-attention between the interaction queries and the vision head's output, we propose first feeding both the interaction queries and the backbone image tokens into the vision head, effectively bridging their representation gaps. All DINOv3 parameters in our approach are frozen, with only a small number of learnable parameters added, allowing fast adaptation to the HOI detection task. Extensive experiments show that SL-HOI achieves state-of-the-art performance on both the SWiG-HOI and HICO-DET benchmarks, demonstrating the effectiveness of our streamlined model architecture. Code is available at this https URL.
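The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' code and does not use DINOv3: the fixed matrix `HEAD_W` is a hypothetical stand-in for the frozen vision head, the two-dimensional vectors and the text embeddings are invented for illustration, and the attention has no learned projections. It only shows the ordering of the steps: the interaction query and the image tokens pass through the same head first, the query then cross-attends to the head's output, and the result is classified by cosine similarity against text embeddings of candidate HOI labels.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def shared_head(vectors, W):
    """Stand-in for a frozen head: the same fixed linear map for every input."""
    return [matvec(W, v) for v in vectors]

def cross_attend(query, tokens):
    """Single-head dot-product attention of one query over the tokens."""
    scale = math.sqrt(len(query))
    logits = [sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens]
    weights = softmax(logits)
    dim = len(tokens[0])
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]

def classify(embedding, text_embeddings):
    """Pick the label whose text embedding is closest in cosine similarity."""
    scores = {label: cosine(embedding, vec) for label, vec in text_embeddings.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy inputs (assumptions, not values from the paper).
HEAD_W = [[0.0, 1.0], [1.0, 0.0]]          # fixed "frozen head" weights
image_tokens = [[1.0, 0.0], [0.0, 1.0]]    # backbone image tokens
interaction_query = [0.9, 0.1]             # one interaction query
text_embeddings = {                        # text side of the aligned space
    "ride bicycle": [0.0, 1.0],
    "hold cup": [1.0, 0.0],
}

# Step 1: feed the query and the image tokens through the same head.
head_out = shared_head([interaction_query] + image_tokens, HEAD_W)
query_h, tokens_h = head_out[0], head_out[1:]

# Step 2: the query cross-attends to the head's token outputs.
interaction_embedding = cross_attend(query_h, tokens_h)

# Step 3: open-vocabulary classification against the text embeddings.
label, scores = classify(interaction_embedding, text_embeddings)
print(label, scores)
```

Because the label set is just a dictionary of text embeddings, an unseen category could be added at inference time by adding another entry, which is the property the open-vocabulary setting relies on.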

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.27500 [cs.CV]

(or arXiv:2603.27500v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.27500

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Chang Sun
[v1] Sun, 29 Mar 2026 03:31:56 UTC (4,647 KB)
