
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models

[Submitted on 30 Apr 2025 (v1), last revised 30 Mar 2026 (this version, v2)]

Authors: Shengkai Chen, Yifang Yin, Jinming Cao, Shili Xiang, Zhenguang Liu, Roger Zimmermann


Abstract: Audio-visual segmentation aims to separate sounding objects from videos by predicting pixel-level masks based on audio signals. Existing methods primarily concentrate on closed-set scenarios and direct audio-visual alignment and fusion, which limits their capability to generalize to new, unseen situations. In this paper, we propose OpenAVS, a novel training-free language-based approach that, for the first time, effectively aligns audio and visual modalities using text as a proxy for open-vocabulary Audio-Visual Segmentation (AVS). Equipped with multimedia foundation models, OpenAVS directly infers masks through 1) audio-to-text prompt generation, 2) LLM-guided prompt translation, and 3) text-to-visual sounding object segmentation. The objective of OpenAVS is to establish a simple yet flexible architecture that relies on the most appropriate foundation models by fully leveraging their capabilities to enable more effective knowledge transfer to the downstream AVS task. Moreover, we present a model-agnostic framework OpenAVS-ST that enables the integration of OpenAVS with any advanced supervised AVS model via pseudo-label based self-training. This approach enhances performance by effectively utilizing large-scale unlabeled data when available. Comprehensive experiments on three benchmark datasets demonstrate the superior performance of OpenAVS. It surpasses existing unsupervised, zero-shot, and few-shot AVS methods by a significant margin, achieving absolute performance gains of approximately 9.4% and 10.9% in mIoU and F-score, respectively, in challenging scenarios.
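The three stages named in the abstract can be read as a simple composition of functions. The following is a minimal sketch of that composition only, not the authors' implementation: the class name `OpenAVSSketch`, its field names, and the three `toy_*` functions are hypothetical stand-ins for the foundation models the paper relies on, and the "frame" here is a toy grid of string labels rather than real pixels.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = List[List[str]]  # toy frame: each "pixel" holds an object label
Mask = List[List[int]]   # binary mask: 1 marks a sounding-object pixel


@dataclass
class OpenAVSSketch:
    """Hypothetical wrapper chaining the three stages from the abstract."""
    audio_to_text: Callable[[Sequence[float]], str]   # stage 1
    translate_prompt: Callable[[str], List[str]]      # stage 2
    segment: Callable[[Frame, List[str]], Mask]       # stage 3

    def predict(self, audio: Sequence[float], frame: Frame) -> Mask:
        description = self.audio_to_text(audio)       # audio -> text
        prompts = self.translate_prompt(description)  # text -> object prompts
        return self.segment(frame, prompts)           # prompts -> mask


def toy_audio_to_text(audio: Sequence[float]) -> str:
    # Assumption: stands in for an audio-language model.
    return "a dog barking" if max(audio) > 0.5 else "silence"


def toy_translate_prompt(description: str) -> List[str]:
    # Assumption: stands in for the LLM step; keeps only known object nouns.
    vocabulary = {"dog", "guitar", "car"}
    return [word for word in description.split() if word in vocabulary]


def toy_segment(frame: Frame, prompts: List[str]) -> Mask:
    # Assumption: stands in for a text-prompted segmentation model.
    return [[1 if label in prompts else 0 for label in row] for row in frame]


pipeline = OpenAVSSketch(toy_audio_to_text, toy_translate_prompt, toy_segment)
frame = [["dog", "sky"], ["dog", "grass"]]
loud_mask = pipeline.predict([0.1, 0.9], frame)
quiet_mask = pipeline.predict([0.0, 0.1], frame)
```

Because each stage is passed in as a callable, any one of them can be swapped for a different model without touching the other two, which is the kind of flexibility the abstract describes.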

Comments: Accepted by ICME 2026

Subjects: Machine Learning (cs.LG); Multimedia (cs.MM)

Cite as: arXiv:2505.01448 [cs.LG] (or arXiv:2505.01448v2 [cs.LG] for this version)

DOI: https://doi.org/10.48550/arXiv.2505.01448 (arXiv-issued DOI via DataCite)

Submission history

From: Shengkai Chen
[v1] Wed, 30 Apr 2025 01:52:10 UTC (1,212 KB)
[v2] Mon, 30 Mar 2026 06:09:21 UTC (1,489 KB)
