Omni-MMSI: Toward Identity-attributed Social Interaction Understanding
In plain terms: picture a party where several people talk, point, and gesture at once. Today's AI systems often cannot tell who said what, or whom a speaker is addressing. Omni-MMSI is a new task that pushes models to work these things out directly from raw audio and video, the way a person in the room does, rather than from cues that have already been cleanly labeled for them.
Abstract: We introduce Omni-MMSI, a new task that requires comprehensive social interaction understanding from raw audio, vision, and speech input. The task involves perceiving identity-attributed social cues (e.g., who is speaking what) and reasoning about the social interaction (e.g., whom the speaker refers to). This task is essential for developing AI assistants that can perceive and respond to human interactions. Unlike prior studies that operate on oracle-preprocessed social cues, Omni-MMSI reflects realistic scenarios where AI assistants must perceive and reason from raw data. However, existing pipelines and multi-modal LLMs perform poorly on Omni-MMSI because they lack reliable identity attribution capabilities, which leads to inaccurate social interaction understanding. To address this challenge, we propose Omni-MMSI-R, a reference-guided pipeline that produces identity-attributed social cues with tools and conducts chain-of-thought social reasoning. To facilitate this pipeline, we construct participant-level reference pairs and curate reasoning annotations on top of the existing datasets. Experiments demonstrate that Omni-MMSI-R outperforms advanced LLMs and counterparts on Omni-MMSI. Project page: this https URL.
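The abstract does not give implementation details, but the core mechanism it names, reference-guided identity attribution, can be thought of as matching the embedding of a raw cue (a detected voice or face) against participant-level reference embeddings. A minimal sketch, assuming a simple nearest-neighbor match by cosine similarity; all names and embedding values below are hypothetical, not from the paper:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def attribute_identity(cue_embedding, references):
    """Attribute a raw cue (e.g. a speaker's voice embedding) to the
    participant whose reference embedding is most similar."""
    return max(references, key=lambda name: cosine(cue_embedding, references[name]))

# Hypothetical participant-level references: name -> embedding.
references = {"Alice": [1.0, 0.0], "Bob": [0.0, 1.0]}

# A speech segment whose speaker embedding leans toward Alice's reference.
speaker = attribute_identity([0.9, 0.1], references)  # "Alice"
```

Once each cue carries an identity ("Alice said X while pointing at Bob"), the downstream social reasoning step the paper describes can operate over named participants instead of anonymous signals.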
Comments: Accepted to CVPR 2026. Project page: this https URL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2604.00267 [cs.CV]
(or arXiv:2604.00267v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2604.00267
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Xinpeng Li [v1] Tue, 31 Mar 2026 21:49:52 UTC (2,141 KB)