MIBURI: Towards Expressive Interactive Gesture Synthesis

arXiv, March 30, 2026

arXiv:2603.03282v2 (announce type: replace)

Authors: M. Hamza Mughal, Rishabh Dabral, Vera Demberg, Christian Theobalt

Abstract: Embodied Conversational Agents (ECAs) aim to emulate human face-to-face interaction through speech, gestures, and facial expressions. Current large language model (LLM)-based conversational agents lack embodiment and the expressive gestures essential for natural interaction. Existing solutions for ECAs often produce rigid, low-diversity motions that are unsuitable for human-like interaction. Alternatively, generative methods for co-speech gesture synthesis yield natural body gestures but depend on future speech context and require long run times. To bridge this gap, we present MIBURI, the first online, causal framework for generating expressive full-body gestures and facial expressions synchronized with real-time spoken dialogue. We employ body-part-aware gesture codecs that encode hierarchical motion details into multi-level discrete tokens. These tokens are then autoregressively generated by a two-dimensional causal framework conditioned on LLM-based speech-text embeddings, modeling both temporal dynamics and part-level motion hierarchy in real time. Further, we introduce auxiliary objectives to encourage expressive and diverse gestures while preventing convergence to static poses. Comparative evaluations against recent baselines demonstrate that our causal, real-time approach produces natural and contextually aligned gestures. We urge the reader to explore demo videos on this https URL.
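
To make the idea of autoregressive generation along two causal axes more concrete, here is a minimal illustrative sketch. It is not the authors' implementation: the names NUM_LEVELS, CODEBOOK_SIZE, toy_next_token, and generate are hypothetical, the "model" is a deterministic stand-in rather than a learned network, and the speech features are plain floats instead of LLM-based speech-text embeddings. The only property it demonstrates is the ordering of dependencies: each token sees past frames, the already generated levels of the current frame, and speech up to the current step, never future speech.

```python
import random
from typing import List, Sequence

NUM_LEVELS = 3      # hypothetical: token levels per frame (coarse to fine)
CODEBOOK_SIZE = 8   # hypothetical: number of entries in each codebook


def toy_next_token(
    past_frames: Sequence[Sequence[int]],
    current_partial: Sequence[int],
    speech_so_far: Sequence[float],
) -> int:
    """Stand-in for a learned predictor (assumption, not the paper's model).

    Returns a deterministic pseudo-random token that depends only on the
    context it is given, so causality can be checked from the outside.
    """
    context = (
        tuple(tuple(frame) for frame in past_frames),
        tuple(current_partial),
        tuple(speech_so_far),
    )
    rng = random.Random(repr(context))  # string seeds are reproducible
    return rng.randrange(CODEBOOK_SIZE)


def generate(
    speech_features: Sequence[float], num_levels: int = NUM_LEVELS
) -> List[List[int]]:
    """Generate one list of level tokens per speech step.

    Outer loop: time axis (frame t only sees speech[0..t]).
    Inner loop: level axis (level k only sees levels 0..k-1 of frame t).
    """
    frames: List[List[int]] = []
    for t in range(len(speech_features)):
        speech_so_far = speech_features[: t + 1]  # causal: no future speech
        current: List[int] = []
        for _ in range(num_levels):
            current.append(toy_next_token(frames, current, speech_so_far))
        frames.append(current)
    return frames


if __name__ == "__main__":
    speech_a = [0.1, 0.5, 0.9, 0.3]
    speech_b = [0.1, 0.5, 0.0, 0.7]  # differs only from step 2 onward
    tokens_a = generate(speech_a)
    tokens_b = generate(speech_b)
    # Frames produced before the inputs diverge are identical.
    print(tokens_a[:2] == tokens_b[:2])
```

Because every token is computed from past and present context only, the two runs in the usage example agree on the first two frames even though their later speech differs. An online system needs exactly this property to emit gestures while the dialogue is still in progress.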

Comments: CVPR 2026 (Main). Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC)

Cite as: arXiv:2603.03282 [cs.CV]

(or arXiv:2603.03282v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.03282

arXiv-issued DOI via DataCite

Submission history

From: Muhammad Hamza Mughal
[v1] Tue, 3 Mar 2026 18:59:51 UTC (1,802 KB)
[v2] Fri, 27 Mar 2026 00:52:15 UTC (1,802 KB)
