
Realistic Lip Motion Generation Based on 3D Dynamic Viseme and Coarticulation Modeling for Human-Robot Interaction

arXiv cs.RO · [Submitted on 2 Apr 2026] · April 3, 2026


Abstract: Realistic lip synchronization is essential for natural non-verbal human-robot interaction with humanoid robots. Motivated by this need, this paper presents a lip motion generation framework based on 3D dynamic viseme and coarticulation modeling. By analyzing Chinese pronunciation theory, a 3D dynamic viseme library is constructed based on the ARKit standard, which provides coherent prior lip trajectories. To resolve motion conflicts within continuous speech streams, a coarticulation mechanism is developed by incorporating initial-final (Shengmu-Yunmu) decoupling and energy modulation. After a strategy is developed to retarget high-dimensional spatial lip motion to the 14-DOF lip actuation system of a humanoid head platform, the efficiency and accuracy of the proposed architecture are experimentally validated and demonstrated through quantitative ablation experiments, using the Pearson Correlation Coefficient (PCC) and the Mean Absolute Jerk (MAJ) as metrics. This research offers a lightweight, efficient, and highly practical paradigm for speech-driven lip motion generation in humanoid robots. The 3D dynamic viseme library and real-world deployment videos are available at {this https URL}
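The coarticulation mechanism depends on decoupling each Mandarin syllable into its initial (Shengmu) and final (Yunmu). The abstract does not describe how this split is implemented, so the following is only a minimal sketch under stated assumptions: input is toneless pinyin, the inventory is the conventional 21 initials, and orthographic "y"/"w" are left attached to the final without further normalization. The helper name `split_syllable` is hypothetical and not taken from the paper.

```python
# Conventional inventory of the 21 Mandarin initials (shengmu).
# Two-letter initials come first so "zh", "ch", "sh" win over "z", "c", "s".
# Assumption: orthographic "y"/"w" are NOT treated as initials here.
INITIALS = (
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s",
)


def split_syllable(syllable: str) -> tuple[str, str]:
    """Split a toneless pinyin syllable into (initial, final).

    Zero-initial syllables such as "an" or "er" get an empty initial.
    Hypothetical helper for illustration; not the paper's implementation.
    """
    s = syllable.strip().lower()
    for initial in INITIALS:
        # Require a non-empty remainder so the final is never empty.
        if s.startswith(initial) and len(s) > len(initial):
            return initial, s[len(initial):]
    return "", s


if __name__ == "__main__":
    # Each (initial, final) pair could select an initial-phase and a
    # final-phase viseme that are then blended by a coarticulation model.
    print([split_syllable(w) for w in "ni hao zhong guo".split()])
    # prints [('n', 'i'), ('h', 'ao'), ('zh', 'ong'), ('g', 'uo')]
```

Matching the longest initials first is the design choice that matters: without it, "zhong" would be split as ("z", "hong").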
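The evaluation relies on PCC (agreement with a reference trajectory) and MAJ (smoothness), but the abstract gives no formulas. A minimal sketch, assuming the standard sample Pearson coefficient and jerk estimated as the third-order finite difference of a uniformly sampled 1-D trajectory divided by dt³; the paper's exact normalization and channel aggregation may differ. The function names `pearson_cc` and `mean_absolute_jerk` are hypothetical, and the sample data are toy values, not results from the paper.

```python
import math
from typing import Sequence


def pearson_cc(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Sample Pearson correlation between two equal-length 1-D trajectories."""
    n = len(pred)
    if n != len(ref) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 samples")
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    std_r = math.sqrt(sum((r - mean_r) ** 2 for r in ref))
    if std_p == 0 or std_r == 0:
        raise ValueError("PCC is undefined for a constant trajectory")
    return cov / (std_p * std_r)


def mean_absolute_jerk(traj: Sequence[float], dt: float) -> float:
    """Mean |third derivative| of a uniformly sampled 1-D trajectory.

    Jerk is estimated with the third-order forward difference
    (x[t+3] - 3*x[t+2] + 3*x[t+1] - x[t]) / dt**3.
    """
    if len(traj) < 4:
        raise ValueError("need at least 4 samples for a third difference")
    jerks = [
        (traj[t + 3] - 3 * traj[t + 2] + 3 * traj[t + 1] - traj[t]) / dt**3
        for t in range(len(traj) - 3)
    ]
    return sum(abs(j) for j in jerks) / len(jerks)


if __name__ == "__main__":
    # Toy data (not from the paper): one lip channel sampled at 30 Hz.
    reference = [0.0, 0.2, 0.5, 0.7, 0.6, 0.3, 0.1]
    generated = [0.0, 0.25, 0.45, 0.7, 0.65, 0.3, 0.05]
    print(pearson_cc(generated, reference))
    print(mean_absolute_jerk(generated, dt=1 / 30))
```

Higher PCC means the generated motion tracks the reference more closely; lower MAJ means smoother motion. For multi-channel data (one channel per blendshape or per actuator), one plausible choice is to compute both metrics per channel and average them, though the abstract does not state how the paper aggregates.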

Comments: 8 pages, 7 figures

Subjects: Robotics (cs.RO)

Cite as: arXiv:2604.01756 [cs.RO]

(or arXiv:2604.01756v1 [cs.RO] for this version)

https://doi.org/10.48550/arXiv.2604.01756

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Min Li
[v1] Thu, 2 Apr 2026 08:24:49 UTC (2,942 KB)

Original source

arXiv cs.RO

https://arxiv.org/abs/2604.01756
