HolisticSemGes: Semantic Grounding of Holistic Co-Speech Gesture Generation with Contrastive Flow-Matching

arXiv, March 30, 2026

Authors: Lanmiao Liu, Esam Ghaleb, Aslı Özyürek, Zerrin Yumak

Abstract: While the field of co-speech gesture generation has seen significant advances, producing holistic, semantically grounded gestures remains a challenge. Existing approaches rely on external semantic retrieval methods, which limit their generalisation capability due to dependency on predefined linguistic rules. Flow-matching-based methods produce promising results; however, the network is optimised using only semantically congruent samples, without exposure to negative examples, so it learns rhythmic gestures rather than sparse motion such as iconic and metaphoric gestures. Furthermore, by modelling body parts in isolation, the majority of methods fail to maintain cross-modal consistency. We introduce a Contrastive Flow Matching-based co-speech gesture generation model that uses mismatched audio-text conditions as negatives, training the velocity field to follow the correct motion trajectory while repelling semantically incongruent trajectories. Our model ensures cross-modal coherence by embedding text, audio, and holistic motion into a composite latent space via cosine and contrastive objectives. Extensive experiments and a user study demonstrate that our proposed approach outperforms state-of-the-art methods on two datasets, BEAT2 and SHOW.
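To make the attract-and-repel idea in the abstract concrete, here is a minimal sketch of a contrastive flow-matching loss for a single training sample. It is not the authors' implementation: the abstract does not give the loss, so the straight-line probability path, the squared-error form, the helper names, and the weight `lam` are all assumptions chosen for illustration. The negative target stands in for the motion trajectory that belongs to a mismatched audio-text condition.

```python
def interpolate(x0, x1, t):
    """Point at time t in [0, 1] on the straight path from noise x0 to motion x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path, which is the constant x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def contrastive_fm_loss(v_pred, v_pos, v_neg, lam=0.05):
    """Pull v_pred toward the matched target and push it away from the mismatched one.

    v_pred: velocity predicted by the network (hypothetical, supplied by the caller)
    v_pos:  target velocity for the motion paired with the correct audio-text condition
    v_neg:  target velocity for motion paired with a mismatched condition (assumed negative)
    lam:    hypothetical weight on the repulsion term; lam = 0 gives plain flow matching
    """
    return sq_dist(v_pred, v_pos) - lam * sq_dist(v_pred, v_neg)


# Toy 2-D example: one noise sample, one matched motion, one mismatched motion.
x0 = [0.0, 0.0]
x1_pos = [1.0, 2.0]
x1_neg = [-1.0, 0.5]

x_t = interpolate(x0, x1_pos, 0.5)  # the point where the network would be queried
v_pos = target_velocity(x0, x1_pos)
v_neg = target_velocity(x0, x1_neg)

loss_correct = contrastive_fm_loss(v_pos, v_pos, v_neg)  # prediction follows the matched path
loss_wrong = contrastive_fm_loss(v_neg, v_pos, v_neg)    # prediction follows the mismatched path
```

In this toy setting a prediction that follows the matched trajectory gets a lower loss than one that follows the mismatched trajectory, which is the behaviour the abstract describes. Because the repulsion term is unbounded below, a practical setup would likely keep `lam` small or bound the term; that choice is also an assumption here.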

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.26553 [cs.CV]

(or arXiv:2603.26553v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.26553

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Lanmiao Liu [v1] Fri, 27 Mar 2026 16:11:44 UTC (9,129 KB)

Original source: arXiv, https://arxiv.org/abs/2603.26553
