HolisticSemGes: Semantic Grounding of Holistic Co-Speech Gesture Generation with Contrastive Flow-Matching

arXiv, March 30, 2026

Authors: Lanmiao Liu, Esam Ghaleb, Aslı Özyürek, Zerrin Yumak

Abstract: While the field of co-speech gesture generation has seen significant advances, producing holistic, semantically grounded gestures remains a challenge. Existing approaches rely on external semantic retrieval methods, which limits their generalisation because they depend on predefined linguistic rules. Flow-matching-based methods produce promising results; however, the network is optimised using only semantically congruent samples, without exposure to negative examples, so it learns rhythmic gestures rather than sparse motion such as iconic and metaphoric gestures. Furthermore, because they model body parts in isolation, most methods fail to maintain cross-modal consistency. We introduce a Contrastive Flow Matching-based co-speech gesture generation model that uses mismatched audio-text conditions as negatives, training the velocity field to follow the correct motion trajectory while repelling semantically incongruent trajectories. Our model ensures cross-modal coherence by embedding text, audio, and holistic motion into a composite latent space via cosine and contrastive objectives. Extensive experiments and a user study demonstrate that our proposed approach outperforms state-of-the-art methods on two datasets, BEAT2 and SHOW.
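The abstract does not give the training objective, so the following is only a minimal sketch of one common way a contrastive flow-matching loss can be written, not the authors' implementation. It assumes a straight-line path from noise to motion, a squared-error attraction term towards the matched trajectory, and a weighted repulsion term away from a mismatched one. The names `contrastive_fm_loss`, `velocity_fn`, `x1_neg`, and the weight `lam` are hypothetical, and plain Python lists stand in for motion tensors.

```python
def lerp(x0, x1, t):
    """Point at time t on the straight path from noise x0 to data x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def contrastive_fm_loss(velocity_fn, x0, x1_pos, x1_neg, cond, t, lam=0.05):
    """Illustrative contrastive flow-matching loss (assumed form).

    x0      : noise sample
    x1_pos  : motion that matches the audio-text condition `cond`
    x1_neg  : motion taken from a mismatched condition (the negative)
    lam     : hypothetical weight of the repulsion term
    """
    x_t = lerp(x0, x1_pos, t)                    # sample on the positive path
    v = velocity_fn(x_t, t, cond)                # predicted velocity
    u_pos = [b - a for a, b in zip(x0, x1_pos)]  # target velocity (matched)
    u_neg = [b - a for a, b in zip(x0, x1_neg)]  # velocity of the mismatch
    # Attract towards the matched trajectory, repel from the mismatched one.
    return mse(v, u_pos) - lam * mse(v, u_neg)


if __name__ == "__main__":
    x0, x1_pos, x1_neg = [0.0, 0.0], [1.0, 2.0], [3.0, 2.0]
    follows_pos = lambda x_t, t, cond: [1.0, 2.0]  # toy "model" outputs
    follows_neg = lambda x_t, t, cond: [3.0, 2.0]
    print(contrastive_fm_loss(follows_pos, x0, x1_pos, x1_neg, None, 0.5))
    print(contrastive_fm_loss(follows_neg, x0, x1_pos, x1_neg, None, 0.5))
```

With `lam=0` this reduces to an ordinary flow-matching regression loss; a positive `lam` additionally rewards predictions that differ from the mismatched trajectory, which is the behaviour the abstract describes as repelling semantically incongruent trajectories.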

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.26553 [cs.CV] (or arXiv:2603.26553v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.26553

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Lanmiao Liu
[v1] Fri, 27 Mar 2026 16:11:44 UTC (9,129 KB)
