
DyaDiT: A Multi-Modal Diffusion Transformer for Socially Favorable Dyadic Gesture Generation

arXiv, March 31, 2026

arXiv:2602.23165v2 Announce Type: replace

Authors: Yichen Peng, Jyun-Ting Song, Siyeol Jung, Ruofan Liu, Haiyang Liu, Xuangeng Chu, Ruicong Liu, Erwin Wu, Hideki Koike, Kris Kitani


Abstract: Generating realistic conversational gestures is essential for achieving natural, socially engaging interactions with digital humans. However, existing methods typically map a single audio stream to a single speaker's motion, without considering social context or modeling the mutual dynamics between two people engaged in conversation. We present DyaDiT, a multi-modal diffusion transformer that generates contextually appropriate human motion from dyadic audio signals. Trained on the Seamless Interaction Dataset, DyaDiT takes dyadic audio with optional social-context tokens to produce context-appropriate motion. It fuses information from both speakers to capture interaction dynamics, uses a motion dictionary to encode motion priors, and can optionally utilize the conversational partner's gestures to produce more responsive motion. We evaluate DyaDiT on standard motion generation metrics and conduct quantitative user studies, demonstrating that it not only surpasses existing methods on objective metrics but is also strongly preferred by users, highlighting its robustness and socially favorable motion generation. Code and models will be released upon acceptance.
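Illustrative sketch (not the authors' code): the abstract says DyaDiT fuses information from both speakers but does not specify the mechanism here. One common way to do this kind of fusion is cross-attention, where each frame of one speaker's audio features attends over the partner's frames. The toy example below assumes that mechanism, uses identity projections instead of learned weights, and all names (`cross_attend`, `fuse_dyadic`, the feature values) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys_values):
    """Scaled dot-product cross-attention with identity projections.

    Each query frame receives a weighted average of the other stream's
    frames. A real model would apply learned query/key/value projections.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys_values])
        mixed = [
            sum(w * kv[d] for w, kv in zip(weights, keys_values))
            for d in range(dim)
        ]
        attended.append(mixed)
    return attended


def fuse_dyadic(self_audio, partner_audio):
    """Concatenate each speaker frame with its partner-aware context.

    Assumption: fusion by concatenation; the paper may fuse differently.
    """
    context = cross_attend(self_audio, partner_audio)
    return [frame + ctx for frame, ctx in zip(self_audio, context)]


# Toy per-frame audio features (hypothetical values, 2 dims per frame).
self_audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
partner_audio = [[0.5, 0.5], [2.0, 0.0]]

fused = fuse_dyadic(self_audio, partner_audio)
for frame in fused:
    print([round(v, 3) for v in frame])
```

Each fused frame keeps the speaker's own features and appends a summary of the partner's audio, so the two streams may have different lengths. A downstream motion generator could then condition on the fused sequence.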

Comments: 13 pages, 9 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2602.23165 [cs.CV] (or arXiv:2602.23165v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2602.23165

arXiv-issued DOI via DataCite

Submission history

From: Yichen Peng

[v1] Thu, 26 Feb 2026 16:30:07 UTC (2,349 KB)

[v2] Mon, 30 Mar 2026 06:15:00 UTC (2,333 KB)

Original source

arXiv

https://arxiv.org/abs/2602.23165



