DyaDiT: A Multi-Modal Diffusion Transformer for Socially Favorable Dyadic Gesture Generation
arXiv:2602.23165v2 Announce Type: replace
Authors: Yichen Peng, Jyun-Ting Song, Siyeol Jung, Ruofan Liu, Haiyang Liu, Xuangeng Chu, Ruicong Liu, Erwin Wu, Hideki Koike, Kris Kitani
Abstract: Generating realistic conversational gestures is essential for achieving natural, socially engaging interactions with digital humans. However, existing methods typically map a single audio stream to a single speaker's motion, without considering social context or modeling the mutual dynamics between two people engaged in conversation. We present DyaDiT, a multi-modal diffusion transformer that generates contextually appropriate human motion from dyadic audio signals. Trained on the Seamless Interaction Dataset, DyaDiT takes dyadic audio with optional social-context tokens to produce context-appropriate motion. It fuses information from both speakers to capture interaction dynamics, uses a motion dictionary to encode motion priors, and can optionally use the conversational partner's gestures to produce more responsive motion. We evaluate DyaDiT on standard motion generation metrics and conduct quantitative user studies, demonstrating that it not only surpasses existing methods on objective metrics but is also strongly preferred by users, highlighting its robustness and socially favorable motion generation. Code and models will be released upon acceptance.
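As a rough illustration of the conditioning described in the abstract (audio from both speakers, plus optional social-context tokens and optional partner gestures), the sketch below assembles a tagged token sequence in plain Python. It is not the authors' implementation: the function name `build_condition_tokens`, the modality tags, the token ordering, and the toy feature vectors are all assumptions made for this example, and the abstract does not specify how DyaDiT actually fuses its inputs.

```python
from typing import List, Optional, Sequence, Tuple

# A token is a (modality tag, feature vector) pair. Hypothetical layout.
Token = Tuple[str, Tuple[float, ...]]


def build_condition_tokens(
    speaker_audio: Sequence[Sequence[float]],
    partner_audio: Sequence[Sequence[float]],
    social_context: Optional[Sequence[Sequence[float]]] = None,
    partner_motion: Optional[Sequence[Sequence[float]]] = None,
) -> List[Token]:
    """Assemble one conditioning sequence from dyadic inputs.

    Both audio streams are required; social context and partner motion
    are optional and are simply skipped when absent (an assumption).
    """
    if len(speaker_audio) != len(partner_audio):
        raise ValueError("both audio streams must cover the same frames")

    tokens: List[Token] = []

    # Optional social-context tokens go first in this sketch.
    if social_context is not None:
        tokens.extend(("context", tuple(v)) for v in social_context)

    # Interleave the two speakers frame by frame so that a downstream
    # model could attend across both streams at each time step.
    for own, other in zip(speaker_audio, partner_audio):
        tokens.append(("speaker_audio", tuple(own)))
        tokens.append(("partner_audio", tuple(other)))

    # Optional partner gesture features are appended last.
    if partner_motion is not None:
        tokens.extend(("partner_motion", tuple(v)) for v in partner_motion)

    return tokens


# Toy two-frame example with two-dimensional features.
speaker = [[0.1, 0.2], [0.3, 0.4]]
partner = [[0.5, 0.6], [0.7, 0.8]]

audio_only = build_condition_tokens(speaker, partner)
full = build_condition_tokens(
    speaker,
    partner,
    social_context=[[1.0, 0.0]],
    partner_motion=[[0.0, 0.0], [0.1, 0.1]],
)
print(len(audio_only), len(full))
```

Treating the optional inputs as extra tokens that can be omitted is only one way to make a single function cover the audio-only and fully conditioned cases; the paper may handle missing modalities differently.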
Comments: 13 pages, 9 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2602.23165 [cs.CV]
(or arXiv:2602.23165v2 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2602.23165
arXiv-issued DOI via DataCite
Submission history
From: Yichen Peng
[v1] Thu, 26 Feb 2026 16:30:07 UTC (2,349 KB)
[v2] Mon, 30 Mar 2026 06:15:00 UTC (2,333 KB)