SceneAdapt: Scene-aware Adaptation of Human Motion Diffusion
arXiv:2510.13044v2 Announce Type: replace-cross Abstract: Human motion is inherently diverse and semantically rich, while also shaped by the surrounding scene. However, existing motion generation approaches fail to generate semantically diverse motion while simultaneously respecting geometric scene constraints, since constructing large-scale datasets with both rich text-motion coverage and precise scene interactions is extremely challenging. In this work, we introduce SceneAdapt, a two-stage adaptation framework that enables semantically diverse, scene-aware human motion generation from text w — Jungbin Cho, Minsu Kim, Jisoo Kim, Ce Zheng, Laszlo A. Jeni, Ming-Hsuan Yang, Youngjae Yu, Seonjoo Kim
View PDF HTML (experimental)
Abstract:Human motion is inherently diverse and semantically rich, while also shaped by the surrounding scene. However, existing motion generation approaches fail to generate semantically diverse motion while simultaneously respecting geometric scene constraints, since constructing large-scale datasets with both rich text-motion coverage and precise scene interactions is extremely challenging. In this work, we introduce SceneAdapt, a two-stage adaptation framework that enables semantically diverse, scene-aware human motion generation from text without large-scale paired text--scene--motion data. Our key idea is to use motion inbetweening, a learnable proxy task that requires no text, as a bridge between two disjoint resources: a text-motion dataset and a scene-motion dataset. By first adapting a text-to-motion model through inbetweening and then through scene-aware inbetweening, SceneAdapt injects geometric scene constraints into text-conditioned generation while preserving semantic diversity. To enable adaptation for inbetweening, we propose a novel Context-aware Keyframing (CaKey) layer that modulates motion latents for keyframe-conditioned synthesis while preserving the original latent manifold. To further adapt the model for scene-aware inbetweening, we introduce a Scene-conditioning (SceneCo) layer that injects geometric scene information by adaptively querying local context via cross-attention. Experimental results show that SceneAdapt effectively injects scene-awareness into text-to-motion models without sacrificing semantic diversity, and we further analyze the mechanisms through which this awareness emerges. Code and models will be released. Project page: \href{this https URL}{this http URL}
Comments: 15 pages
Subjects:
Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2510.13044 [cs.CV]
(or arXiv:2510.13044v2 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2510.13044
arXiv-issued DOI via DataCite
Submission history
From: Jungbin Cho [view email] [v1] Tue, 14 Oct 2025 23:42:10 UTC (3,917 KB) [v2] Mon, 30 Mar 2026 08:00:05 UTC (3,398 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxivExclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models - WSJ
<a href="https://news.google.com/rss/articles/CBMiuANBVV95cUxPZ2pNbEQyT1dhaDJRWllyZHVUQnBJd0d4WGZnMTg3RnRWYUpvOHJYOGNLMUc5NTdFU1J2dFJrdW5UejdtSF9zeXlVa1l3V09ValkxS1BwdlhzR2ZKLUR2QktrdDhiNlh1RVZxTjI3aVVpWVpJWWI0NjN2Q3d0ekdrS2YtVmc4MEN6ZjRQN3BWUTU0ZzJpT0Y1N01GN1UyT1ROeDJCb0gxR2xNYkNBZ0dHazdmeXlCQ2p0Tk8zR3RyM0lHVmc4QlRLVDRGeFptNXJ2WGR0bHR0QlJIb2psZjBsNzhhSnZaOFVqMnhQVUFoRzltLTFlMUdVQWl5WUJRX3NQSW1yOW1pTFpURkEzd2otMHFxRmtyNDEyZ2NTOVBkVHZCcGh1aEpURjFQQUNrNFBQX3ozUk4yV2xCejQ5RHY0elNibEtXSEhBZ1NDVWhRQzFieXNrMjRxb085RUtSY2pleHhCZ2UyWU1SdVZZcFo5U0JES01yQmtuUzFySWl3MW9iako4X3FYWXFuUGN0SUc2MXJUWUx6OE8zbW1BMm5YNXZSYTduUHNPazZ2QlgwZlNBdFNEX2RKWA?oc=5" target="_blank">Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models</a> <font color="#6f6f6f">WSJ</font>
Chat, is this sus?
A large assumption we have made in AI control is that humans will be perfect at auditing , that is, being shown a transcript and determining if the AI was scheming in that transcript. But we are uncertain whether humans will be perfect at auditing; they are prone to fatigue and distraction. That is why I’m releasing "Sentinel" today, an extremely high-stimulation way to audit boring transcripts. Sentinel is a revolutionary way to get more juice out of your human auditors by gamifying the auditing process with a level system, perks, power-ups, and more fun features. Try it now here . In AI control literature, we love finding the safety/usefulness trade-offs of everything we create, but surprisingly, we noticed no trade-offs with this product The rest of the post will go over some of the way
Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models - WSJ
<a href="https://news.google.com/rss/articles/CBMiuANBVV95cUxPcEJxOGx1QkFiZFVwWEl2b3F1dXBPU0wyTnNUOTFNUGlUZkViMnI5cndsZFRnMGMtOXJkc2Z1T2tqQTBwZlJfUXViTl9tVTZuTHU3cGxIX0RISGNUTElLT2JOTnpxU0trbC1DYktERDU3cDVHUDJjdFFkQkc0dUNfeDZ6bW51cmc1U3YwVlJGNFV1QjdoNmFoaHFHTzQ4T1RBN3duaFp4ajFnTW1SNUZ2Y0JwQ0xGMUFfZ0dqYk1Ob0kxc1h5WTdLY3NsNnNOUmhwaHI3R2dGb0lrTm9FX1E3ZEktU2h5Q3Q4MDNpbWVsRXJSUzh1MlB0Y2dfQWhkT0NoYzVxT0J5ek5YVVR2cTBjN1EyelFLdE9YMTMzTlZZaVU0N1otQzlvOF9CdXo2WVAyM0J0Vjk1bU9OYU5fZGhkSXF1c25nWjhVTjJIQW5rcUk2QUE2WFpUeklONTlDQVh1dDBzME0zSE1GaXBsanczQkRFU0ZjSzVTMThRMThjaHk1c1lMREhrOURDdzVpRWZDMVlrRHhkNkpKTzVzTWdHeXg1ZGdkYzBrVTVsOFZXNEZZU1dNdUxQZQ?oc=5" target="_blank">Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models</a> <font color="#6f6f6f">WSJ</font>
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers
Researchers to use robotics and AI to help sheep producers - University of Nevada, Reno
<a href="https://news.google.com/rss/articles/CBMic0FVX3lxTFB4UmxpREpFODBJN0lKakYwRVVtdlZPNmNiTExRelVFaDYzYW9kX2RCc0pEZjlmX01fT1dWYTlxZE1ET2ZKVVgzSVZIenY3bDlHa3FXS1dUdVBmTEdLa1hUR2x3OWxHbkE2RnROSjl6VHVHQ2c?oc=5" target="_blank">Researchers to use robotics and AI to help sheep producers</a> <font color="#6f6f6f">University of Nevada, Reno</font>
AIRA_2: Breaking Bottlenecks In AI Research Agents - Forbes
<a href="https://news.google.com/rss/articles/CBMiowFBVV95cUxNNmtndHhmQ2lpZGdPdTJwY25xejcyV1c1SWNLdWFOWnNwbjRUQTF0ZWdOZFNaclNBNWVsaUgtU0JUM2xrakhoOXVLMVJzVTNkajdrMmJGeS1lYUpMUG1NMkZNMDJFREZZdXU2ZVdEbkNZSDNBRjJBLVYyZE9XeEY4T0RJY3J5aDVWcEZVQ2lWUjhUYXBsUk16d09NdGdsQ3lxb3gw?oc=5" target="_blank">AIRA_2: Breaking Bottlenecks In AI Research Agents</a> <font color="#6f6f6f">Forbes</font>
Oracle Layoffs Recast Costs To Back US$50b AI Infrastructure Bet - simplywall.st
<a href="https://news.google.com/rss/articles/CBMivwFBVV95cUxQNWpZb2ZQVDBIOGVZTTBtLThzaGwxS3NkMnJBSS1wek5pQlJXRWdTOEh5aTdPTE9Cd3JHdjZDeWRtVzdMUUdESHJOQXZDdGNVdGZtTTBhanpfb3UxQnRobVlzNGdVUXJLZWptV2V6NXlNSWllX3FxOU5XYTF0RkM2TnJIaFJkcVBFOGc2alBSLTZEeU85QU1oTjBrMVZSTl84dm9GeFl5OGtUMjc3LVd1dS1fcHZ1RG9HcV82T2JFWdIBxAFBVV95cUxOSE5XVXh0QkM4Yi1WbXNhWkJ2Z2dLRlBGNjAwaTcyNFJWMWRPdXo5WjRQQkRGTG9IamxxbmdhMHpsaEJ6RDQwZl9ENGl5WDc5a2lrTXZ1bVpFbGdsdndHYjFINnZPSnNKX1dZamszUXByR1BlRXF6d1pKOHpBU3M5UFhUSldlUWtIMlRNQzdvTk9haEJKeDI1ZEg0WWQ1SXYzLUZCWElQc3pzR19ucGExdVpnc2hBQXlQNVpOZFVBVzRkLXFE?oc=5" target="_blank">Oracle Layoffs Recast Costs To Back US$50b AI Infrastructure Bet</a> <font color="#6f6f6f">simplywall.st</font>


Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!