Speech-Synchronized Whiteboard Generation via VLM-Driven Structured Drawing Representations

arXiv, March 30, 2026

arXiv:2603.25870v1 (announce type: cross)

Authors: Suraj Prasad, Pinak Mahapatra

Abstract: Creating whiteboard-style educational videos demands precise coordination between freehand illustrations and spoken narration, yet no existing method addresses this multimodal synchronization problem with structured, reproducible drawing representations. We present the first dataset of 24 paired Excalidraw demonstrations with narrated audio, where every drawing element carries millisecond-precision creation timestamps spanning 8 STEM domains. Using this data, we study whether a vision-language model (Qwen2-VL-7B), fine-tuned via LoRA, can predict full stroke sequences synchronized to speech from only 24 demonstrations. Our topic-stratified five-fold evaluation reveals that timestamp conditioning significantly improves temporal alignment over ablated baselines, while the model generalizes across unseen STEM topics. We discuss transferability to real classroom settings and release our dataset and code to support future research in automated educational content generation.
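To make the idea of timestamped drawing elements concrete, here is a minimal Python sketch of how a structured drawing could be replayed against a narration clock. It is an illustration only, not the authors' code or the released dataset format: the `TimedElement` class, its field names (`element_id`, `kind`, `created_ms`), and the `elements_visible_at` helper are all hypothetical, and the sample values are invented for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedElement:
    """Hypothetical record for one drawing element (not the dataset schema)."""
    element_id: str
    kind: str        # e.g. "arrow", "text", "freedraw" (assumed labels)
    created_ms: int  # creation time in milliseconds from the start of narration


def elements_visible_at(elements, t_ms):
    """Return the elements created at or before t_ms, in creation order."""
    ordered = sorted(elements, key=lambda e: e.created_ms)
    times = [e.created_ms for e in ordered]
    # bisect_right counts how many creation times are <= t_ms.
    return ordered[:bisect_right(times, t_ms)]


# Invented sample data, deliberately out of order.
demo = [
    TimedElement("axis", "arrow", 1200),
    TimedElement("label", "text", 4300),
    TimedElement("curve", "freedraw", 2750),
]

# Three seconds into the narration, only the axis and the curve exist.
print([e.element_id for e in elements_visible_at(demo, 3000)])
```

Sorting by creation time and cutting at the playback position is one simple way to turn per-element timestamps into a frame-by-frame whiteboard state; a model that predicts such timestamps could be evaluated by comparing its predicted times with the recorded ones.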

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Cite as: arXiv:2603.25870 [cs.CV]

(or arXiv:2603.25870v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.25870

arXiv-issued DOI via DataCite

Submission history

From: Suraj Prasad
[v1] Thu, 26 Mar 2026 19:56:56 UTC (23 KB)

Original source

arXiv

https://arxiv.org/abs/2603.25870
