Speech-Synchronized Whiteboard Generation via VLM-Driven Structured Drawing Representations

arXiv · March 30, 2026

arXiv:2603.25870v1 (Announce Type: cross). Authors: Suraj Prasad, Pinak Mahapatra

Abstract: Creating whiteboard-style educational videos demands precise coordination between freehand illustrations and spoken narration, yet no existing method addresses this multimodal synchronization problem with structured, reproducible drawing representations. We present the first dataset of 24 paired Excalidraw demonstrations with narrated audio, spanning 8 STEM domains, in which every drawing element carries a millisecond-precision creation timestamp. Using this data, we study whether a vision-language model (Qwen2-VL-7B), fine-tuned via LoRA, can predict full stroke sequences synchronized to speech from only 24 demonstrations. Our topic-stratified five-fold evaluation shows that timestamp conditioning significantly improves temporal alignment over ablated baselines and that the model generalizes to unseen STEM topics. We discuss transferability to real classroom settings and release our dataset and code to support future research in automated educational content generation.
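The abstract does not say exactly how the five folds are formed. One plausible reading, given the claim that the model generalizes to unseen STEM topics, is that each fold holds out whole topics, so no test topic appears in training. The sketch below illustrates that assumption only; it is not the authors' code. The `Demo` record, the function names, and the greedy balancing rule are all hypothetical. With 8 topics and 5 folds, fold sizes cannot be equal, so topics are placed largest first into the currently smallest fold.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Demo:
    # Hypothetical record for one narrated Excalidraw demonstration.
    demo_id: str
    topic: str  # STEM domain label (names here are placeholders)


def topic_grouped_folds(demos, k=5):
    """Assign whole topics to k folds so every test topic is unseen in training.

    Assumption: "topic-stratified" is read as topic-held-out. Topics are sorted
    by size (largest first, ties broken by name) and each is greedily placed in
    the fold that currently holds the fewest demonstrations.
    """
    by_topic = defaultdict(list)
    for d in demos:
        by_topic[d.topic].append(d)
    folds = [[] for _ in range(k)]
    for topic in sorted(by_topic, key=lambda t: (-len(by_topic[t]), t)):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_topic[topic])
    return folds


def train_test_splits(demos, k=5):
    """Yield (train, test) pairs, using each fold once as the test set."""
    folds = topic_grouped_folds(demos, k)
    for i, test in enumerate(folds):
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        yield train, test


if __name__ == "__main__":
    # Illustration only: assumes three demonstrations per topic,
    # which the abstract does not state.
    demos = [Demo(f"demo{n:02d}", f"topic{n % 8}") for n in range(24)]
    for i, (train, test) in enumerate(train_test_splits(demos)):
        print(i, len(train), len(test), sorted({d.topic for d in test}))
```

If the authors instead stratified so that every fold mixes topics in proportion, the split would look different; for the topic-held-out reading, scikit-learn's `GroupKFold` with topic labels as groups is an off-the-shelf alternative to this hand-rolled version.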

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Cite as: arXiv:2603.25870 [cs.CV]

(or arXiv:2603.25870v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.25870

arXiv-issued DOI via DataCite

Submission history

From: Suraj Prasad
[v1] Thu, 26 Mar 2026 19:56:56 UTC (23 KB)