FlashSign: Pose-Free Guidance for Efficient Sign Language Video Generation

arXiv · March 31, 2026
Authors: Liuzhou Zhang, Zeyu Zhang, Biao Wu, Luyao Tang, Zirui Song, Hongyang He, Renda Han, Guangzhen Yao, Huacan Wang, Ronghao Chen, Xiuying Chen, Guan Huang, Zheng Zhu

Abstract: Sign language plays a crucial role in bridging communication gaps for the deaf and hard-of-hearing communities. However, existing sign language video generation models often rely on complex intermediate representations, which limits their flexibility and efficiency. In this work, we propose a novel pose-free framework for real-time sign language video generation. Our method eliminates the need for intermediate pose representations by directly mapping natural language text to sign language videos using a diffusion-based approach. We introduce two key innovations: (1) a pose-free generative model based on a state-of-the-art diffusion backbone, which learns implicit text-to-gesture alignments without pose estimation, and (2) a Trainable Sliding Tile Attention (T-STA) mechanism that accelerates inference by exploiting spatio-temporal locality patterns. Unlike previous training-free sparsity approaches, T-STA integrates trainable sparsity into both training and inference, ensuring consistency and eliminating the train-test gap. This approach significantly reduces computational overhead while maintaining high generation quality, making real-time deployment feasible. Our method increases video generation speed by 3.07x without compromising video quality. Our contributions open new avenues for real-time, high-quality, pose-free sign language synthesis, with potential applications in inclusive communication tools for diverse communities. Code: this https URL.
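
To make the locality idea behind sliding tile attention concrete, here is a minimal sketch of a tile-level attention mask over a spatio-temporal token grid. It is not the authors' T-STA implementation and does not model the trainable component. The function name `sliding_tile_mask`, the grid, tile, and window sizes, and the choice to clamp windows at the borders are all assumptions made for illustration.

```python
from itertools import product

def sliding_tile_mask(grid, tile, window):
    """Tile-level attention mask for a (T, H, W) token grid.

    grid:   token grid size, e.g. frames x height x width of a video latent
    tile:   tile size in tokens; must divide the grid in every dimension
    window: odd window size measured in tiles (hypothetical values)
    Returns mask[q][k] == True when query tile q may attend to key tile k.
    """
    assert all(g % s == 0 for g, s in zip(grid, tile)), "tile must divide grid"
    n = [g // s for g, s in zip(grid, tile)]  # number of tiles per dimension
    assert all(w % 2 == 1 and w <= m for w, m in zip(window, n))
    half = [w // 2 for w in window]
    tiles = list(product(*(range(m) for m in n)))  # all tile coordinates
    mask = []
    for q in tiles:
        # Assumption: clamp the window center so border tiles keep a full window.
        center = [min(max(qi, h), m - 1 - h) for qi, h, m in zip(q, half, n)]
        mask.append([
            all(abs(ki - ci) <= h for ki, ci, h in zip(k, center, half))
            for k in tiles
        ])
    return mask

# Hypothetical sizes: an 8 x 16 x 16 token grid, 2 x 4 x 4 tiles, 3 x 3 x 3 window.
mask = sliding_tile_mask((8, 16, 16), (2, 4, 4), (3, 3, 3))
density = sum(map(sum, mask)) / (len(mask) ** 2)
print(len(mask), density)  # prints: 64 0.421875
```

In this toy configuration each of the 64 query tiles attends to 27 key tiles, so only about 42% of the tile pairs are computed. The fraction shrinks further as the grid grows relative to the window, which is where locality-based sparsity saves compute.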

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.27915 [cs.CV] (or arXiv:2603.27915v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.27915

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Zeyu Zhang
[v1] Mon, 30 Mar 2026 00:06:26 UTC (400 KB)
