
SparVAR: Exploring Sparsity in Visual AutoRegressive Modeling for Training-Free Acceleration

arXiv · March 31, 2026

arXiv:2602.04361v2 · Announce Type: replace-cross

Authors: Zekun Li, Ning Wang, Tongxin Bai, Changwang Mei, Peisong Wang, Shuang Qiu, Jian Cheng


Abstract: Visual AutoRegressive (VAR) modeling has garnered significant attention for its next-scale prediction paradigm. However, mainstream VAR models attend to all tokens across historical scales at each autoregressive step. Because the token count grows quadratically with resolution and attention cost grows quadratically with token count, the computational complexity of attention increases quartically with resolution as the next scale grows, causing substantial latency. Prior acceleration methods often skip high-resolution scales, which speeds up inference but discards high-frequency details and harms image quality. To address these problems, we present SparVAR, a training-free acceleration framework that exploits three properties of VAR attention: (i) strong attention sinks, (ii) cross-scale activation similarity, and (iii) pronounced locality. Specifically, we dynamically predict the sparse attention pattern of later high-resolution scales from a sparse decision scale, and construct scale self-similar sparse attention via an efficient index-mapping mechanism, enabling efficient sparse attention computation at large scales. Furthermore, we propose cross-scale local sparse attention and implement an efficient block-wise sparse kernel, which achieves a $>5\times$ faster forward pass than FlashAttention. Extensive experiments demonstrate that SparVAR can reduce the generation time of an 8B model producing $1024\times1024$ high-resolution images to around 1s, without skipping the last scales. Compared with the VAR baseline accelerated by FlashAttention, our method achieves a $1.57\times$ speed-up while preserving almost all high-frequency details. When combined with existing scale-skipping strategies, SparVAR attains up to a $2.28\times$ acceleration while maintaining competitive visual generation quality. Code is available at this https URL.
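The quartic-growth claim in the abstract can be checked with a back-of-the-envelope count. The sketch below is illustrative only and is not taken from the SparVAR code: the function name `attention_pairs` and the scale schedule `side_lengths` are hypothetical, and cost is approximated as the number of query-key pairs when each scale attends densely to its own tokens plus all tokens of earlier scales.

```python
def attention_pairs(side_lengths):
    """Count query-key pairs per scale for dense next-scale attention.

    Each scale is a square token map of size side x side, and its queries
    attend to every token of the current and all earlier scales.
    """
    pairs = []
    history = 0  # tokens accumulated over all scales seen so far
    for side in side_lengths:
        queries = side * side   # tokens in the current scale
        history += queries      # keys: current scale plus earlier scales
        pairs.append(queries * history)
    return pairs


# Hypothetical schedule of token-map side lengths (an assumption for illustration).
side_lengths = [1, 2, 4, 8, 16, 32, 64]
costs = attention_pairs(side_lengths)

for side, cost in zip(side_lengths, costs):
    print(f"side={side:3d}  query-key pairs={cost}")

# Ratio of the last two scales: doubling the side length multiplies
# the per-scale cost by roughly 2**4 = 16.
print(costs[-1] / costs[-2])
```

Under this simplified count, the final scales dominate the total cost, which is consistent with the abstract's focus on sparsifying attention at the later high-resolution scales rather than skipping them.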

Comments: CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Cite as: arXiv:2602.04361 [cs.CV]

(or arXiv:2602.04361v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2602.04361

arXiv-issued DOI via DataCite

Submission history

From: Zekun Li
[v1] Wed, 4 Feb 2026 09:34:06 UTC (39,108 KB)
[v2] Sun, 29 Mar 2026 03:01:17 UTC (33,999 KB)
