
HSD: Training-Free Acceleration for Document Parsing Vision-Language Model with Hierarchical Speculative Decoding

arXiv · March 31, 2026

arXiv:2602.12957v2 Announce Type: replace

Authors: Wenhui Liao, Hongliang Li, Pengyu Xie, Xinyu Cai, Yufan Shen, Yi Xin, Qi Qin, Shenglong Ye, Tianbin Li, Ming Hu, Junjun He, Yihao Liu, Wenhai Wang, Min Dou, Bin Fu, Botian Shi, Yu Qiao, Lianwen Jin


Abstract: Document parsing is a fundamental task in multimodal understanding, supporting a wide range of downstream applications such as information extraction and intelligent document analysis. Benefiting from strong semantic modeling and robust generalization, VLM-based end-to-end approaches have emerged as the mainstream paradigm in recent years. However, these models often suffer from substantial inference latency, as they must autoregressively generate long, full-page sequences when processing long-form documents. While recent hybrid methods mitigate this issue via region-level parallel decoding with VLMs, independent region decoding loses full-page context and might weaken global coherence. To address this issue, we propose Hierarchical Speculative Decoding (HSD), a two-stage local-to-global framework for document parsing. HSD first employs a lightweight pipeline drafter to predict region partitions and generate coarse drafts for each region. The first stage verifies the generated region-level drafts in parallel for efficiency, while the second stage further performs page-level verification on these refined outputs to preserve full-page coherence. Experimental results show that our HSD achieves a 2.78x near-lossless speedup with HunyuanOCR on OmniDocBench v1.5 and up to 7.04x speedup on long-document parsing tasks, demonstrating the effectiveness of our proposed method. We will release our code to facilitate reproducibility.
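The two-stage flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `make_target`, `verify`, and `hsd_parse` are hypothetical, the "VLM" is a stand-in that replays a fixed reference sequence, and the rule of skipping a rejected draft token and reusing the rest of the draft is a simplifying assumption. It only shows the general shape of greedy draft verification applied first per region (in parallel) and then once over the whole page.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Token = str
EOS = "<eos>"
# Hypothetical stand-in for a VLM: maps a token prefix to its greedy next token.
NextToken = Callable[[Sequence[Token]], Token]


def make_target(reference: List[Token]) -> NextToken:
    """Toy target model that deterministically replays `reference`."""
    def next_token(prefix: Sequence[Token]) -> Token:
        return reference[len(prefix)] if len(prefix) < len(reference) else EOS
    return next_token


def verify(draft: List[Token], target_next: NextToken) -> Tuple[List[Token], int]:
    """Greedy verification of a draft against the target.

    Returns the verified tokens and the number of simulated target passes.
    Each pass accepts draft tokens for as long as the target agrees, then
    emits the target's own token at the first disagreement.
    """
    out: List[Token] = []
    passes = 0
    i = 0  # current position in the draft
    while True:
        passes += 1
        while i < len(draft) and target_next(out) == draft[i]:
            out.append(draft[i])
            i += 1
        tok = target_next(out)
        if tok == EOS:
            return out, passes
        out.append(tok)  # correction taken from the target
        i += 1           # assumption: skip the rejected draft token, reuse the rest


def hsd_parse(region_drafts, region_targets, page_target):
    """Stage 1: verify region drafts in parallel. Stage 2: verify the page."""
    with ThreadPoolExecutor() as pool:
        stage1 = list(pool.map(lambda p: verify(p[0], p[1]),
                               zip(region_drafts, region_targets)))
    refined = [tok for tokens, _ in stage1 for tok in tokens]
    page, page_passes = verify(refined, page_target)
    region_passes = max(n for _, n in stage1)  # regions run concurrently
    return page, region_passes, page_passes


# Toy data: the drafter gets one token wrong in the first region.
region_refs = [["#", "Title"], ["Hello", "world", "."]]
drafts = [["#", "Tit1e"], ["Hello", "world", "."]]
page_ref = [tok for region in region_refs for tok in region]

page, region_passes, page_passes = hsd_parse(
    drafts, [make_target(r) for r in region_refs], make_target(page_ref))
print(page, region_passes, page_passes)
```

In this toy setting the page-level stage accepts the refined regions in a single simulated pass, whereas token-by-token decoding of the same page would need one pass per token. The pass counts are illustrative only and say nothing about the speedups reported in the abstract.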

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2602.12957 [cs.CV] (or arXiv:2602.12957v2 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2602.12957 (arXiv-issued DOI via DataCite)

Submission history

From: Wenhui Liao
[v1] Fri, 13 Feb 2026 14:22:10 UTC (5,464 KB)
[v2] Sun, 29 Mar 2026 02:11:43 UTC (5,395 KB)
