HSD: Training-Free Acceleration for Document Parsing Vision-Language Model with Hierarchical Speculative Decoding

arXiv, March 31, 2026

Authors: Wenhui Liao, Hongliang Li, Pengyu Xie, Xinyu Cai, Yufan Shen, Yi Xin, Qi Qin, Shenglong Ye, Tianbin Li, Ming Hu, Junjun He, Yihao Liu, Wenhai Wang, Min Dou, Bin Fu, Botian Shi, Yu Qiao, Lianwen Jin

Abstract: Document parsing is a fundamental task in multimodal understanding, supporting a wide range of downstream applications such as information extraction and intelligent document analysis. Benefiting from strong semantic modeling and robust generalization, VLM-based end-to-end approaches have emerged as the mainstream paradigm in recent years. However, these models often suffer from substantial inference latency, as they must autoregressively generate long, full-page sequences when processing long-form documents. While recent hybrid methods mitigate this issue via region-level parallel decoding with VLMs, independent region decoding loses full-page context and might weaken global coherence. To address this issue, we propose Hierarchical Speculative Decoding (HSD), a two-stage local-to-global framework for document parsing. HSD first employs a lightweight pipeline drafter to predict region partitions and generate coarse drafts for each region. The first stage verifies the generated region-level drafts in parallel for efficiency, while the second stage further performs page-level verification on these refined outputs to preserve full-page coherence. Experimental results show that our HSD achieves a 2.78x near-lossless speedup with HunyuanOCR on OmniDocBench v1.5 and up to 7.04x speedup on long-document parsing tasks, demonstrating the effectiveness of our proposed method. We will release our code to facilitate reproducibility.
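The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, which had not been released at the time of the listing. It assumes greedy verification (accept the longest draft prefix the target model agrees with, then let the target decode the rest), and every name in it (`make_toy_target`, `verify_draft`, `hsd_parse`) is hypothetical. The "VLM" is a toy function that replays a fixed reference sequence. A real verifier would score all draft positions in one forward pass rather than one call per token.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

EOS = "<eos>"
NextToken = Callable[[Sequence[str]], str]  # hypothetical greedy target-model interface


def make_toy_target(reference: Sequence[str]) -> NextToken:
    """Toy stand-in for a VLM: greedily replays `reference`, then emits EOS."""
    def next_token(context: Sequence[str]) -> str:
        return reference[len(context)] if len(context) < len(reference) else EOS
    return next_token


def verify_draft(next_token: NextToken, draft: Sequence[str],
                 max_len: int = 256) -> Tuple[List[str], int]:
    """Accept the longest draft prefix the target agrees with, then fall back
    to plain autoregressive decoding. Returns (output, accepted draft tokens)."""
    out: List[str] = []
    for tok in draft:
        if next_token(out) != tok:  # first disagreement ends acceptance
            break
        out.append(tok)
    accepted = len(out)
    while len(out) < max_len:  # target decodes the remainder itself
        tok = next_token(out)
        if tok == EOS:
            break
        out.append(tok)
    return out, accepted


def hsd_parse(region_drafts: Sequence[Sequence[str]],
              region_targets: Sequence[NextToken],
              page_target: NextToken) -> Tuple[List[str], int]:
    """Stage 1: verify region drafts in parallel. Stage 2: verify the
    concatenated, refined regions once more against page-level context."""
    with ThreadPoolExecutor() as pool:
        stage1 = list(pool.map(verify_draft, region_targets, region_drafts))
    page_draft = [tok for refined, _ in stage1 for tok in refined]
    return verify_draft(page_target, page_draft)


if __name__ == "__main__":
    # Assumed toy data: the drafter gets one token wrong in the first region.
    targets = [make_toy_target(["Title", "A"]), make_toy_target(["x", "=", "1"])]
    drafts = [["Title", "B"], ["x", "=", "1"]]
    page_target = make_toy_target(["Title", "A", "x", "=", "1"])
    print(hsd_parse(drafts, targets, page_target))
```

In this toy run, stage one repairs the wrong token in the first region, so the page-level pass accepts all five tokens of the concatenated draft without any fallback decoding. Under this sketch's assumptions, that is where the speedup would come from: the expensive page-level model mostly confirms tokens instead of generating them one by one.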

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2602.12957 [cs.CV] (or arXiv:2602.12957v2 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2602.12957 (arXiv-issued DOI via DataCite)

Submission history

From: Wenhui Liao
[v1] Fri, 13 Feb 2026 14:22:10 UTC (5,464 KB)
[v2] Sun, 29 Mar 2026 02:11:43 UTC (5,395 KB)

Original source: arXiv, https://arxiv.org/abs/2602.12957
