Live
Black Hat USADark ReadingBlack Hat AsiaAI BusinessAssessing Marvell Technology (MRVL) After Nvidia’s US$2b AI Partnership And Connectivity Push - simplywall.stGNews AI NVIDIADutchess to host artificial intelligence summit at Marist in Poughkeepsie - Daily FreemanGoogle News: AIAnthropic’s Catastrophic Leak May Have Just Handed China the Blueprints to Claude Al - TipRanksGoogle News: ClaudeOpenAI's Fidji Simo Is Taking Medical Leave Amid an Executive Shake-UpWired AIMeta's AI push is reshaping how work gets done inside the companyBusiness InsiderOpenAI's Fidji Simo Is Taking Medical Leave Amid an Executive Shake-Up - WIREDGoogle News: OpenAI[P] Remote sensing foundation models made easy to use.Reddit r/MachineLearningFirst time NeurIPS. How different is it from low-ranked conferences? [D]Reddit r/MachineLearningAI & Tech brief: Ireland ascendant - The Washington PostGNews AI EUPeople would rather have an Amazon warehouse in their backyard than a data centerTechCrunch AITake-Two lays off its head of AI and several team members just two months after the CEO said it was embracing Gen AI - TweakTownGoogle News: Generative AIOpenAI Buys TBPN Tech Talk Show for Enterprise Client Outreach - News and Statistics - IndexBoxGoogle News: OpenAIBlack Hat USADark ReadingBlack Hat AsiaAI BusinessAssessing Marvell Technology (MRVL) After Nvidia’s US$2b AI Partnership And Connectivity Push - simplywall.stGNews AI NVIDIADutchess to host artificial intelligence summit at Marist in Poughkeepsie - Daily FreemanGoogle News: AIAnthropic’s Catastrophic Leak May Have Just Handed China the Blueprints to Claude Al - TipRanksGoogle News: ClaudeOpenAI's Fidji Simo Is Taking Medical Leave Amid an Executive Shake-UpWired AIMeta's AI push is reshaping how work gets done inside the companyBusiness InsiderOpenAI's Fidji Simo Is Taking Medical Leave Amid an Executive Shake-Up - WIREDGoogle News: OpenAI[P] Remote sensing foundation models made easy to use.Reddit r/MachineLearningFirst time NeurIPS. How different is it from low-ranked conferences? [D]Reddit r/MachineLearningAI & Tech brief: Ireland ascendant - The Washington PostGNews AI EUPeople would rather have an Amazon warehouse in their backyard than a data centerTechCrunch AITake-Two lays off its head of AI and several team members just two months after the CEO said it was embracing Gen AI - TweakTownGoogle News: Generative AIOpenAI Buys TBPN Tech Talk Show for Enterprise Client Outreach - News and Statistics - IndexBoxGoogle News: OpenAI
AI NEWS HUBbyEIGENVECTOREigenvector

PixelPrune: Pixel-Level Adaptive Visual Token Reduction via Predictive Coding

HuggingFace PapersApril 1, 20268 min read0 views
Source Quiz

PixelPrune reduces computational costs in Vision-Language Models by eliminating redundant image patches before Vision Transformer encoding through predictive-coding-based compression. (2 upvotes on HuggingFace)

Published on Apr 1

·

Submitted by

NAN

on Apr 2

Authors:

,

,

Abstract

PixelPrune reduces computational costs in Vision-Language Models by eliminating redundant image patches before Vision Transformer encoding through predictive-coding-based compression.

AI-generated summary

Document understanding and GUI interaction are among the highest-value applications of Vision-Language Models (VLMs), yet they impose exceptionally heavy computational burden: fine-grained text and small UI elements demand high-resolution inputs that produce tens of thousands of visual tokens. We observe that this cost is largely wasteful -- across document and GUI benchmarks, only 22--71% of image patches are pixel-unique, the rest being exact duplicates of another patch in the same image. We propose PixelPrune, which exploits this pixel-level redundancy through predictive-coding-based compression, pruning redundant patches before the Vision Transformer (ViT) encoder. Because it operates in pixel space prior to any neural computation, PixelPrune accelerates both the ViT encoder and the downstream LLM, covering the full inference pipeline. The method is training-free, requires no learnable parameters, and supports pixel-lossless compression (τ{=}0) as well as controlled lossy compression (τ{>}0). Experiments across three model scales and document and GUI benchmarks show that PixelPrune maintains competitive task accuracy while delivering up to 4.2times inference speedup and 1.9times training acceleration. Code is available at https://github.com/OPPO-Mente-Lab/PixelPrune.

View arXiv page View PDF GitHub 5 Add to collection

Get this paper in your agent:

hf papers read 2604.00886

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2604.00886 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2604.00886 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2604.00886 in a Space README.md to link it from this page.

Collections including this paper 1

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
PixelPrune:…researchpaperarxivVision-Lang…Vision Tran…predictive-…HuggingFace…

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 136 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers