
EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation

arXiv, March 30, 2026

arXiv:2603.18739v3 Announce Type: replace

Authors: Longfei Liu, Yongjie Hou, Yang Li, Qirui Wang, Youyang Sha, Yongjun Yu, Yinzhi Wang, Peizhe Ru, Xuanlong Yu, Xi Shen


Abstract: Deploying high-performance dense prediction models on resource-constrained edge devices remains challenging due to strict limits on computation and memory. In practice, lightweight systems for object detection, instance segmentation, and pose estimation are still dominated by CNN-based architectures such as YOLO, while compact Vision Transformers (ViTs) often struggle to achieve a similarly strong accuracy-efficiency trade-off, even with large-scale pretraining. We argue that this gap is largely due to insufficient task-specific representation learning in small-scale ViTs, rather than an inherent mismatch between ViTs and edge dense prediction. To address this issue, we introduce EdgeCrafter, a unified compact ViT framework for edge dense prediction centered on ECDet, a detection model built from a distilled compact backbone and an edge-friendly encoder-decoder design. On the COCO dataset, ECDet-S achieves 51.7 AP with fewer than 10M parameters using only COCO annotations. For instance segmentation, ECInsSeg achieves performance comparable to RF-DETR while using substantially fewer parameters. For pose estimation, ECPose-X reaches 74.8 AP, significantly outperforming YOLO26Pose-X (71.6 AP). These results show that compact ViTs, when paired with task-specialized distillation and edge-aware design, can be a practical and competitive option for edge dense prediction. Code is available at: this https URL

Comments: Code is available at: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.18739 [cs.CV] (or arXiv:2603.18739v3 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2603.18739 (arXiv-issued DOI via DataCite)

Submission history

From: Longfei Liu
[v1] Thu, 19 Mar 2026 10:39:51 UTC (2,775 KB)
[v2] Wed, 25 Mar 2026 10:52:18 UTC (2,777 KB)
[v3] Fri, 27 Mar 2026 14:12:01 UTC (2,777 KB)

Original source

arXiv

https://arxiv.org/abs/2603.18739
