EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation
Longfei Liu, Yongjie Hou, Yang Li, Qirui Wang, Youyang Sha, Yongjun Yu, Yinzhi Wang, Peizhe Ru, Xuanlong Yu, Xi Shen
Abstract: Deploying high-performance dense prediction models on resource-constrained edge devices remains challenging due to strict limits on computation and memory. In practice, lightweight systems for object detection, instance segmentation, and pose estimation are still dominated by CNN-based architectures such as YOLO, while compact Vision Transformers (ViTs) often struggle to achieve a similarly strong accuracy-efficiency trade-off, even with large-scale pretraining. We argue that this gap stems largely from insufficient task-specific representation learning in small-scale ViTs rather than from an inherent mismatch between ViTs and edge dense prediction. To address this issue, we introduce EdgeCrafter, a unified compact ViT framework for edge dense prediction centered on ECDet, a detection model built from a distilled compact backbone and an edge-friendly encoder-decoder design. On the COCO dataset, ECDet-S achieves 51.7 AP with fewer than 10M parameters using only COCO annotations. For instance segmentation, ECInsSeg achieves performance comparable to RF-DETR while using substantially fewer parameters. For pose estimation, ECPose-X reaches 74.8 AP, significantly outperforming YOLO26Pose-X (71.6 AP). These results show that compact ViTs, when paired with task-specialized distillation and edge-aware design, can be a practical and competitive option for edge dense prediction. Code is available at: this https URL
Comments: Code is available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2603.18739 [cs.CV]
(or arXiv:2603.18739v3 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.18739
arXiv-issued DOI via DataCite
Submission history
From: Longfei Liu
[v1] Thu, 19 Mar 2026 10:39:51 UTC (2,775 KB)
[v2] Wed, 25 Mar 2026 10:52:18 UTC (2,777 KB)
[v3] Fri, 27 Mar 2026 14:12:01 UTC (2,777 KB)