
EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation

arXiv · March 30, 2026 · 10 min read

arXiv:2603.18739v3 · Announce Type: replace

Authors: Longfei Liu, Yongjie Hou, Yang Li, Qirui Wang, Youyang Sha, Yongjun Yu, Yinzhi Wang, Peizhe Ru, Xuanlong Yu, Xi Shen


Abstract: Deploying high-performance dense prediction models on resource-constrained edge devices remains challenging due to strict limits on computation and memory. In practice, lightweight systems for object detection, instance segmentation, and pose estimation are still dominated by CNN-based architectures such as YOLO, while compact Vision Transformers (ViTs) often struggle to achieve a similarly strong accuracy-efficiency tradeoff, even with large-scale pretraining. We argue that this gap is largely due to insufficient task-specific representation learning in small-scale ViTs, rather than an inherent mismatch between ViTs and edge dense prediction. To address this issue, we introduce EdgeCrafter, a unified compact ViT framework for edge dense prediction centered on ECDet, a detection model built from a distilled compact backbone and an edge-friendly encoder-decoder design. On the COCO dataset, ECDet-S achieves 51.7 AP with fewer than 10M parameters using only COCO annotations. For instance segmentation, ECInsSeg achieves performance comparable to RF-DETR while using substantially fewer parameters. For pose estimation, ECPose-X reaches 74.8 AP, significantly outperforming YOLO26Pose-X (71.6 AP). These results show that compact ViTs, when paired with task-specialized distillation and edge-aware design, can be a practical and competitive option for edge dense prediction. Code is available at: this https URL
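The abstract credits a distilled compact backbone but does not say how the distillation is carried out. As a generic illustration only, and not the authors' method, the sketch below shows one common form of feature distillation: student token features are linearly projected to the teacher's width and penalized with a mean squared error. The function names, the projection matrix, and the toy dimensions are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows are tokens (or weight rows), columns are channels


def project(features: Matrix, weights: Matrix) -> Matrix:
    """Linearly map each token from student width to teacher width.

    `weights` has shape (student_dim, teacher_dim); this is a hypothetical
    learnable projection, written without a bias term for brevity.
    """
    columns = list(zip(*weights))  # each column has length student_dim
    return [
        [sum(f * w for f, w in zip(token, column)) for column in columns]
        for token in features
    ]


def feature_distillation_loss(
    student_tokens: Matrix, teacher_tokens: Matrix, weights: Matrix
) -> float:
    """Mean squared error between projected student and teacher features."""
    projected = project(student_tokens, weights)
    if len(projected) != len(teacher_tokens):
        raise ValueError("student and teacher must have the same token count")
    total, count = 0.0, 0
    for student_row, teacher_row in zip(projected, teacher_tokens):
        if len(student_row) != len(teacher_row):
            raise ValueError("projected width must match teacher width")
        for s, t in zip(student_row, teacher_row):
            total += (s - t) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    # Toy example: one token, student width 2, teacher width 3.
    student = [[1.0, 2.0]]
    teacher = [[1.0, 2.0, 5.0]]
    proj = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
    print(feature_distillation_loss(student, teacher, proj))
```

In a real training loop this term would typically be minimized with respect to both the student backbone and the projection, usually alongside the task loss; how EdgeCrafter specializes the distillation per task is described in the paper itself, not here.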


Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.18739 [cs.CV] (or arXiv:2603.18739v3 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2603.18739 (arXiv-issued DOI via DataCite)

Submission history

From: Longfei Liu
[v1] Thu, 19 Mar 2026 10:39:51 UTC (2,775 KB)
[v2] Wed, 25 Mar 2026 10:52:18 UTC (2,777 KB)
[v3] Fri, 27 Mar 2026 14:12:01 UTC (2,777 KB)
