P$^2$HCT: Plug-and-Play Hierarchical C2F Transformer for Multi-Scale Feature Fusion
arXiv:2505.12772v3 Announce Type: replace Abstract: Feature fusion plays a pivotal role in achieving high performance in vision models, yet existing attention-based fusion techniques often suffer from substantial computational overhead and implementation complexity, particularly in resource-constrained settings. To address these limitations, we introduce the Plug-and-Play Hierarchical C2F Transformer (P$^2$HCT), a lightweight module that combines coarse-to-fine token selection with shared attention parameters to preserve spatial details while reducing inference cost. P$^2$HCT is trainable usin — Junyi Hu, Tian Bai, Fengyi Wu, Zhenming Peng, Yi Zhang
View PDF HTML (experimental)
Abstract:Feature fusion plays a pivotal role in achieving high performance in vision models, yet existing attention-based fusion techniques often suffer from substantial computational overhead and implementation complexity, particularly in resource-constrained settings. To address these limitations, we introduce the Plug-and-Play Hierarchical C2F Transformer (P$^2$HCT), a lightweight module that combines coarse-to-fine token selection with shared attention parameters to preserve spatial details while reducing inference cost. P$^2$HCT is trainable using coarse attention alone and can be seamlessly activated at inference to enhance accuracy without retraining. Integrated into real-time detectors such as YOLOv11-N/S/M, P$^2$HCT achieves mAP gains of 0.9%, 0.5%, and 0.4% on MS COCO with minimal latency increase. Similarly, embedding P$^2$HCT into ResNet-18/50/101 backbones improves ImageNet top-1 accuracy by 6.5%, 1.7%, and 1.0%, respectively. These results underscore P$^2$HCT's effectiveness as a hardware-friendly and general-purpose enhancement for both detection and classification tasks.
Comments: 12 pages, 6 figures, ICME2026
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2505.12772 [cs.CV]
(or arXiv:2505.12772v3 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2505.12772
arXiv-issued DOI via DataCite
Submission history
From: Junyi Hu [view email] [v1] Mon, 19 May 2025 07:00:54 UTC (3,585 KB) [v2] Tue, 20 May 2025 09:01:16 UTC (3,585 KB) [v3] Mon, 30 Mar 2026 15:02:15 UTC (3,491 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxiv
DenseNet Paper Walkthrough: All Connected
When we try to train a very deep neural network model, one issue that we might encounter is the vanishing gradient problem. This is essentially a problem where the weight update of a model during training slows down or even stops, hence causing the model not to improve. When a network is very deep, the [ ] The post DenseNet Paper Walkthrough: All Connected appeared first on Towards Data Science .
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers

How Leg Stiffness Affects Energy Economy in Hopping
arXiv:2501.03971v2 Announce Type: replace Abstract: In the fields of robotics and biomechanics, the integration of elastic elements such as springs and tendons in legged systems has long been recognized for enabling energy-efficient locomotion. Yet, a significant challenge persists: designing a robotic leg that perform consistently across diverse operating conditions, especially varying average forward speeds. It remains unclear whether, for such a range of operating conditions, the stiffness of the elastic elements needs to be varied or if a similar performance can be obtained by changing the motion and actuation while keeping the stiffness fixed. This work explores the influence of the leg stiffness on the energy efficiency of a monopedal robot through an extensive parametric study of it





Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!