
Tracking by Detection and Query: An Efficient End-to-End Framework for Multi-Object Tracking

arXiv, March 31, 2026

arXiv:2411.06197v3 (announce type: replace)

Authors: Shukun Jia, Shiyu Hu, Yichao Cao, Feng Yang, Xin Lu, Xiaobo Lu


Abstract: Multi-object tracking (MOT) is primarily dominated by two paradigms: tracking-by-detection (TBD) and tracking-by-query (TBQ). While TBD offers modular efficiency, its fragmented association pipeline often limits robustness in complex scenarios. Conversely, TBQ enhances semantic modeling end-to-end but suffers from high training costs and slow inference due to the tight coupling of detection and association. In this work, we propose the tracking-by-detection-and-query framework, TBDQ-Net, to advance the synergy between TBD and TBQ paradigms. By integrating a frozen detector with a lightweight associator, this architecture ensures intrinsic efficiency. Within this streamlined framework, we introduce tailored designs to address MOT-specific challenges. Concretely, we alleviate task conflicts and occlusions through the dual-stream update of the Basic Information Interaction (BII) module. The Content-Position Alignment (CPA) module further refines both content and positional components, providing well-aligned representations for association decoding. Extensive evaluations on DanceTrack, SportsMOT, and MOT20 benchmarks demonstrate that TBDQ-Net achieves a favorable efficiency-accuracy trade-off in challenging scenarios. Specifically, TBDQ-Net outperforms leading TBD methods by 6.0 IDF1 points on DanceTrack and achieves the best performance among TBQ methods in the crowded MOT20 benchmark. Relative to MOTRv2, TBDQ-Net reduces trainable parameters by approximately 80% while accelerating practical inference by 37.5%. These results highlight TBDQ-Net as an efficient alternative to heavy architectures, showcasing the efficacy of lightweight design. Source code is publicly available at this https URL.
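The abstract reports its DanceTrack gain in IDF1 points. As a reminder of what that metric measures, here is a minimal sketch of the standard identity F1 score, IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN), where IDTP, IDFP, and IDFN are the identity-level true positives, false positives, and false negatives. The function name and all counts below are hypothetical and chosen only to illustrate a 6.0-point difference; they are not taken from the paper.

```python
def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1 score as a fraction in [0, 1].

    idtp: detections matched to the correct identity
    idfp: predicted detections not matched to a correct identity
    idfn: ground-truth detections not matched to a correct identity
    """
    denom = 2 * idtp + idfp + idfn
    if denom == 0:
        # No ground truth and no predictions: define the score as 0.
        return 0.0
    return 2 * idtp / denom


# Hypothetical counts (NOT from the paper): both trackers see 1000
# ground-truth boxes and emit 1000 predicted boxes.
baseline = idf1(idtp=600, idfp=400, idfn=400)  # 1200 / 2000
improved = idf1(idtp=660, idfp=340, idfn=340)  # 1320 / 2000

# Benchmarks usually report IDF1 on a 0-100 scale, so a "point" is 0.01.
gain_points = 100 * (improved - baseline)
print(f"baseline={100 * baseline:.1f}  improved={100 * improved:.1f}  "
      f"gain={gain_points:.1f} points")
```

In this toy setting, moving 60 of 1000 detections from a wrong identity to the correct one is what a 6.0-point IDF1 gain corresponds to.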

Comments: Accepted by Pattern Recognition

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2411.06197 [cs.CV] (or arXiv:2411.06197v3 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2411.06197 (arXiv-issued DOI via DataCite)


Submission history

From: Shukun Jia
[v1] Sat, 9 Nov 2024 14:38:08 UTC (1,149 KB)
[v2] Sat, 28 Jun 2025 02:52:56 UTC (3,752 KB)
[v3] Sun, 29 Mar 2026 02:11:00 UTC (3,932 KB)

Original source: arXiv, https://arxiv.org/abs/2411.06197



