AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection
arXiv:2405.12944v2 Announce Type: replace
Authors: Zizhao Chen, Yeqiang Qian, Xiaoxiao Yang, Chunxiang Wang, Ming Yang
Abstract: Multispectral pedestrian detection has been shown to improve performance in complex illumination scenarios. However, prevalent double-stream networks in multispectral detection employ two separate feature extraction branches for multi-modal data, leading to nearly double the inference time of single-stream networks, which use only one feature extraction branch. This increased inference time has hindered the widespread deployment of multispectral pedestrian detection on embedded devices for autonomous systems. To address this limitation, various knowledge distillation methods have been proposed. However, traditional distillation methods focus only on the fusion features and ignore the large amount of information in the original multi-modal features, thereby restricting the student network's performance. To tackle this challenge, we introduce the Adaptive Modal Fusion Distillation (AMFD) framework, which can fully utilize the original modal features of the teacher network. Specifically, a Modal Extraction Alignment (MEA) module, which integrates focal and global attention mechanisms, is used to derive learning weights for student networks. This enables the student network to acquire an optimal fusion strategy independent of the teacher network's, without requiring an additional feature fusion module. Furthermore, we present the SMOD dataset, a well-aligned, challenging multispectral dataset for detection. Extensive experiments on the challenging KAIST, LLVIP and SMOD datasets are conducted to validate the effectiveness of AMFD. The results demonstrate that our method outperforms existing state-of-the-art methods in both reducing log-average Miss Rate and improving mean Average Precision. The code is available at this https URL.
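For readers unfamiliar with the log-average Miss Rate mentioned in the abstract, the following is a minimal sketch of how that metric is commonly computed in pedestrian detection benchmarks. It assumes the usual convention of sampling the miss-rate curve at nine false-positives-per-image (FPPI) points evenly spaced in log scale between 10^-2 and 10^0; the abstract does not state the exact evaluation protocol, so the function name, the number of points, and the fallback value are illustrative assumptions rather than the authors' code.

```python
import math
from bisect import bisect_right


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of miss rates sampled at log-spaced FPPI reference points.

    Assumptions (not taken from the paper): `fppi` is sorted in increasing
    order, `miss_rate[i]` is the miss rate at `fppi[i]`, and a reference
    point that lies below every measured FPPI counts as a miss rate of 1.0.
    """
    # Reference points evenly spaced in log10 between 1e-2 and 1e0.
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]

    sampled = []
    for ref in refs:
        # Index of the last curve point whose FPPI does not exceed `ref`.
        idx = bisect_right(fppi, ref) - 1
        mr = miss_rate[idx] if idx >= 0 else 1.0
        # Clamp to avoid log(0) when a detector reaches a zero miss rate.
        sampled.append(max(mr, 1e-10))

    # Averaging in log space is equivalent to taking the geometric mean.
    return math.exp(sum(math.log(m) for m in sampled) / len(sampled))


if __name__ == "__main__":
    # Hypothetical curve, for illustration only.
    curve_fppi = [0.001, 0.01, 0.1, 1.0]
    curve_mr = [0.60, 0.35, 0.20, 0.10]
    print(f"log-average miss rate: {log_average_miss_rate(curve_fppi, curve_mr):.4f}")
```

Lower values are better. Because the average is taken in log space, a detector cannot offset a poor miss rate at low FPPI with a strong one at high FPPI as easily as it could under an arithmetic mean.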
Comments: Accepted by IEEE Transactions on Multimedia
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2405.12944 [cs.CV]
(or arXiv:2405.12944v2 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2405.12944
arXiv-issued DOI via DataCite
Submission history
From: Zizhao Chen
[v1] Tue, 21 May 2024 17:17:17 UTC (34,923 KB)
[v2] Fri, 27 Mar 2026 03:22:20 UTC (26,623 KB)