FusionAgent: A Multimodal Agent with Dynamic Model Selection for Human Recognition
arXiv:2603.26908v1 Announce Type: new Abstract: Model fusion is a key strategy for robust recognition in unconstrained scenarios, as different models provide complementary strengths. This is especially important for whole-body human recognition, where biometric cues such as face, gait, and body shape vary across samples and are typically integrated via score-fusion. However, existing score-fusion strategies are usually static, invoking all models for every test sample regardless of sample quality or modality reliability. To overcome these limitations, we propose \textbf{FusionAgent}, a novel a — Jie Zhu, Xiao Guo, Yiyang Su, Anil Jain, Xiaoming Liu
View PDF HTML (experimental)
Abstract:Model fusion is a key strategy for robust recognition in unconstrained scenarios, as different models provide complementary strengths. This is especially important for whole-body human recognition, where biometric cues such as face, gait, and body shape vary across samples and are typically integrated via score-fusion. However, existing score-fusion strategies are usually static, invoking all models for every test sample regardless of sample quality or modality reliability. To overcome these limitations, we propose \textbf{FusionAgent}, a novel agentic framework that leverages a Multimodal Large Language Model (MLLM) to perform dynamic, sample-specific model selection. Each expert model is treated as a tool, and through Reinforcement Fine-Tuning (RFT) with a metric-based reward, the agent learns to adaptively determine the optimal model combination for each test input. To address the model score misalignment and embedding heterogeneity, we introduce Anchor-based Confidence Top-k (ACT) score-fusion, which anchors on the most confident model and integrates complementary predictions in a confidence-aware manner. Extensive experiments on multiple whole-body biometric benchmarks demonstrate that FusionAgent significantly outperforms SoTA methods while achieving higher efficiency through fewer model invocations, underscoring the critical role of dynamic, explainable, and robust model fusion in real-world recognition systems. Project page: \href{this https URL}{FusionAgent}.
Comments: CVPR 2026
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2603.26908 [cs.CV]
(or arXiv:2603.26908v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.26908
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Jie Zhu [view email] [v1] Fri, 27 Mar 2026 18:35:44 UTC (1,300 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxiv
Bankai (卍解) — the first post-training adaptation method for true 1-bit LLMs.
I've been experimenting with Bonsai 8B — PrismML's true 1-bit model (every weight is literally 0 or 1, not ternary like BitNet). I realized that since weights are bits, the diff between two model behaviors is just a XOR mask. So I built a tool that searches for sparse XOR patches that modify model behavior. The basic idea: flip a row of weights, check if the model got better at the target task without breaking anything else, keep or revert. The set of accepted flips is the patch. What it does on held-out prompts the search never saw: Without patch: d/dx [x^7 + x] = 0 ✗ With patch: d/dx [x^7 + x] = 7x^6 + 1 ✓ Without patch: Is 113 prime? No, 113 is not prime ✗ With patch: Is 113 prime? Yes, 113 is a prime number ✓ 93 row flips. 0.007% of weights. ~1 KB. Zero inference overhead — the patched

In the Presence of the Minister of Energy, Cisco and King Abdullah University of Science and Technology (KAUST) launch landmark AI Institute to accelerate AI research, development, and talent in Saudi Arabia - Cisco Newsroom
In the Presence of the Minister of Energy, Cisco and King Abdullah University of Science and Technology (KAUST) launch landmark AI Institute to accelerate AI research, development, and talent in Saudi Arabia Cisco Newsroom
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers

Quantum computers might crack today's encryption far sooner than we thought
According to a study by engineers at Caltech and the UC Department of Physics, quantum computers do not need to be nearly as powerful as previously believed to crack the most advanced cryptographic technologies. The research claims that Shor's algorithm could break RSA public-key encryption using quantum computers with just... Read Entire Article


.jpg)

Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!