FusionAgent: A Multimodal Agent with Dynamic Model Selection for Human Recognition

arXiv, March 31, 2026
Authors: Jie Zhu, Xiao Guo, Yiyang Su, Anil Jain, Xiaoming Liu

Abstract: Model fusion is a key strategy for robust recognition in unconstrained scenarios, as different models provide complementary strengths. This is especially important for whole-body human recognition, where biometric cues such as face, gait, and body shape vary across samples and are typically integrated via score-fusion. However, existing score-fusion strategies are usually static, invoking all models for every test sample regardless of sample quality or modality reliability. To overcome these limitations, we propose FusionAgent, a novel agentic framework that leverages a Multimodal Large Language Model (MLLM) to perform dynamic, sample-specific model selection. Each expert model is treated as a tool, and through Reinforcement Fine-Tuning (RFT) with a metric-based reward, the agent learns to adaptively determine the optimal model combination for each test input. To address the model score misalignment and embedding heterogeneity, we introduce Anchor-based Confidence Top-k (ACT) score-fusion, which anchors on the most confident model and integrates complementary predictions in a confidence-aware manner. Extensive experiments on multiple whole-body biometric benchmarks demonstrate that FusionAgent significantly outperforms SoTA methods while achieving higher efficiency through fewer model invocations, underscoring the critical role of dynamic, explainable, and robust model fusion in real-world recognition systems. Project page: FusionAgent (this https URL).
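The abstract names ACT score-fusion but does not spell out its steps, so the following is only a rough sketch of the general idea of anchoring on the most confident model and fusing over its top-k candidates. Everything specific here is an assumption made for illustration and is not taken from the paper: the z-score normalization, the top-1 versus top-2 margin used as a confidence proxy, the confidence-proportional weights, and the names `zscore`, `confidence`, and `anchor_topk_fusion`. The input dictionary stands in for whatever subset of expert models the agent selected for a given sample.

```python
import numpy as np


def zscore(scores):
    """Put one model's gallery scores on a common scale (assumed normalization)."""
    scores = np.asarray(scores, dtype=float)
    std = scores.std()
    return (scores - scores.mean()) / std if std > 0 else np.zeros_like(scores)


def confidence(norm_scores):
    """Assumed confidence proxy: margin between the best and second-best score."""
    top2 = np.sort(norm_scores)[-2:]
    return float(top2[1] - top2[0])


def anchor_topk_fusion(model_scores, k=3):
    """Fuse scores from the selected models around the most confident one.

    model_scores: dict of model name -> similarity scores over the same gallery
                  (gallery size must be at least 2).
    Returns (anchor name, fused scores); entries outside the anchor's
    top-k candidates are set to -inf.
    """
    normed = {name: zscore(s) for name, s in model_scores.items()}
    conf = {name: confidence(s) for name, s in normed.items()}

    # The anchor is the model with the largest confidence margin.
    anchor = max(conf, key=conf.get)
    topk = np.argsort(normed[anchor])[-k:]

    # Hypothetical weighting: each model contributes in proportion to its confidence.
    total = sum(conf.values())
    weights = {
        name: (c / total if total > 0 else 1.0 / len(conf))
        for name, c in conf.items()
    }

    fused = np.full(len(normed[anchor]), -np.inf)
    fused[topk] = sum(weights[name] * normed[name][topk] for name in normed)
    return anchor, fused


# Toy example: two selected models scoring a query against four gallery identities.
selected = {
    "face": [0.10, 0.90, 0.20, 0.15],
    "gait": [0.40, 0.50, 0.45, 0.42],
}
anchor, fused = anchor_topk_fusion(selected, k=2)
best_match = int(np.argmax(fused))
```

In this toy input the face scores have the wider margin, so face would serve as the anchor and gait only re-weights the anchor's two leading candidates. A static score-fusion baseline, by contrast, would combine every model's scores over the whole gallery for every sample.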

Comments: CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.26908 [cs.CV] (or arXiv:2603.26908v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.26908

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Jie Zhu
[v1] Fri, 27 Mar 2026 18:35:44 UTC (1,300 KB)
