
FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol

arXiv · March 26, 2026

Authors: Jie Zhu, Yimin Tian, Boyang Li, Kehao Wu, Zhongzhi Liang, Junhui Li, Xianyin Zhang, Lifan Guo, Feng Chen, Yong Liu, Chi Zhang


Abstract: This paper introduces FinMCP-Bench, a novel benchmark for evaluating large language models (LLMs) on solving real-world financial problems by invoking tools exposed through financial Model Context Protocol (MCP) services. FinMCP-Bench contains 613 samples spanning 10 main scenarios and 33 sub-scenarios, featuring both real and synthetic user queries to ensure diversity and authenticity. It incorporates 65 real financial MCPs and three types of samples (single-tool, multi-tool, and multi-turn), allowing evaluation of models across different levels of task complexity. Using this benchmark, we systematically assess a range of mainstream LLMs and propose metrics that explicitly measure tool invocation accuracy and reasoning capabilities. FinMCP-Bench provides a standardized, practical, and challenging testbed for advancing research on financial LLM agents.
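The abstract mentions metrics for tool invocation accuracy but does not define them here. As a rough illustration only, the sketch below scores a model's predicted tool calls against a reference set using exact-match precision, recall, and F1. This is an assumed, generic formulation and not the paper's metric; the `ToolCall` schema, the helper names, and the tool names in the example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation (hypothetical schema, not taken from the paper)."""
    name: str
    arguments: tuple  # sorted (key, value) pairs, so calls are hashable


def make_call(name, **arguments):
    """Build a ToolCall with arguments in a canonical (sorted) order."""
    return ToolCall(name=name, arguments=tuple(sorted(arguments.items())))


def tool_scores(predicted, expected):
    """Return (precision, recall, F1) over exact-match tool calls.

    A predicted call counts as a hit only if both the tool name and
    every argument match a reference call exactly.
    """
    pred_set, gold_set = set(predicted), set(expected)
    hits = len(pred_set & gold_set)
    precision = hits / len(pred_set) if pred_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical multi-tool sample: two reference calls, one predicted correctly.
expected = [
    make_call("get_stock_price", symbol="AAPL"),
    make_call("get_fx_rate", base="USD", quote="CNY"),
]
predicted = [
    make_call("get_stock_price", symbol="AAPL"),
    make_call("get_news", symbol="AAPL"),
]
print(tool_scores(predicted, expected))  # (0.5, 0.5, 0.5)
```

Exact matching is the strictest choice; a real benchmark might instead credit the correct tool name separately from its arguments, or account for call order in multi-turn samples.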

Comments: Accepted by ICASSP 2026

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Cite as: arXiv:2603.24943 [cs.AI] (or arXiv:2603.24943v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2603.24943

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Jie Zhu
[v1] Thu, 26 Mar 2026 02:20:04 UTC (1,098 KB)
