The Evolution of Tool Use in LLM Agents: From Single-Tool Call to Multi-Tool Orchestration
Hey there, superstar! 🎉 Imagine you have a super-smart robot friend, like a toy robot!
At first, this robot could only use one special tool at a time. Like, if it needed to draw, it picked up just a crayon. 🖍️
But now, our robot friend is getting much, much smarter! It can use many tools, one after another, to do bigger jobs. Like, first it picks up a crayon, then a ruler, then some scissors, all to build a super cool paper castle! 🏰
This news is about how our robot friends are learning to use lots of tools together, like a whole toolbox, to be even more helpful and amazing! It's like they're becoming master builders! 🛠️✨
arXiv:2603.22862v2 Announce Type: replace
Authors: Haoyuan Xu, Chang Li, Xinyan Ma, Xianhao Ou, Zihan Zhang, Tao He, Xiangyu Liu, Zixiang Wang, Jiafeng Liang, Zheng Chu, Runxuan Liu, Rongchuan Mu, Dandan Tu, Ming Liu, Bing Qin
Abstract: Tool use enables large language models (LLMs) to access external information, invoke software systems, and act in digital environments beyond what can be solved from model parameters alone. Early research mainly studied whether a model could select and execute a correct single tool call. As agent systems evolve, however, the central problem has shifted from isolated invocation to multi-tool orchestration over long trajectories with intermediate state, execution feedback, changing environments, and practical constraints such as safety, cost, and verifiability. We comprehensively review recent progress in multi-tool LLM agents and analyze the state of the art in this rapidly developing area. First, we unify task formulations and distinguish single-call tool use from long-horizon orchestration. Then, we organize the literature around six core dimensions: inference-time planning and execution, training and trajectory construction, safety and control, efficiency under resource constraints, capability completeness in open environments, and benchmark design and evaluation. We further summarize representative applications in software engineering, enterprise workflows, graphical user interfaces, and mobile systems. Finally, we discuss major challenges and outline future directions for building reliable, scalable, and verifiable multi-tool agents.
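To make the abstract's distinction between a single tool call and multi-tool orchestration concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the paper: the tool registry `TOOLS`, the functions `single_call` and `orchestrate`, the `Trajectory` record, and the `max_steps` budget are all hypothetical names, and the fixed `plan` stands in for decisions an LLM would normally make at each step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical toy tools; a real agent would wrap APIs, shells, or GUIs.
TOOLS: Dict[str, Callable[[float], float]] = {
    "double": lambda x: x * 2,
    "increment": lambda x: x + 1,
    "reciprocal": lambda x: 1 / x,  # raises ZeroDivisionError when x is 0
}


def single_call(tool: str, arg: float) -> float:
    """Single-call tool use: select one tool, execute it once, return."""
    return TOOLS[tool](arg)


@dataclass
class Trajectory:
    """Intermediate state plus a log of per-step execution feedback."""
    state: float
    log: List[Tuple[str, str]] = field(default_factory=list)


def orchestrate(plan: List[str], start: float, max_steps: int = 10) -> Trajectory:
    """Multi-tool orchestration over a trajectory.

    Each step consumes the state left by the previous step, records
    feedback, and stops on failure. max_steps is an assumed stand-in
    for a cost constraint.
    """
    traj = Trajectory(state=start)
    for tool in plan[:max_steps]:
        try:
            traj.state = TOOLS[tool](traj.state)
            traj.log.append((tool, "ok"))
        except ZeroDivisionError as exc:
            # Execution feedback: record the failure and halt the plan.
            traj.log.append((tool, f"error: {exc}"))
            break
    return traj


if __name__ == "__main__":
    print(single_call("double", 3))
    print(orchestrate(["increment", "double", "reciprocal"], 1))
```

In the single-call case, correctness depends on one selection and one execution. In the orchestrated case, an early error changes or halts everything downstream, which is why the log of feedback and the step budget appear in the sketch.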
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL)
Cite as: arXiv:2603.22862 [cs.SE]
(or arXiv:2603.22862v2 [cs.SE] for this version)
https://doi.org/10.48550/arXiv.2603.22862
arXiv-issued DOI via DataCite
Submission history
From: Haoyuan Xu
[v1] Tue, 24 Mar 2026 07:05:05 UTC (1,024 KB)
[v2] Thu, 2 Apr 2026 02:54:00 UTC (1,002 KB)