Live
Black Hat USADark ReadingBlack Hat AsiaAI Businessb8646llama.cpp ReleasesIran claims it has hit Oracle data center in Dubai, Amazon data center in Bahrain — country has threatened to attack Nvidia, Intel, and others, tootomshardware.comThe prompt as a genre: instructional rhetoric for language modelsGenerative AII spent a year burning money on AI and finally decided to do something about itGenerative AIThe largest programming community on Reddit just banned all content related to AI LLMs — r/programming is prioritizing only high-quality discussions about AItomshardware.comEveryone Is Worshipping the Wrong AI Heroes—What Hidden Figures Teaches Us About This MomentGenerative AIAI Pair Programming Made Us Faster — But Worse EngineersGenerative AIWhy We Need to Stop Obsessing Over AI ModelsGenerative AIThe AI Professional Development Loop — and What It Devalues for TeachersGenerative AIBeyond Autoregression: How Diffusion Language Models Are Rewriting the Rules of AIGenerative AIMicrosoft deepens its commitment to Japan with $10 billion investment in AI infrastructure, cybersecurity, and workforce - Microsoft SourceGNews AI cybersecurityAI and humanoids have no place in West Virginia’s schools - West Virginia WatchGNews AI educationBlack Hat USADark ReadingBlack Hat AsiaAI Businessb8646llama.cpp ReleasesIran claims it has hit Oracle data center in Dubai, Amazon data center in Bahrain — country has threatened to attack Nvidia, Intel, and others, tootomshardware.comThe prompt as a genre: instructional rhetoric for language modelsGenerative AII spent a year burning money on AI and finally decided to do something about itGenerative AIThe largest programming community on Reddit just banned all content related to AI LLMs — r/programming is prioritizing only high-quality discussions about AItomshardware.comEveryone Is Worshipping the Wrong AI Heroes—What Hidden Figures Teaches Us About This MomentGenerative AIAI Pair Programming Made Us Faster — But Worse EngineersGenerative AIWhy We Need to Stop Obsessing Over AI ModelsGenerative AIThe AI Professional Development Loop — and What It Devalues for TeachersGenerative AIBeyond Autoregression: How Diffusion Language Models Are Rewriting the Rules of AIGenerative AIMicrosoft deepens its commitment to Japan with $10 billion investment in AI infrastructure, cybersecurity, and workforce - Microsoft SourceGNews AI cybersecurityAI and humanoids have no place in West Virginia’s schools - West Virginia WatchGNews AI education
AI NEWS HUBbyEIGENVECTOREigenvector

MALLVI: A Multi-Agent Framework for Integrated Generalized Robotics Manipulation

arXivby [Submitted on 18 Feb 2026 (v1), last revised 30 Mar 2026 (this version, v4)]March 31, 20262 min read1 views
Source Quiz

arXiv:2602.16898v4 Announce Type: replace-cross Abstract: Task planning for robotic manipulation with large language models (LLMs) is an emerging area. Prior approaches rely on specialized models, fine tuning, or prompt tuning, and often operate in an open loop manner without robust environmental feedback, making them fragile in dynamic settings. MALLVI presents a Multi Agent Large Language and Vision framework that enables closed-loop feedback driven robotic manipulation. Given a natural language instruction and an image of the environment, MALLVI generates executable atomic actions for a rob — Mehrshad Taji, Arad Mahdinezhad Kashani, Iman Ahmadi, AmirHossein Jadidi, Saina Kashani, Babak Khalaj

View PDF HTML (experimental)

Abstract:Task planning for robotic manipulation with large language models (LLMs) is an emerging area. Prior approaches rely on specialized models, fine tuning, or prompt tuning, and often operate in an open loop manner without robust environmental feedback, making them fragile in dynamic settings. MALLVI presents a Multi Agent Large Language and Vision framework that enables closed-loop feedback driven robotic manipulation. Given a natural language instruction and an image of the environment, MALLVI generates executable atomic actions for a robot manipulator. After action execution, a Vision Language Model (VLM) evaluates environmental feedback and decides whether to repeat the process or proceed to the next step. Rather than using a single model, MALLVI coordinates specialized agents, Decomposer, Localizer, Thinker, and Reflector, to manage perception, localization, reasoning, and high level planning. An optional Descriptor agent provides visual memory of the initial state. The Reflector supports targeted error detection and recovery by reactivating only relevant agents, avoiding full replanning. Experiments in simulation and real-world settings show that iterative closed loop multi agent coordination improves generalization and increases success rates in zero shot manipulation tasks. Code available at this https URL .

Subjects:

Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Cite as: arXiv:2602.16898 [cs.RO]

(or arXiv:2602.16898v4 [cs.RO] for this version)

https://doi.org/10.48550/arXiv.2602.16898

arXiv-issued DOI via DataCite

Submission history

From: Mehrshad Taji [view email] [v1] Wed, 18 Feb 2026 21:28:56 UTC (18,739 KB) [v2] Fri, 20 Feb 2026 10:41:16 UTC (18,739 KB) [v3] Wed, 25 Feb 2026 11:49:07 UTC (18,739 KB) [v4] Mon, 30 Mar 2026 12:50:11 UTC (18,739 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

Knowledge Map

Knowledge Map
TopicsEntitiesSource
MALLVI: A M…researchpaperarxivaiartificial-…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 143 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers